Pandas groupby problem: the grouping function is applied to the whole index column

Code Sample, a copy-pastable example if possible

import pandas as pd
import pickle
from datetime import datetime, timedelta
import numpy as np

# function to group by day of year
def group_func(d):
    print("Inside grouping func")
    print("d={}".format(d))
    print("d.month = {}".format(d.month))
    print("d.year = {}".format(d.year))
    print(d.month, d.year)
    return datetime(2001, d.month, d.year)

dates = [datetime(2000, 1, 1) + timedelta(days=i) for i in range(10000)]
data = np.random.randn(len(dates))
df = pd.DataFrame(data=data, index=dates)

df = df.select(lambda d: not (d.month == 2 and d.day == 29))
groups = df.groupby(by=group_func)

Here I have a notebook that demonstrates the problem: https://github.com/guziy/PyNotebooks/blob/master/debug/weird_pandas_grouping.ipynb

Problem description

When I run the code above, I get an exception. The grouping function builds a datetime object and expects to receive individual index values, but it appears to be called with the entire index instead, so creating the datetime fails with: TypeError: an integer is required (got type Int64Index)

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 28.5.0
Cython: 0.22
numpy: 1.13.1
scipy: 0.15.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.8
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.2
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

well, you have an error in your grouping function

return datetime(2001, d.month, d.year) should be return datetime(2001, d.month, d.day)
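
For reference, a minimal sketch of the original example with that one-argument fix applied. Two things here are assumptions rather than part of the report: the debugging prints are dropped, and the leap-day filter uses a boolean mask instead of df.select, because select is not available in newer pandas releases. The names is_leap_day and n_groups are illustrative only.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def group_func(d):
    # Map each timestamp onto the same month and day of a fixed non-leap year.
    # The third argument must be the day; passing d.year (e.g. 2000) is out of
    # range for a day and makes datetime() raise.
    return datetime(2001, d.month, d.day)


dates = [datetime(2000, 1, 1) + timedelta(days=i) for i in range(10000)]
data = np.random.randn(len(dates))
df = pd.DataFrame(data=data, index=dates)

# Assumption: a boolean mask stands in for df.select to drop 29 February.
is_leap_day = (df.index.month == 2) & (df.index.day == 29)
df = df[~is_leap_day]

groups = df.groupby(by=group_func)
n_groups = groups.ngroups  # one group per calendar day
```

With the leap days removed, every group key is a valid date in 2001, so the grouping should yield one group per calendar day.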

in any event you are better off doing something like this

In [32]: df.groupby([df.index.month, df.index.day]).sum()
Out[32]: 
               0
1  1    2.912116
   2   -1.814301
   3  -10.006528
   4   -9.808739
   5   -1.420640
   6    2.062087
...          ...
12 26   6.842057
   27   7.446606
   28   3.837242
   29   1.415139
   30   6.850449
   31  -7.354046

[365 rows x 1 columns]
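
A self-contained sketch of the same idea, for anyone who wants to reproduce the output above. It assumes a frame built like the one in the report; the column name "value" and the level names "month" and "day" are illustrative and not part of the original snippet.

```python
import numpy as np
import pandas as pd

# Daily index covering the same span as the original example.
idx = pd.date_range("2000-01-01", periods=10000, freq="D")
df = pd.DataFrame({"value": np.random.randn(len(idx))}, index=idx)

# Drop 29 February so that every year contributes the same 365 days.
df = df[~((df.index.month == 2) & (df.index.day == 29))]

# Group on the month and day arrays taken from the index; no Python-level
# function is called per row.
month = df.index.month.rename("month")
day = df.index.day.rename("day")
daily_sum = df.groupby([month, day]).sum()
```

Grouping on the arrays taken from the index avoids the callable entirely, so it does not matter whether pandas would pass the function an element or the whole index.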

Comment From: guziy

Thanks @jreback: Cheers
