Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2018', periods=72, freq='H')
rngIndex = pd.Index(rng, name='DateTime')
tsdf = pd.DataFrame(np.random.randn(len(rng)), index=rngIndex, columns=['x'])
tsdf.groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
Problem description
The above produces the following error:
KeyError: 'The grouper name DateTime is not found'
There is a workaround, which is to call reset_index() first:
tsdf.reset_index().groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
Since v0.20.0 introduced support for grouping by index level names, I think that Grouper should do the same.
Expected Output
                   x
DateTime
2018-01-01  0.250264
2018-01-02 -0.186042
2018-01-03  0.025520
Output of pd.show_versions()
Comment From: jreback
In [2]: rng = pd.date_range('1/1/2018', periods=72, freq='H')
   ...: rngIndex = pd.Index(rng, name='DateTime')
   ...: tsdf = pd.DataFrame(np.random.randn(len(rng)), index=rngIndex, columns=['x'])
   ...: tsdf.groupby(pd.Grouper(freq='1D', level='DateTime')).mean()
Out[2]:
                   x
DateTime
2018-01-01  0.089906
2018-01-02  0.257909
2018-01-03 -0.276228
Comment From: jreback
you need to use level
Comment From: Dr-Irv
@jreback OK, using level is a better workaround. But my point here is that the API is not consistent. When doing a regular groupby, I can use a mix of names from the index and the columns and not have to worry about whether a name refers to the index or the column, and also not worry about which level number each index name is at. But in Grouper, I now need to know whether the name is in the index or in the columns, and if in the index, I need to know the position within that index.
So my suggestion is an enhancement, which is to allow using index names in the key argument of Grouper.
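To make the inconsistency concrete, here is a minimal sketch. The 'kind' column and the deterministic values are made up for illustration and are not part of the original report. A plain groupby resolves 'DateTime' (an index name) and 'kind' (a column) from the same list of names, while the Grouper form has to say level= explicitly for the index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 8 timestamps, 6 hours apart, covering two days.
rng = pd.date_range("2018-01-01", periods=8, freq=pd.Timedelta(hours=6))
df = pd.DataFrame(
    {"kind": ["u", "v"] * 4, "x": np.arange(8.0)},
    index=pd.Index(rng, name="DateTime"),
)

# Regular groupby: an index level name and a column name can be mixed freely.
mixed = df.groupby(["DateTime", "kind"])["x"].sum()

# Grouper: the index has to be addressed through level=, not key=.
by_day = df.groupby([pd.Grouper(level="DateTime", freq="1D"), "kind"])["x"].sum()
print(by_day)
```

The second call only works because the caller already knows that 'DateTime' lives in the index, which is exactly the knowledge a regular groupby does not require.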
Comment From: thierryzoller
+1
Comment From: adrienpacifico
There is a workaround, which is to do reset_index():
tsdf.reset_index().groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
You can also use .resample instead: tsdf.resample("1D").mean()
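For reference, a small sketch comparing the three approaches mentioned in this thread side by side. It assumes deterministic values (np.arange instead of the random data in the report) so that the daily means are easy to check by hand.

```python
import numpy as np
import pandas as pd

# 72 hourly timestamps (three days), as in the report, but with known values.
rng = pd.date_range("2018-01-01", periods=72, freq=pd.Timedelta(hours=1))
tsdf = pd.DataFrame({"x": np.arange(72.0)}, index=pd.Index(rng, name="DateTime"))

# 1. Grouper addressing the index through level=
via_level = tsdf.groupby(pd.Grouper(freq="1D", level="DateTime")).mean()

# 2. Move the index into a column, then use key=
via_reset = tsdf.reset_index().groupby(pd.Grouper(freq="1D", key="DateTime")).mean()

# 3. resample on the DatetimeIndex
via_resample = tsdf.resample("1D").mean()

print(via_level["x"].tolist())
print(via_reset["x"].tolist())
print(via_resample["x"].tolist())
```

All three should give the same daily means for a frame with a single DatetimeIndex. The resample form is the shortest, but it does not extend to grouping by additional keys the way a list of Groupers does.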