import pandas as pd
from itertools import cycle, islice
N = 4
df = pd.DataFrame(index=pd.date_range('2000', periods=N))
df['col1'] = list(islice(cycle(['A', 'B']), N))
df['col2'] = list(islice(cycle(['a', 'b', 'c']), N))
print(df)
returns
           col1 col2
2000-01-01    A    a
2000-01-02    B    b
2000-01-03    A    c
2000-01-04    B    a
If I then do the following groupbys, lines 2 and 4 return the same result (a Series with a MultiIndex), but line 3 returns the result of running .unstack() on line 1.
print(df.groupby(['col1', pd.TimeGrouper('W')]).size()) # 1. multiindex
print(df.groupby(['col2', pd.TimeGrouper('W')]).size()) # 2. multiindex
print(df.groupby('col1').resample('W').size()) # 3. unstacked
print(df.groupby('col2').resample('W').size()) # 4. multiindex
col1
A     2000-01-02    1
      2000-01-09    1
B     2000-01-02    1
      2000-01-09    1
dtype: int64
col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64
      2000-01-02  2000-01-09
col1
A              1           1
B              1           1
col2
a     2000-01-02    1
      2000-01-09    1
b     2000-01-02    1
c     2000-01-09    1
dtype: int64
This seems like an inconsistency to me, in the sense that, if 1 and 3 do not return the same result, neither should 2 and 4. Is this a bug, or is df.groupby(col).resample(x) supposed to behave like this, and is it supposed to behave differently from df.groupby([col, pd.TimeGrouper(x)])?
Note that in df, col1 has two groups of identical size, while col2 has groups of unequal size. I'm not sure if that's the problem, but this inconsistency goes away for some choices of N (e.g. all four lines above return a multiindex for N=3 and N=24?!)
print(df.groupby('col1').size())
print(df.groupby('col2').size())
col1
A 2
B 2
dtype: int64
col2
a 2
b 1
c 1
dtype: int64
Comment From: jreback
show pd.show_versions()
Comment From: mikepqr
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
Comment From: jreback
this is a manifestation of #13056
I think we will have to fix this as it IS a bit inconsistent.
but will be tracked in that issue. thanks for the report (and test cases)