Hi,
I have a simple dataframe with a multiindex.
stats.index.names
Out[67]: FrozenList([u'day', u'category'])
day is a time variable, category is a string.
stats.index.inferred_type
Out[69]: 'mixed'
stats.index.dtype_str
Out[68]: 'object'
If I type
idx=pd.IndexSlice
stats.loc[idx['2015-01-01',:],:]
I get the correct slice at '2015-01-01', that is, all the observations for all the categories on that day.
If I type
idx=pd.IndexSlice
stats.loc[idx['2015',:],:]
I get back (almost) all my observations, even for years other than 2015.
What can be the problem here? I cannot paste the data unfortunately but I am happy to help in any way.
Thanks
Comment From: jreback
Please follow the instructions and submit the versions and a minimal example that reproduces the problem.
Comment From: randomgambit
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
Comment From: randomgambit
import numpy as np
import pandas as pd

dft2 = pd.DataFrame(np.random.randn(2000, 1),
                    columns=['A'],
                    index=pd.MultiIndex.from_product([pd.date_range('20130101',
                                                                    periods=1000,
                                                                    freq='12H'),
                                                      ['category1', 'category2']]))
idx = pd.IndexSlice
dft2.loc[idx['2013-01-03',:],:]
Out[95]:
A
2013-01-03 00:00:00 category1 0.295829
category2 0.006298
2013-01-03 12:00:00 category1 -0.389461
category2 0.124865
idx=pd.IndexSlice
dft2.loc[idx['2013',:],:]
Out[96]:
A
2013-01-01 00:00:00 category1 -0.050261
category2 1.221524
2013-01-01 12:00:00 category1 -0.133423
category2 1.029201
2013-01-02 00:00:00 category1 0.543025
category2 0.619642
2013-01-02 12:00:00 category1 1.509450
category2 -1.405432
2013-01-03 00:00:00 category1 0.295829
category2 0.006298
2013-01-03 12:00:00 category1 -0.389461
category2 0.124865
2013-01-04 00:00:00 category1 0.332380
category2 0.705887
2013-01-04 12:00:00 category1 -0.932210
category2 0.602924
2013-01-05 00:00:00 category1 0.156599
category2 -0.268288
2013-01-05 12:00:00 category1 0.491792
category2 0.281679
2013-01-06 00:00:00 category1 0.835345
category2 -0.958069
2013-01-06 12:00:00 category1 -0.880574
category2 -1.279748
2013-01-07 00:00:00 category1 -1.652601
category2 0.120114
2013-01-07 12:00:00 category1 -1.542260
category2 0.469174
2013-01-08 00:00:00 category1 0.890845
category2 0.201862
...
2014-05-01 00:00:00 category1 -0.156247
2014-05-01 12:00:00 category1 0.497289
2014-05-02 00:00:00 category1 -0.276293
2014-05-02 12:00:00 category1 0.676201
2014-05-03 00:00:00 category1 -0.014220
2014-05-03 12:00:00 category1 0.101065
2014-05-04 00:00:00 category1 0.171072
2014-05-04 12:00:00 category1 -0.166051
2014-05-05 00:00:00 category1 0.330556
2014-05-05 12:00:00 category1 0.998734
2014-05-06 00:00:00 category1 0.263579
2014-05-06 12:00:00 category1 0.092337
2014-05-07 00:00:00 category1 -1.041369
2014-05-07 12:00:00 category1 -0.425271
2014-05-08 00:00:00 category1 -1.036846
2014-05-08 12:00:00 category1 0.010377
2014-05-09 00:00:00 category1 -0.875446
2014-05-09 12:00:00 category1 0.363559
2014-05-10 00:00:00 category1 -1.586318
2014-05-10 12:00:00 category1 -1.178954
2014-05-11 00:00:00 category1 -0.024954
2014-05-11 12:00:00 category1 0.058637
2014-05-12 00:00:00 category1 1.127713
2014-05-12 12:00:00 category1 -0.259863
2014-05-13 00:00:00 category1 0.053491
2014-05-13 12:00:00 category1 0.193916
2014-05-14 00:00:00 category1 -0.574039
2014-05-14 12:00:00 category1 0.380214
2014-05-15 00:00:00 category1 -3.067484
2014-05-15 12:00:00 category1 1.158423
[1730 rows x 1 columns]
Am I misunderstanding something here? thanks
Comment From: jreback
duplicated by #12896, closed by #13117
In [8]: pd.options.display.max_rows=10
In [9]: dft2 = pd.DataFrame(np.random.randn(2000, 1),
   ...:                     columns=['A'],
   ...:                     index=pd.MultiIndex.from_product([pd.date_range('20130101',
   ...:                                                                     periods=1000,
   ...:                                                                     freq='12H'),
   ...:                                                       ['category1', 'category2']]))
In [10]: dft2
Out[10]:
A
2013-01-01 00:00:00 category1 0.555222
category2 1.067353
2013-01-01 12:00:00 category1 -0.556181
category2 1.115773
2013-01-02 00:00:00 category1 0.290784
... ...
2014-05-14 12:00:00 category2 -0.922701
2014-05-15 00:00:00 category1 -0.733799
category2 -1.497488
2014-05-15 12:00:00 category1 -0.348072
category2 1.387935
[2000 rows x 1 columns]
In [11]: idx=pd.IndexSlice
In [12]: dft2.loc[idx['2013-01-03',:],:]
Out[12]:
A
2013-01-03 00:00:00 category1 -1.397535
category2 0.051541
2013-01-03 12:00:00 category1 0.310371
category2 -0.715179
In [13]: dft2.loc[idx['2013',:],:]
Out[13]:
A
2013-01-01 00:00:00 category1 0.555222
category2 1.067353
2013-01-01 12:00:00 category1 -0.556181
category2 1.115773
2013-01-02 00:00:00 category1 0.290784
... ...
2013-12-30 12:00:00 category2 -0.494284
2013-12-31 00:00:00 category1 -0.123571
category2 -2.059731
2013-12-31 12:00:00 category1 1.465607
category2 -0.975960
[1460 rows x 1 columns]
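On a version where the partial-string slice still returns too many rows, one possible workaround is to skip string slicing and build a boolean mask from the datetime level instead. This is only a sketch and not something confirmed in this thread: the level names `day` and `category` are borrowed from the original report, and the variable names `days` and `only_2013` are made up for illustration. It also gives a quick sanity check on the expected row count.

```python
import numpy as np
import pandas as pd

# Same shape as the example above: 1000 timestamps 12 hours apart, two categories each.
# The level names are an assumption taken from the original report.
dates = pd.date_range('2013-01-01', periods=1000, freq=pd.Timedelta(hours=12))
index = pd.MultiIndex.from_product([dates, ['category1', 'category2']],
                                   names=['day', 'category'])
dft2 = pd.DataFrame(np.random.randn(2000, 1), columns=['A'], index=index)

# Select by year with a boolean mask on the datetime level,
# which does not rely on partial string indexing at all.
days = dft2.index.get_level_values('day')
only_2013 = dft2[days.year == 2013]

# 365 days * 2 timestamps per day * 2 categories = 1460 rows
print(len(only_2013))
```

The count agrees with the 1460 rows in Out[13], so the mask and the fixed partial-string slice should select the same rows.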
Comment From: randomgambit
damn I was proud of finding a cool bug, you destroyed my dreams
Comment From: jorisvandenbossche
@randomgambit But anyway, thanks for reporting!