I have DataFrame:
import pandas as pd
import numpy as np
rng = pd.date_range('01-01-1988',periods=130000,freq='H')
long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
dfmi = long_df.stack().sort_index()
print (dfmi)
1988-01-01 00:00:00 bar -1.129213
baz 1.405388
foo 0.482324
zoo -2.315226
1988-01-01 01:00:00 bar -0.533171
baz 1.906526
foo -0.745095
zoo 0.351300
1988-01-01 02:00:00 bar -0.162411
baz -0.218296
foo 0.327074
zoo -0.508463
1988-01-01 03:00:00 bar -0.676780
baz -1.137270
foo 0.389457
zoo -0.517306
1988-01-01 04:00:00 bar 1.368110
baz 1.106048
foo 1.342662
zoo -0.215235
1988-01-01 05:00:00 bar -1.284003
baz 0.216028
foo 0.889421
zoo -0.965881
1988-01-01 06:00:00 bar 1.522085
baz 0.436332
foo 0.114714
zoo 0.342456
1988-01-01 07:00:00 bar -0.176803
baz -0.365471
2002-10-30 08:00:00 foo -1.196314
zoo 1.245414
2002-10-30 09:00:00 bar 1.271792
baz -0.688911
foo -0.852729
zoo 0.869926
2002-10-30 10:00:00 bar -0.487386
baz -0.161470
foo 0.593564
zoo -0.440397
2002-10-30 11:00:00 bar 0.856882
baz -0.297341
foo 2.106393
zoo 0.186903
2002-10-30 12:00:00 bar 1.551650
baz 1.152780
foo 0.726544
zoo -0.334287
2002-10-30 13:00:00 bar 0.483035
baz 1.573808
foo -0.734948
zoo 0.005149
2002-10-30 14:00:00 bar 0.077451
baz 0.783963
foo -0.111543
zoo -0.637871
2002-10-30 15:00:00 bar 0.383890
baz -0.368546
foo 0.503162
zoo 0.324679
dtype: float64
print (dfmi.index.get_level_values(0))
DatetimeIndex(['1988-01-01 00:00:00', '1988-01-01 00:00:00',
'1988-01-01 00:00:00', '1988-01-01 00:00:00',
'1988-01-01 01:00:00', '1988-01-01 01:00:00',
'1988-01-01 01:00:00', '1988-01-01 01:00:00',
'1988-01-01 02:00:00', '1988-01-01 02:00:00',
...
'2002-10-30 13:00:00', '2002-10-30 13:00:00',
'2002-10-30 14:00:00', '2002-10-30 14:00:00',
'2002-10-30 14:00:00', '2002-10-30 14:00:00',
'2002-10-30 15:00:00', '2002-10-30 15:00:00',
'2002-10-30 15:00:00', '2002-10-30 15:00:00'],
dtype='datetime64[ns]', length=520000, freq='H')
If select by year, month and day, it works nice:
print (dfmi.loc['2001-01-01'])
2001-01-01 00:00:00 bar -0.349633
baz -1.945182
foo 0.422883
zoo -1.183061
2001-01-01 01:00:00 bar -0.014788
baz -0.646235
foo -1.449877
zoo 0.422516
2001-01-01 02:00:00 bar 0.197731
baz -0.307682
foo -1.293725
zoo 0.283953
2001-01-01 03:00:00 bar 0.725005
baz 1.182862
foo -0.349092
zoo 1.354510
2001-01-01 04:00:00 bar -0.516522
baz 0.582963
foo -0.577522
zoo 0.710188
2001-01-01 05:00:00 bar -0.453547
baz -1.054144
foo 0.911428
zoo -0.417231
2001-01-01 06:00:00 bar -1.117797
baz -0.315343
foo 1.441117
zoo 1.135960
2001-01-01 07:00:00 bar -0.607946
baz 1.210047
...
2001-01-01 16:00:00 foo -1.435443
zoo 0.045783
2001-01-01 17:00:00 bar -0.590686
baz 1.115333
foo -0.558769
zoo -0.522037
2001-01-01 18:00:00 bar -0.862442
baz 0.917343
foo 0.759948
zoo -0.004601
2001-01-01 19:00:00 bar -1.901064
baz 0.281404
foo -0.704072
zoo -0.104176
2001-01-01 20:00:00 bar -0.703224
baz -0.170848
foo 0.547919
zoo 1.199431
2001-01-01 21:00:00 bar 1.198095
baz -0.416019
foo 1.551871
zoo -0.472543
2001-01-01 22:00:00 bar -0.239550
baz -0.401000
foo 0.959729
zoo 0.783011
2001-01-01 23:00:00 bar 0.415624
baz 0.093273
foo 0.712913
zoo -1.026016
dtype: float64
But if select by year or year and month, it return all DataFrame, there is not slicing.
print (dfmi.loc['2001-01'])
print (dfmi.loc['2001'])
Is it bug or not implemented yet? Thank you.
print (pd.show_versions())
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: sk_SK
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.1
None
Comment From: jreback
pls check that your output is from running on 0.18.1 as this DOES work with that version.
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
Type "copyright", "credits" or "license" for more information.
IPython 5.0.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: pd.__version__
Out[1]: '0.18.1'
In [2]: rng = pd.date_range('01-01-1988',periods=130000,freq='H')
...: long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
...: dfmi = long_df.stack().sort_index()
...:
In [3]: len(dfmi.loc['2001'])
Out[3]: 398760
In [4]: len(dfmi.loc['2001-01'])
Out[4]: 390744
In [5]: len(dfmi.loc['2001-01-01'])
Out[5]: 96
Comment From: jreback
though you your LANG
setting might not be honoured. Try using that LANG date separators.
Comment From: jesrael
For me it work with len
also nice. But output get all DataFrame
. And it failed only in large DataFrame, in small it works perfectly:
import pandas as pd
import numpy as np
rng = pd.date_range('01-01-1988',periods=130000,freq='H')
long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
dfmi = long_df.stack().sort_index()
print (len(dfmi.loc['2001']))
398760
print (len(dfmi.loc['2001-01']))
390744
print (len(dfmi.loc['2001-01-01']))
96
print (dfmi.loc['2001'].head(10))
1988-01-01 00:00:00 bar -0.468276
baz 0.236022
foo 1.220370
1988-01-01 01:00:00 bar -0.266689
baz 0.151896
foo 0.162511
1988-01-01 02:00:00 bar -0.046803
baz 0.147449
foo -0.532906
1988-01-01 03:00:00 bar 0.544098
dtype: float64
print (dfmi.loc['2001'].tail(10))
2002-10-30 12:00:00 foo 0.012732
2002-10-30 13:00:00 bar -0.222702
baz 1.099860
foo 0.143408
2002-10-30 14:00:00 bar 1.022347
baz 1.617577
foo -0.061869
2002-10-30 15:00:00 bar 1.037734
baz -0.759772
foo -1.280097
dtype: float64
rng = pd.date_range('01-01-2000',periods=50,freq='M')
long_df = pd.DataFrame(np.random.randn(50,4),index = rng, columns=['bar','baz','foo','zoo'])
dfmi = long_df.stack()
print (dfmi.loc['2001'].head(10))
2001-01-31 bar 0.670179
baz -0.872433
foo 1.790951
zoo -0.359836
2001-02-28 bar -1.130303
baz 1.878084
foo 0.928927
zoo -0.728020
2001-03-31 bar -1.095083
baz -0.338816
dtype: float64
print (dfmi.loc['2001'].tail(10))
2001-10-31 foo 0.451467
zoo 1.488951
2001-11-30 bar 0.820663
baz 1.302047
foo 1.062436
zoo -0.573341
2001-12-31 bar -1.325396
baz 0.428895
foo -0.336822
zoo 0.317958
dtype: float64
Comment From: jesrael
And I try it solve in http://stackoverflow.com/questions/39040445/select-slice-a-multi-index-dataframe-time-series-using-a-period-leads-to-a-bug
Comment From: jreback
this is the same as #12896 and closed by #13117 in master and 0.19.0 (releasing soon)
In [1]: rng = pd.date_range('01-01-1988',periods=130000,freq='H')
...: long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
...: dfmi = long_df.stack().sort_index()
...:
In [2]: dfmi.loc['2001-01'].head()
Out[2]:
2001-01-01 00:00:00 bar 1.632104
baz -1.039070
foo 1.371071
zoo -0.661709
2001-01-01 01:00:00 bar 1.372920
dtype: float64
In [3]: dfmi.loc['2001-01'].tail()
Out[3]:
2001-01-31 22:00:00 zoo 1.972502
2001-01-31 23:00:00 bar 1.414616
baz 0.248703
foo 1.095618
zoo -1.528967
dtype: float64
In [4]: dfmi.loc['2001-01-01'].head()
Out[4]:
2001-01-01 00:00:00 bar 1.632104
baz -1.039070
foo 1.371071
zoo -0.661709
2001-01-01 01:00:00 bar 1.372920
dtype: float64
In [5]: dfmi.loc['2001-01-01'].tail()
Out[5]:
2001-01-01 22:00:00 zoo 0.109188
2001-01-01 23:00:00 bar 0.531228
baz -0.675307
foo -0.168099
zoo 1.217736
dtype: float64
In [6]: len(dfmi.loc['2001-01'])
Out[6]: 2976