http://stackoverflow.com/questions/13030245/how-to-shift-a-pandas-multiindex-series
Comment From: changhiskhan
Can potentially add a bunch of ts related methods for multiindex. Resample, asfreq, at_time, between_time, etc
Comment From: kawing-chiu
So, before these new methods arrive, how can we perform these operations? For example, I have a DatetimeIndex as level 0 of a MultiIndex, how can I use between_time() on it?
Comment From: jreback
@kawing-chiu pls show what you are trying to do.
Comment From: kawing-chiu
I have a multi-indexed dataframe like this:
In [67]: df.head()
Out[67]:
price volume
time stock
2015-12-01 09:30:00 000001 18.34 1000
000001 18.34 600
000001 18.35 5000
000001 18.35 6300
000001 18.35 900
What I'm trying to do is to select an intraday time range (say 09:30:00 to 09:45:00) for all stocks in all dates. This is typically done by using between_time()
when you have a single DatatimeIndex
instead of a MultiIndex
.
Comment From: jreback
you can easily do it like this
In [15]: index = pd.MultiIndex.from_product([pd.date_range('20130101',periods=1000,freq='H'),['one','two']])
In [16]: df = pd.DataFrame({'A' : np.random.randn(len(index)), 'B' : np.arange(len(index))}, index=index)
In [17]: df.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 2000 entries, (2013-01-01 00:00:00, one) to (2013-02-11 15:00:00, two)
Data columns (total 2 columns):
A 2000 non-null float64
B 2000 non-null int64
dtypes: float64(1), int64(1)
memory usage: 46.9+ KB
In [18]: indexer = df.index.get_level_values(0).indexer_between_time('09:00','16:00')
In [19]: df.iloc[indexer].info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 670 entries, (2013-01-01 09:00:00, one) to (2013-02-11 15:00:00, two)
Data columns (total 2 columns):
A 670 non-null float64
B 670 non-null int64
dtypes: float64(1), int64(1)
memory usage: 15.7+ KB
the enhancement would be to allow a level=
parameters to .between_time
to just do this selection internally.
Comment From: kawing-chiu
Thanks @jreback . It works.
I have a further suggestion on datetime indexing: Even if we have between_time()
, I still find it quite cumbersome to select a time range considering that when you are working with intraday data, you will use it almost everywhere. It will be much better to have df['09:30:45':'10:25:00']
work directly on a datetime index. Generally I think not too many people would use ':' as a separator for date, right? So its a reasonable default to select time when only ':' separator is present.
Comment From: jreback
@kawing-chiu can you make a new issue w.r.t. using a time string as an indexer.
and of course PR's are welcome for this and other issues!
Comment From: MarcoGorelli
closing as there's been no activity in ~7 years, and the workaround in https://github.com/pandas-dev/pandas/issues/2141#issuecomment-172353607 looks fine