I was wondering if, given the recent set of developments and improvements to asfreq and resample, we now have a more efficient method for solving this problem [from SO].
Example:
item_uid created_at value
0S0099v8iI 2015-03-25 10652.79
0F01ddgkRa 2015-03-25 1414.71
0F02BZeTr6 2015-03-20 51505.22
0F02BZeTr6 2015-03-23 51837.97
0F02BZeTr6 2015-03-24 51578.63
0F02BZeTr6 2015-03-25 NaN
0F02BZeTr6 2015-03-26 NaN
0F02BZeTr6 2015-03-27 50893.42
0F02BcIzNo 2015-03-17 1230.00
0F02BcIzNo 2015-03-23 1130.00
0F02F4gAMs 2015-03-25 1855.96
0F02Vwd6Ou 2015-03-19 5709.33
0F04OlAs0R 2015-03-18 321.44
0F05GInfPa 2015-03-16 664.68
0F05PQARFJ 2015-03-18 1074.31
0F05PQARFJ 2015-03-26 1098.31
0F06LFhBCK 2015-03-18 211.49
0F06ryso80 2015-03-16 13.73
0F06ryso80 2015-03-20 12.00
0F07gg7Oth 2015-03-19 2325.70
which, if we were to resample with a daily frequency, would return (e.g. for item_uid = 0F02BZeTr6):
item_uid created_at value
0F02BZeTr6 2015-03-20 51505.22
0F02BZeTr6 2015-03-21 51505.22
0F02BZeTr6 2015-03-22 51505.22
0F02BZeTr6 2015-03-23 51837.97
0F02BZeTr6 2015-03-24 51578.63
0F02BZeTr6 2015-03-25 51578.63
0F02BZeTr6 2015-03-26 51578.63
0F02BZeTr6 2015-03-27 50893.42
0F02BZeTr6 2015-03-28 50893.42
0F02BZeTr6 2015-03-29 50893.42
I mention this partly because unstacking can be very slow when our index is huge. Any thoughts here are welcome!
Comment From: jreback
Can you provide a copy-pastable example for your frame?
Comment From: amelio-vazquez-reina
Thanks @jreback, I updated the examples to make them easy to copy using pd.read_clipboard(). Note that the example uses dates and assumes a daily sampling frequency, but I am hoping to work with datetimes (not just dates) and other frequencies.
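For anyone who cannot use the clipboard, here is a minimal sketch that builds an equivalent frame from an inline string. It uses only a subset of the rows above, and the names RAW and df are illustrative rather than anything from the original example.

```python
import io

import pandas as pd

# A subset of the rows from the example above, as whitespace-separated text.
RAW = """item_uid created_at value
0F02BZeTr6 2015-03-20 51505.22
0F02BZeTr6 2015-03-23 51837.97
0F02BZeTr6 2015-03-24 51578.63
0F02BZeTr6 2015-03-25 NaN
0F02BZeTr6 2015-03-26 NaN
0F02BZeTr6 2015-03-27 50893.42
0F02BcIzNo 2015-03-17 1230.00
0F02BcIzNo 2015-03-23 1130.00
0F06ryso80 2015-03-16 13.73
0F06ryso80 2015-03-20 12.00
"""

# Split on runs of whitespace and parse created_at as datetimes,
# so the column can later serve as a DatetimeIndex for resampling.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", parse_dates=["created_at"])
print(df.dtypes)
```

The same call should also accept full timestamps in created_at, provided each timestamp contains no whitespace (for example ISO 8601 with a T separator).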
Comment From: jreback
In master:
In [15]: df.dtypes
Out[15]:
item_uid object
created_at datetime64[ns]
value float64
dtype: object
In [16]: df.set_index('created_at').groupby('item_uid').resample('D').ffill()
Out[16]:
                         item_uid     value
item_uid   created_at
0F01ddgkRa 2015-03-25  0F01ddgkRa   1414.71
0F02BZeTr6 2015-03-20  0F02BZeTr6  51505.22
           2015-03-21  0F02BZeTr6  51505.22
           2015-03-22  0F02BZeTr6  51505.22
           2015-03-23  0F02BZeTr6  51837.97
           2015-03-24  0F02BZeTr6  51578.63
           2015-03-25  0F02BZeTr6       NaN
           2015-03-26  0F02BZeTr6       NaN
           2015-03-27  0F02BZeTr6  50893.42
0F02BcIzNo 2015-03-17  0F02BcIzNo   1230.00
           2015-03-18  0F02BcIzNo   1230.00
           2015-03-19  0F02BcIzNo   1230.00
           2015-03-20  0F02BcIzNo   1230.00
           2015-03-21  0F02BcIzNo   1230.00
           2015-03-22  0F02BcIzNo   1230.00
           2015-03-23  0F02BcIzNo   1130.00
0F02F4gAMs 2015-03-25  0F02F4gAMs   1855.96
0F02Vwd6Ou 2015-03-19  0F02Vwd6Ou   5709.33
0F04OlAs0R 2015-03-18  0F04OlAs0R    321.44
0F05GInfPa 2015-03-16  0F05GInfPa    664.68
0F05PQARFJ 2015-03-18  0F05PQARFJ   1074.31
           2015-03-19  0F05PQARFJ   1074.31
           2015-03-20  0F05PQARFJ   1074.31
           2015-03-21  0F05PQARFJ   1074.31
           2015-03-22  0F05PQARFJ   1074.31
           2015-03-23  0F05PQARFJ   1074.31
           2015-03-24  0F05PQARFJ   1074.31
           2015-03-25  0F05PQARFJ   1074.31
           2015-03-26  0F05PQARFJ   1098.31
0F06LFhBCK 2015-03-18  0F06LFhBCK    211.49
0F06ryso80 2015-03-16  0F06ryso80     13.73
           2015-03-17  0F06ryso80     13.73
           2015-03-18  0F06ryso80     13.73
           2015-03-19  0F06ryso80     13.73
           2015-03-20  0F06ryso80     12.00
0F07gg7Oth 2015-03-19  0F07gg7Oth   2325.70
0S0099v8iI 2015-03-25  0S0099v8iI  10652.79
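Note that 2015-03-25 and 2015-03-26 for 0F02BZeTr6 are still NaN in the output above, whereas the result requested in the original post carries 51578.63 forward. The resampler's ffill appears to fill only the days it creates, not NaNs that were already in the data. A minimal sketch of one way to handle that, using a single item's values (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Values for item_uid 0F02BZeTr6, copied from the example above.
s = pd.Series(
    [51505.22, 51837.97, 51578.63, np.nan, np.nan, 50893.42],
    index=pd.to_datetime(
        [
            "2015-03-20",
            "2015-03-23",
            "2015-03-24",
            "2015-03-25",
            "2015-03-26",
            "2015-03-27",
        ]
    ),
    name="value",
)

# Upsample to daily frequency; this adds 2015-03-21 and 2015-03-22.
upsampled = s.resample("D").ffill()

# A second, ordinary forward fill also covers NaNs present in the input.
filled = upsampled.ffill()
print(filled)
```

On the grouped frame, the second fill would need to be applied per item so that values do not leak from one item_uid into the next.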
Comment From: Hvass-Labs
@jreback I know it has been 3 years since you closed this, but I have to resample a MultiIndex DataFrame like you have done above, and I am getting output similar to what you show above. Note in your example how item_uid is now both in the index and duplicated in a separate column of the DataFrame. I wonder if this is a bug? Or is there a good reason for it? Is there a way to avoid this? Thanks!
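One possible workaround, sketched here under the assumption that only the value column is needed, is to select that column on the groupby before resampling, so the grouping key ends up only in the index. The small frame below is a subset of the example data, and the names resampled and flat are illustrative.

```python
import pandas as pd

# Two items from the example above, each with a gap between observations.
df = pd.DataFrame(
    {
        "item_uid": ["0F02BcIzNo", "0F02BcIzNo", "0F06ryso80", "0F06ryso80"],
        "created_at": pd.to_datetime(
            ["2015-03-17", "2015-03-23", "2015-03-16", "2015-03-20"]
        ),
        "value": [1230.00, 1130.00, 13.73, 12.00],
    }
)

resampled = (
    df.set_index("created_at")
    .groupby("item_uid")["value"]  # select the column before resampling
    .resample("D")
    .ffill()
)

# The result is a Series indexed by (item_uid, created_at);
# reset_index() turns it back into a flat frame without a duplicate column.
flat = resampled.reset_index()
print(flat.columns.tolist())
```

Whether the duplicated column appears at all may depend on the pandas version, so selecting the columns explicitly is mainly a way to get the same shape regardless.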