from mailing list
I have TimeSeries that has observations at the second level:
mport pandas as pd
import numpy as np
rng = pd.date_range('2012-01-01', '2012-01-10', freq='T')
s = pd.TimeSeries(np.random.randn(12961)*100, index=rng)
I now want to resample the data to daily at 17:59:59, so that I have the sum of positions at 17:59:59 for all days in range.
If I didn't care about the time I could just do:
s1 = s.resample('D',how='sum')
Does anyone have a suggestion?```
Comment From: wesm
user also posted:
Hello All,
Thank you for your replies, after a day of fiddling around with pandas and moving up the ranks from newbie to conversant, I've figured out how do what I want to do. First of all my first example was wrong, I was not trying to figure out the sum of my positions at a given Frequency at a specific time, rather I was trying to determine my actual position at a given frequency for a given time. In the end, the code that I needed to write looked like this:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import pytz
positions = [10,100,30,150,170,12,90,100,30]
timestamps = pd.DatetimeIndex(['2012-01-01 09:10:00','2012-01-01 14:10:00','2012-01-01 17:15:00','2012-01-02 09:11:00','2012-01-02 12:10:00','2012-01-02 17:59:00','2012-01-02 18:10:00','2012-01-03 05:10:00','2012-01-03 18:10:00'])
symbols = ['SDF','SDF','SDF','HS','SDF','HS','SDF','HS','SDF']
df = pd.DataFrame(zip(timestamps,positions,symbols),columns=['timestamp','position','symbol'],index=timestamps)
start = df.first_valid_index().replace(hour=17,minute=59,second=59)
end = df.last_valid_index().replace(hour=17,minute=59, second=59)
selection = pd.date_range(start, end, freq='D')
# positions_at_close = df['position'].asof(selection)
df2 = pd.DataFrame()
g = df.groupby('symbol')
for key, data in g:
df2 = df2.append(pd.concat([data['position'].asof(selection),data['symbol'].asof(selection)], axis=1, keys=['position','symbol']))
Which took the DataFrame df:
In [574]: df
Out[574]:
timestamp position symbol
2012-01-01 09:10:00 2012-01-01 09:10:00 10 SDF
2012-01-01 14:10:00 2012-01-01 14:10:00 100 SDF
2012-01-01 17:15:00 2012-01-01 17:15:00 30 SDF
2012-01-02 09:11:00 2012-01-02 09:11:00 150 HS
2012-01-02 12:10:00 2012-01-02 12:10:00 170 SDF
2012-01-02 17:59:00 2012-01-02 17:59:00 12 HS
2012-01-02 18:10:00 2012-01-02 18:10:00 90 SDF
2012-01-03 05:10:00 2012-01-03 05:10:00 100 HS
2012-01-03 18:10:00 2012-01-03 18:10:00 30 SDF
To df2:
In [575]: df2
Out[575]:
position symbol
2012-01-01 17:59:59 NaN NaN
2012-01-02 17:59:59 12 HS
2012-01-03 17:59:59 100 HS
2012-01-01 17:59:59 30 SDF
2012-01-02 17:59:59 170 SDF
2012-01-03 17:59:59 90 SDF
There is probably a more elegant way to do this, but I this works.
Comment From: aabdullah-bos
Hello Wes,
Thanks for posting this. I just wanted to make one small change, to the code above. The selection needs to made inside the loop over the groups, so that you don't get the "as of" date for dates when data shouldn't exist.
for key, data in g: start_rng = data.first_valid_index().replace(hour=17,minute=59,second=59) end_rng = data.last_valid_index().replace(hour=17,minute=59,second=59) selection = pd.date_range(start_rng,end_rng, freq='D') df2 = df2.append(pd.concat([data['position'].asof(selection),data['symbol'].asof(selection)], axis=1, keys=['position','symbol']))
Comment From: hayd
This seems like something that we should use the base
argument of resample for.
Comment From: MarcoGorelli
closing as there's been no activity in over a decade, and it's not that clear what's being asked for