Hello everyone,
I would like to propose a very useful addition to rolling: the ability to compute any statistic over a specific time range.
As I understand it, `rolling(window=5).mean()` computes, say, the mean over the last five observations.
Instead, it would be very useful to specify something like `rolling(window=5, type_windows='time_range').mean()` to get the rolling mean over the last 5 days.
So if your data starts on January 1 and then the next data point is on Feb 2nd, then the rolling mean for the Feb 2nd point is NA because there was no data on Jan 29, 30, 31, Feb 1, Feb 2.
I believe this would be very useful in settings where data represents trading data, so most of the time the data points are not equidistant in time. Still, you want to compute rolling metrics that are specified over the same delta.
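To make the intended semantics concrete, here is a minimal sketch that computes the mean over a fixed time span by hand. The data and the helper name `time_range_mean` are made up for illustration, and the sketch assumes the window is the half-open interval (t - 5 days, t], so an isolated point yields its own value rather than NA.

```python
import pandas as pd

# Irregularly spaced observations (hypothetical data for illustration).
s = pd.Series(
    [1.0, 2.0, 3.0, 10.0],
    index=pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-04", "2015-02-02"]),
)


def time_range_mean(series, days):
    """Mean of all observations in the window (t - days, t] for each timestamp t."""
    width = pd.Timedelta(days=days)
    out = []
    for t in series.index:
        # Select by elapsed time, not by number of rows.
        in_window = (series.index > t - width) & (series.index <= t)
        out.append(series[in_window].mean())
    return pd.Series(out, index=series.index)


result = time_range_mean(s, days=5)
print(result)
```

The loop is quadratic in the number of rows, so it is only meant to pin down what "the last 5 days" means, not to be a fast implementation.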
What do you think?
Thanks!
Comment From: jreback
you can simply resample first to get your desired results (specifying freq to .rolling does this now as well)
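A rough sketch of the resample-first approach, using made-up data. It assumes daily granularity is what you want and that days without observations should simply be ignored inside each window (hence `min_periods=1`).

```python
import pandas as pd

# Hypothetical irregular series for illustration.
s = pd.Series(
    [1.0, 2.0, 3.0, 10.0],
    index=pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-04", "2015-02-02"]),
)

# Put the data on a regular daily grid; days without observations become NaN.
daily = s.resample("D").mean()

# A 5-row window now spans 5 calendar days. min_periods=1 lets a window with
# some missing days still produce a value; windows with no data stay NaN.
rolled = daily.rolling(window=5, min_periods=1).mean()

# Keep only the original timestamps.
result = rolled.reindex(s.index)
print(result)
```

Note that the intermediate `daily` series has one row per calendar day, so this can get large when the data is sparse over a long period.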
Comment From: randomgambit
I see, thanks.
But do you know what's wrong here, then?
import pandas as pd
df = pd.DataFrame({'time': ['2015/01/01', '2015/02/01', '2016/02/02'],
                   'myvar': [2, 2, 2],
                   'group': ['jeff', 'olaf', 'jeff']})
df['time'] = pd.to_datetime(df.time)
df.set_index('time', inplace=True)
df.groupby('group').apply(lambda x: x.rolling(window=2, freq='D').count())
ValueError: could not convert string to float: jeff
I am using pandas 0.18.0.
Comment From: jreback
For now, you need to specify the column to work on, as the non-numeric columns don't work very well (e.g. the grouper in this case).
# < 0.18.1
In [14]: df.groupby('group').myvar.apply(lambda x: x.rolling(2,freq='D').count())
# 0.18.1
In [15]: df.groupby('group').myvar.rolling(2,freq='D').count()
Out[15]:
group  time
jeff   2015-01-01    1.0
       2015-01-02    1.0
       2015-01-03    0.0
       2015-01-04    0.0
       2015-01-05    0.0
                     ...
       2016-01-30    0.0
       2016-01-31    0.0
       2016-02-01    0.0
       2016-02-02    1.0
olaf   2015-02-01    1.0
Name: myvar, dtype: float64
so this is a dupe of #12537
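For comparison, here is a sketch of the same idea (select the numeric column first, then roll per group) written with an offset-string window instead of `freq=`. This assumes a pandas release that accepts offset windows on a DatetimeIndex, which as far as I know means 0.19.0 or later; the `'2D'` window counts observations in the last two days and does not fill in the missing days as the output above does.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": pd.to_datetime(["2015/01/01", "2015/02/01", "2016/02/02"]),
        "myvar": [2, 2, 2],
        "group": ["jeff", "olaf", "jeff"],
    }
).set_index("time")

# Select the numeric column before rolling so the string-valued grouper
# is never passed to the window computation.
counts = df.groupby("group")["myvar"].rolling("2D").count()
print(counts)
```

The result is indexed by (group, time) and only contains the timestamps that actually appear in the data.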
Comment From: randomgambit
Thanks, Jeff. That's interesting, and I would say it's a rather subtle bug, because I would not think of the grouper as a regular data column that gets processed by whatever follows groupby (count() in this case).
Comment From: chrisaycock
It's probably a whole different issue, but would it ever be possible to specify the column for .rolling()? That is, instead of using the DataFrame's index, let the user explicitly list a column:
df.rolling('5s', col='time')
Definitely don't let my request hold up this code change. We can worry about that later; getting windows by timestamp is far more important at this stage.
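For readers on later pandas releases: as far as I know, this capability is exposed through an `on=` keyword rather than the `col=` spelling suggested above. A minimal sketch with made-up data, assuming a pandas version that supports offset windows (0.19.0 or later, to my knowledge):

```python
import pandas as pd

# Hypothetical data: timestamps live in a regular column, not in the index.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2016-01-01 00:00:00",
                "2016-01-01 00:00:02",
                "2016-01-01 00:00:03",
                "2016-01-01 00:00:10",
            ]
        ),
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# A 5-second window measured on the "time" column instead of the index.
# The column has to be datetime-like and sorted in increasing order.
result = df.rolling("5s", on="time").sum()
print(result)
```

The `time` column is carried through unchanged and the remaining columns hold the windowed sums.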
Comment From: jreback
that's actually very easy
Comment From: chrisaycock
Oh, if you're up for it, then can we add that to this feature request? It would be really helpful.
Comment From: jreback
yes, will put it on the list
Comment From: BenjaminHabert
Very excited about this new feature! Thanks !
Comment From: vciulei
Hello, is this capability currently supported? I am doing a simple df.rolling("1S") and am getting the same 'Window must be an integer' error.