With a DatetimeIndex I can specify the rolling window using an offset alias, but if I want to skip the first (incomplete) window, I have to calculate the number of periods in the window myself. Rolling therefore uses different units for the window (a time offset) and for `min_periods` (an integer count of observations). I would like to be able to skip all periods in the first n windows.
Possible implementations could be:
- `min_periods='7D'`
- `min_periods=window`
- `skip_windows=n`
- `skip_window=True`

n=1 is probably good enough for most use cases.
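
To make the request concrete, here is a minimal sketch of what a `skip_windows=n` option could do, written as a user-side helper. The name `rolling_mean_skip_windows` and its parameters are hypothetical and not part of pandas; the sketch assumes a sorted DatetimeIndex and simply blanks every result whose timestamp is less than n window lengths after the first observation.

```python
import numpy as np
import pandas as pd


def rolling_mean_skip_windows(obj, window, n=1):
    """Hypothetical helper: rolling mean with the first n windows set to NaN.

    Assumes obj has a sorted DatetimeIndex and window is an offset alias
    with a fixed length, such as '7D'.
    """
    result = obj.rolling(window).mean()
    # Rows earlier than this timestamp belong to the first n windows.
    cutoff = obj.index[0] + n * pd.Timedelta(window)
    result.loc[result.index < cutoff] = np.nan
    return result


idx = pd.date_range("2019-03-01", periods=10, freq="D")
s = pd.Series(np.arange(10.0), index=idx)
skipped_one = rolling_mean_skip_windows(s, "3D")
skipped_two = rolling_mean_skip_windows(s, "3D", n=2)
```

Because the cutoff is based on elapsed time rather than on an observation count, this also works for irregularly spaced data. For regularly spaced data it blanks one more row than the `min_periods` workaround, since that workaround already accepts the row at which the window first contains the full number of observations.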
#### Code Sample, a copy-pastable example if possible
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Generate example dataframe: 10000 samples at 5-minute spacing
idx = pd.date_range("2019-03-01", periods=10000, freq='5min')
df = pd.DataFrame(np.sin(np.arange(0, 100, 0.01)), index=idx)
# Plot data
plt.plot(df)
# Plot rolling mean with vertical offset for visual separation
plt.plot(df.rolling('7D').mean() + 0.2)
# Plot rolling mean that only starts once 1 full window is available
# (7 days of 5-minute samples = 2016 periods)
periods = pd.to_timedelta('7D') // df.index.freq
plt.plot(df.rolling('7D', min_periods=periods).mean())
plt.show()
#### Problem description
`min_periods` accepts only integer values. A result computed with a `min_periods` value smaller than the number of periods in the window is not representative, as it is based on too few observations. The documentation is very confusing with respect to time series, since the "offset" apparently does not refer to an offset alias: "For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window."
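
The defaults described in the quoted sentence can be checked on a small series. This is only an illustrative sketch on made-up daily data: it compares a window given as an offset alias (`'3D'`) with a window given as an integer count (`3`).

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2019-03-01", periods=5, freq="D")
s = pd.Series(np.arange(5.0), index=idx)

by_offset = s.rolling("3D").mean()  # offset window: min_periods defaults to 1
by_count = s.rolling(3).mean()      # integer window: min_periods defaults to 3

# Passing the count explicitly makes the offset window behave like the integer one.
by_offset_full = s.rolling("3D", min_periods=3).mean()

print(by_offset.isna().sum())       # 0 leading NaN values
print(by_count.isna().sum())        # 2 leading NaN values
print(by_offset_full.isna().sum())  # 2 leading NaN values
```

On this regularly spaced data the offset window with `min_periods=3` gives the same result as the integer window, which is the manual conversion the issue would like to avoid.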
Comment From: mroeschke
Thanks for the suggestion, but I believe this can be done by defining a custom `BaseIndexer` subclass to skip a particular window, so I don't think this is likely to be added directly. Closing.
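
One way the `BaseIndexer` suggestion might look is sketched below. This is an assumption about how such a subclass could be written, not code from the thread: the class name `SkipFirstWindowIndexer` and the keyword arguments `elapsed` and `window_seconds` are hypothetical, and the `get_window_bounds` signature shown (including `step`) is the one used by recent pandas versions, so older versions may need a different signature. `min_periods=1` is passed explicitly so that only the empty windows produce NaN.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class SkipFirstWindowIndexer(BaseIndexer):
    """Hypothetical indexer: trailing time windows that stay empty until
    one full window length has elapsed since the first observation."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        # self.elapsed and self.window_seconds come from the constructor kwargs.
        elapsed = self.elapsed
        # Trailing window (t - window, t]: first row strictly after t - window.
        start = np.searchsorted(elapsed, elapsed - self.window_seconds, side="right")
        end = np.arange(1, num_values + 1)
        # Rows inside the first window get an empty range, which yields NaN.
        incomplete = elapsed < self.window_seconds
        start[incomplete] = 0
        end[incomplete] = 0
        return start.astype(np.int64), end.astype(np.int64)


idx = pd.date_range("2019-03-01", periods=10, freq="D")
s = pd.Series(np.arange(10.0), index=idx)

# Seconds since the first observation, as a plain float array.
elapsed = ((idx - idx[0]) / pd.Timedelta(seconds=1)).to_numpy()
indexer = SkipFirstWindowIndexer(
    elapsed=elapsed,
    window_seconds=pd.Timedelta("3D").total_seconds(),
)
result = s.rolling(indexer, min_periods=1).mean()
```

After the skipped rows, the result should match `s.rolling('3D').mean()`, since the bounds reproduce the default right-closed trailing window.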