Code Sample
import pandas as pd
import numpy as np
iinput = np.arange(10.0)
iinput[5] = np.nan
x = pd.Series(iinput).rolling(3).apply(lambda x: 1.0).tolist()
print(x)
Expected Output and Problem Description
One would expect that the above code would print the list:
[nan, nan, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Instead, we get the following:
[nan, nan, 1.0, 1.0, 1.0, nan, nan, nan, 1.0, 1.0]
It seems that any time the input to lambda contains nan, then nan is returned automatically. This is problematic, because it is not possible to apply a custom rolling function to a series containing nans. Doing so will return a result riddled with more nans.
Additionally, this behavior exists exclusively for rolling(). Running the above code with expanding() works just fine.
Output of pd.show_versions()
Comment From: jreback
min_periods controls whether a window is skipped or not: it is the minimum number of non-NaN observations a window needs in order to produce a value. It defaults to the window size.
In [7]: iinput = np.arange(10.0)
   ...: iinput[5] = np.nan
   ...: iinput = pd.Series(iinput)
   ...: x = iinput.rolling(3, min_periods=1).apply(lambda x: 1.0)
   ...: x
   ...:
Out[7]:
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
5    1.0
6    1.0
7    1.0
8    1.0
9    1.0
dtype: float64
In [8]: iinput = np.arange(10.0)
   ...: iinput[5] = np.nan
   ...: iinput = pd.Series(iinput)
   ...: x = iinput.rolling(3, min_periods=3).apply(lambda x: 1.0)
   ...: x
   ...:
Out[8]:
0    NaN
1    NaN
2    1.0
3    1.0
4    1.0
5    NaN
6    NaN
7    NaN
8    1.0
9    1.0
dtype: float64
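The NaN positions in Out[8] line up with the windows that hold fewer than min_periods non-NaN values. The following is a minimal sketch of that relationship, not a description of how pandas implements it internally. The names values, valid_counts and skipped are introduced here for illustration only, and raw=True is assumed so that the lambda receives a plain NumPy array.

```python
import numpy as np
import pandas as pd

values = np.arange(10.0)
values[5] = np.nan
s = pd.Series(values)

# Count the non-NaN observations in each trailing window of size 3.
# The 0/1 indicator series contains no NaN, so min_periods=1 yields a count everywhere.
valid_counts = s.notna().astype(float).rolling(3, min_periods=1).sum()

# Default min_periods (equal to the window size, 3), as in the original report.
result = s.rolling(3).apply(lambda w: 1.0, raw=True)

# Windows with fewer than 3 valid observations are the ones that come back as NaN.
skipped = valid_counts < 3
print(skipped.tolist())
print(result.isna().tolist())
```

Both printed lists should flag the same positions: the first two (window not yet full) and the three windows that overlap the NaN at index 5.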
Comment From: Anisalexvl
Nevertheless, if we initialize the following array
array([ 0., 1., 2., 3., 4., nan, nan, nan, 8., 9.])
and run the following code
def f(x):
    print(x)
    return 1.
x = pd.Series(iinput).rolling(3, min_periods=1).apply(f).tolist()
print('Rolling result: ', x)
we get this output:
[ 0.]
[ 0. 1.]
[ 0. 1. 2.]
[ 1. 2. 3.]
[ 2. 3. 4.]
[ 3. 4. nan]
[ 4. nan nan]
[ nan nan 8.]
[ nan 8. 9.]
Rolling result: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, nan, 1.0, 1.0]
which means that the function was not applied to the window that consists entirely of nans. I expected the function to be applied to every window, so I agree with @scoliann:
It seems that any time the input to lambda contains nan, then nan is returned automatically. This is problematic, because it is not possible to apply a custom rolling function to a series containing nans.
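If the goal is to call a custom function on every window, including the one made up entirely of nans, one option is to bypass rolling().apply() and slice the windows by hand. This is only a sketch of a possible workaround, not something proposed in this thread: apply_to_every_window is a hypothetical helper, not a pandas API, and it mimics the growing leading windows printed above.

```python
import numpy as np

def apply_to_every_window(values, window, func):
    """Hypothetical helper: call func on each trailing window, NaNs included."""
    values = np.asarray(values, dtype=float)
    out = []
    for i in range(len(values)):
        # Leading windows are shorter until `window` elements are available.
        start = max(0, i - window + 1)
        out.append(func(values[start:i + 1]))
    return out

data = np.array([0., 1., 2., 3., 4., np.nan, np.nan, np.nan, 8., 9.])

# The constant function from the report is now called for all ten windows.
result = apply_to_every_window(data, 3, lambda w: 1.0)
print(result)

# A NaN-aware function sees the all-NaN window too (count of valid values per window).
counts = apply_to_every_window(data, 3, lambda w: int(np.count_nonzero(~np.isnan(w))))
print(counts)
```

The plain Python loop trades speed for control, so it is best suited to small inputs or to checking what a custom function does with windows that contain nans.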