Code Sample
Make data
>>> import numpy as np
>>> import pandas as pd
>>> dates = pd.date_range('01-01-2015','01-07-2015', freq='B')
>>> df = pd.DataFrame({'a':[1,2,np.nan,4,5],'b':[6,7,8,9,np.nan]}, dates)
>>> print(df)
              a    b
2015-01-01  1.0  6.0
2015-01-02  2.0  7.0
2015-01-05  NaN  8.0
2015-01-06  4.0  9.0
2015-01-07  5.0  NaN
Then re-sample
>>> df = df.resample('D').fillna(method='ffill')
>>> print(df)
              a    b
2015-01-01  1.0  6.0
2015-01-02  2.0  7.0
2015-01-03  2.0  7.0
2015-01-04  2.0  7.0
2015-01-05  NaN  8.0
2015-01-06  4.0  9.0
2015-01-07  5.0  NaN
Expected Output
fillna() should have filled all of the NaNs, but instead it seems to miss those that existed before resample() was called.
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 15 Stepping 11, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 22.0.5
Cython: 0.23.1
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 3.2.1
sphinx: None
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: None
pandas_datareader: None
Comment From: dirkbike
I think this is a result of resample() being a deferred operation. Attaching mean() (with no arguments) after resample() resolves this issue, but that seems like a silly workaround.
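For reference, a minimal sketch of that workaround on the sample data from the report. The names `daily` and `filled` are only illustrative, and this assumes a pandas version where `Resampler.mean()` and `DataFrame.ffill()` are available:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2015-01-01", "2015-01-07", freq="B")
df = pd.DataFrame(
    {"a": [1, 2, np.nan, 4, 5], "b": [6, 7, 8, 9, np.nan]},
    index=dates,
)

# mean() materialises the daily frame first: empty daily bins and
# bins holding only NaN both come out as NaN.
daily = df.resample("D").mean()

# The ffill() here is an ordinary DataFrame.ffill(), so it fills every
# NaN that has a valid value before it, whatever its origin.
filled = daily.ffill()
print(filled)
```

Because each daily bin holds at most one observation here, mean() leaves the existing values unchanged; with several observations per bin it would aggregate them, so it is not a general substitute.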
Comment From: max-sixty
The first ffill only fills the rows introduced by the resample operation:
In [20]: df.resample('D').ffill()
Out[20]:
              a    b
2015-01-01  1.0  6.0
2015-01-02  2.0  7.0
2015-01-03  2.0  7.0
2015-01-04  2.0  7.0
2015-01-05  NaN  8.0
2015-01-06  4.0  9.0
2015-01-07  5.0  NaN
Then if you want to forward fill the resample result, you can ffill again:
In [21]: df.resample('D').ffill().ffill()
Out[21]:
              a    b
2015-01-01  1.0  6.0
2015-01-02  2.0  7.0
2015-01-03  2.0  7.0
2015-01-04  2.0  7.0
2015-01-05  2.0  8.0
2015-01-06  4.0  9.0
2015-01-07  5.0  9.0
Comment From: jreback
@dirkbike this is the definition of filling. You forward fill values, whether they are NaN or not. You are introducing new values in the periods that are created, which by definition are missing and hence need filling. The fact that you have existing missing values is irrelevant.
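One way to check this reading is to compare the upsampling fill with a label-based `reindex(..., method="ffill")`, which copies the row at the previous existing label even when that row holds NaN. This is a sketch on the sample data, not a statement about how pandas implements it internally; `daily_index`, `upsampled` and `reindexed` are illustrative names:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2015-01-01", "2015-01-07", freq="B")
df = pd.DataFrame(
    {"a": [1, 2, np.nan, 4, 5], "b": [6, 7, 8, 9, np.nan]},
    index=dates,
)

# Target index: every calendar day spanned by the original data.
daily_index = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Upsampling fill: only the newly created rows receive values.
upsampled = df.resample("D").ffill()

# Label-based fill: new labels take the row of the previous existing
# label; rows already present in df are left untouched.
reindexed = df.reindex(daily_index, method="ffill")

same = upsampled.equals(reindexed)
still_missing = int(upsampled.isna().sum().sum())
print(same, still_missing)
```

On this data both approaches agree, and the two NaNs that were already in `df` remain in place.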
Comment From: jreback
You can also use .asfreq() to see this. This is basically a reindexing operation to make new values, then you fill.
In [4]: df.resample('D').asfreq()
Out[4]:
              a    b
2015-01-01  1.0  6.0
2015-01-02  2.0  7.0
2015-01-03  NaN  NaN
2015-01-04  NaN  NaN
2015-01-05  NaN  8.0
2015-01-06  4.0  9.0
2015-01-07  5.0  NaN
In [5]: df.resample('D').asfreq().ffill()
Out[5]:
              a    b
2015-01-01  1.0  6.0
2015-01-02  2.0  7.0
2015-01-03  2.0  7.0
2015-01-04  2.0  7.0
2015-01-05  2.0  8.0
2015-01-06  4.0  9.0
2015-01-07  5.0  9.0
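If it helps to see which NaNs come from where, the asfreq() result can be split by whether each label existed in the original index. The following is a rough sketch under that assumption; `introduced`, `new_rows` and `original_gaps` are hypothetical names used only for illustration:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2015-01-01", "2015-01-07", freq="B")
df = pd.DataFrame(
    {"a": [1, 2, np.nan, 4, 5], "b": [6, 7, 8, 9, np.nan]},
    index=dates,
)

# Reindex to daily frequency without filling anything.
daily = df.resample("D").asfreq()

# Labels absent from the original index were created by upsampling.
introduced = ~daily.index.isin(df.index)
new_rows = daily.index[introduced]

# NaNs on labels that already existed are gaps in the source data.
original_gaps = int(daily[~introduced].isna().sum().sum())

# A plain DataFrame.ffill() afterwards treats both kinds alike.
filled = daily.ffill()
print(list(new_rows.strftime("%Y-%m-%d")), original_gaps)
```

Here the weekend rows (2015-01-03 and 2015-01-04) are the ones introduced by upsampling, while the NaNs on 2015-01-05 and 2015-01-07 were already in the data.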