Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [0, 1, 2, np.nan, 4],
                   'B': [0, 1, 2, np.nan, 4],
                   'C': [0, 1, 2, np.nan, 4],
                   'D': [0, 1, 2, np.nan, 4],
                   'E': [0, 1, 2, np.nan, 4],
                   'F': [0, 1, 2, np.nan, 4]})
print(df.expanding(axis=1).sum())
Expected Output
The code above currently produces:
A B C D E F
0 0.0 0.0 0.0 0.0 0.0 0.0
1 1.0 2.0 3.0 4.0 5.0 5.0
2 2.0 4.0 6.0 8.0 10.0 10.0
3 NaN NaN NaN NaN NaN NaN
4 4.0 8.0 12.0 16.0 20.0 20.0
However, the correct result should be:
A B C D E F
0 0.0 0.0 0.0 0.0 0.0 0.0
1 1.0 2.0 3.0 4.0 5.0 6.0
2 2.0 4.0 6.0 8.0 10.0 12.0
3 NaN NaN NaN NaN NaN NaN
4 4.0 8.0 12.0 16.0 20.0 24.0
Notice that the last column, F, is different. I've tracked this down and found that the _get_window function (for expanding) fails to return the correct number of windows when the following conditions are met:
1. axis=1 is used instead of axis=0 (default)
2. The number of rows in the dataframe is less than the number of columns
This is caused by the object using len(obj) to determine the window size. Instead, it should be using obj.shape[self.axis].
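To make the mismatch concrete, here is a minimal sketch (not the pandas internals) that compares the two quantities on the frame from the report, and then computes the expanding sum along the columns by transposing, so that only the default axis=0 path is used. The names n_by_len, n_by_shape and result are just illustrative, and the transpose is an assumed workaround rather than the proposed fix.

```python
import numpy as np
import pandas as pd

# Same data as in the report: 5 rows, 6 columns.
row = [0, 1, 2, np.nan, 4]
df = pd.DataFrame({name: row for name in "ABCDEF"})

axis = 1
n_by_len = len(df)            # number of rows, regardless of axis
n_by_shape = df.shape[axis]   # number of elements along the requested axis
print(n_by_len, n_by_shape)

# Workaround sketch: transpose, expand along the default axis, transpose back.
result = df.T.expanding().sum().T
print(result)
```

With 5 rows and 6 columns, len(df) is one short of the number of columns, which matches the last column F being the one that comes out wrong.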
output of pd.show_versions()
commit: None python: 2.7.11.final.0 python-bits: 64 OS: Darwin OS-release: 14.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: None.None
pandas: 0.18.1+237.ge357ea1 nose: 1.3.7 pip: 8.1.2 setuptools: 20.1.1 Cython: 0.23.4 numpy: 1.11.1 scipy: 0.17.1 statsmodels: 0.6.1 xarray: 0.7.0 IPython: 4.0.3 sphinx: 1.3.5 patsy: 0.4.0 dateutil: 2.4.1 pytz: 2015.7 blosc: None bottleneck: 1.0.0 tables: 3.2.2 numexpr: 2.6.0 matplotlib: None openpyxl: 2.3.2 xlrd: 0.9.4 xlwt: 1.0.0 xlsxwriter: 0.8.4 lxml: 3.5.0 bs4: 4.4.1 html5lib: None httplib2: 0.9 apiclient: 1.4.0 sqlalchemy: 1.0.11 pymysql: None psycopg2: 2.6.1 (dt dec pq3 ext) jinja2: 2.8 boto: 2.39.0 pandas_datareader: 0.2.0
Comment From: jreback
this is a symptom of #13503, essentially axis=1 is broken (expanding is just a sub-class of rolling). Certainly appreciate a PR to address that.
Comment From: seanlaw
yes, I will submit a pull request to fix expanding
Comment From: seanlaw
I tried to look at how to fix rolling but couldn't figure out where or why self.axis was being overwritten/ignored. self.axis is present when the object is instantiated via _Window, but the functions within _Rolling don't seem to inherit this attribute.