Code Sample, a copy-pastable example if possible
import math
import pandas as pd
df = pd.DataFrame({'A': [math.nan, 1, math.nan], 'B': [math.nan, math.nan, 1], 'C': [math.nan, math.nan, math.nan]})
print(df)
print(df.sum(axis=0, skipna=True))
print(df.sum(axis=1, skipna=True))
Problem description
Currently, summing a column or row whose values are all NaN returns 0.0. This contradicts the documentation, which states for skipna=True (the default): "Exclude NA/null values. If an entire row/column is NA, the result will be NA."
I don't have a strong opinion on whether the documented or the actual behaviour is better; if existing code depends on the current behaviour, it may be best to update the documentation.
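Until the behaviour is settled, the documented result can be obtained explicitly by masking the sums of all-NA columns/rows. This is a minimal sketch, not a pandas API: `sum_all_na_is_na` is a hypothetical helper name, and it assumes only `DataFrame.sum`, `DataFrame.notnull`, `DataFrame.any` and `Series.where`, so it should give the same result whether or not bottleneck is installed.

```python
import math

import pandas as pd

df = pd.DataFrame({'A': [math.nan, 1, math.nan],
                   'B': [math.nan, math.nan, 1],
                   'C': [math.nan, math.nan, math.nan]})


def sum_all_na_is_na(frame, axis=0):
    """Hypothetical helper: skipna sum, but NaN where every value along `axis` is NA."""
    totals = frame.sum(axis=axis, skipna=True)
    # True for each column (axis=0) or row (axis=1) holding at least one non-NA value
    has_data = frame.notnull().any(axis=axis)
    # Series.where keeps values where the condition is True and inserts NaN elsewhere
    return totals.where(has_data)


print(sum_all_na_is_na(df, axis=0))  # C becomes NaN
print(sum_all_na_is_na(df, axis=1))  # row 0 becomes NaN
```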
Actual output
     A    B   C
0  NaN  NaN NaN
1  1.0  NaN NaN
2  NaN  1.0 NaN
A    1.0
B    1.0
C    0.0
dtype: float64
0    0.0
1    1.0
2    1.0
dtype: float64
Expected Output
     A    B   C
0  NaN  NaN NaN
1  1.0  NaN NaN
2  NaN  1.0 NaN
A    1.0
B    1.0
C    NaN
dtype: float64
0    NaN
1    1.0
2    1.0
dtype: float64
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.3-101.fc21.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.1.0
tables: None
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.2.0-b1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: None
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
Comment From: jorisvandenbossche
@AdamGleave Thanks for the report. This is a known issue (summary is: the behaviour depends on whether bottleneck is installed or not), that we should really decide upon what to do with it. See https://github.com/pandas-dev/pandas/issues/9422 for the discussion.
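For reference, later pandas releases (0.22.0 onward) made the sum of an all-NA column or row consistently 0.0, independent of bottleneck, and added a `min_count` parameter to `sum`; passing `min_count=1` produces the NaN result shown under "Expected Output". A minimal sketch, assuming pandas 0.22 or newer (it will not run on the 0.19.2 install reported above):

```python
import math

import pandas as pd

df = pd.DataFrame({'A': [math.nan, 1, math.nan],
                   'B': [math.nan, math.nan, 1],
                   'C': [math.nan, math.nan, math.nan]})

# Default (min_count=0): an all-NA column or row sums to 0.0
print(df.sum(axis=0))
# min_count=1: at least one non-NA value is required, otherwise the result is NaN
print(df.sum(axis=0, min_count=1))
print(df.sum(axis=1, min_count=1))
```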