Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'A' : 1.,
'B' : [2, 2, 4, 4],
'C' : ['T1', 'T2', 'T1', 'T2'],
'D' : 3.,
'E' : np.nan})
#df
tablea = pd.pivot_table(df, values='D', index=['A', 'B'],
columns=['C'], aggfunc=np.sum)
#tablea
tableb = pd.pivot_table(df, values='D', index=['E', 'B'],
columns=['C'], aggfunc=np.sum)
#tableb
Problem description
One is not able to use np.nan as a pivot value.
The ouput for the first pivot table (tablea) as expected is
C T1 T2
A B
1.0 2 3.0 3.0
4 3.0 3.0
The output for the second table (tableb) is which is completely blank.
E | B
-- | --
Expected Output
I was hoping for
C T1 T2
E B
np.nan 2 3.0 3.0
4 3.0 3.0
This behavior is not necessarily a problem but I'm just trying to get clarity on if this is expected.
I know that groupby
excludes NaN
by design and I expect that this is related to that. Is that the case?
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.2.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.5.0a3
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.11
pymysql: 0.7.9.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Comment From: chris-b1
Thanks - this is a duplicate of #3729 - same underlying issue as groupby as you noted.
There was a PR working this at #12607 - you're welcome to pick it up if interested!