Using pandas version 0.16.0.
I read a CSV file that has a string column with some missing values, and pandas loads those missing strings as NaN. Pandas won't let me group on the rows with NaN in that column, so I'm trying to fillna() with the placeholder value "<blank>" — but fillna() is destroying my good data.
import pandas
import numpy
#The real use case is reading a CSV with missing values, but this demos the issue
df = pandas.DataFrame({ 'x': ['first', numpy.nan, 'third'], 'y' : [1, 2, 3]})
df #Shows we have a missing value in the string column
#I want to group by 'x'. The following won't work because pandas drops the
#group with the missing key entirely (the missing row's value ought to show
#up as "2", but it never appears in the result).
#Yes, I understand NaN != NaN, so I get why it fails, but then I need a workaround.
df.groupby('x').sum() #the missing row is dropped, instead of summing to "2".
#OK, so I'll fillna the missing value and then group.
df['x'] = df['x'].fillna('<blank>', inplace=True)
df #BAD! That overwrote all my good string values!!!
Since the column is already object dtype (containing only strings) and I'm calling fillna with a string, it should not need to drop any values.
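To make the dropped-group behavior concrete, here is a minimal sketch (same toy frame as above) showing that groupby silently excludes rows whose key is NaN:

```python
import pandas
import numpy

# Same toy frame as in the report: one missing key in the string column.
df = pandas.DataFrame({'x': ['first', numpy.nan, 'third'], 'y': [1, 2, 3]})

# groupby excludes NaN keys entirely, so the missing row's y value (2)
# never appears anywhere in the aggregation result.
summed = df.groupby('x').sum()
print(summed.index.tolist())  # the NaN key is absent from the groups
```

Only the 'first' and 'third' groups survive; row 2 contributes to nothing.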
pandas.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.0
nose: None
Cython: None
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: None
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 2.2.2
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.2
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.3
pymysql: None
psycopg2: None
Comment From: TomAugspurger
The problem is the inplace=True part here:

df['x'] = df['x'].fillna('<blank>', inplace=True)

With inplace=True, fillna modifies the object and returns None, so you end up assigning None to df['x'].
In [9]: result = df.fillna('<blank>', inplace=True)
In [10]: result  # nothing is printed -- result is None; df itself was modified
If you remove the inplace, you should be good.
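A minimal sketch of the fix, using the same toy frame: drop inplace=True and assign the returned Series instead.

```python
import pandas
import numpy

df = pandas.DataFrame({'x': ['first', numpy.nan, 'third'], 'y': [1, 2, 3]})

# Without inplace, fillna returns a new Series; assign it back to the column.
df['x'] = df['x'].fillna('<blank>')

# The formerly-missing row now groups under '<blank>' and its value appears.
result = df.groupby('x').sum()
```

(The equivalent in-place form would be calling fillna(..., inplace=True) *without* reassigning the result — the point is never to assign the None return value.)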
Comment From: darindillon
Thanks, my mistake.