Using 0.14.0. pandas.io.parsers.read_csv is supposed to ignore blank-looking values if na_filter=False, but it does not do this for index_col columns.
foo.csv:
fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
The default behavior gives a dataframe with a NaN in place of the empty value from this last row:
df = pd.io.parsers.read_csv("foo.csv")
This gives the same dataframe with a blank string instead of a NaN. So far so good:
df = pd.io.parsers.read_csv("foo.csv", na_filter=False)
My expectation was that this next version would give a dataframe with no NaN values in the index, but it does not:
df = pd.io.parsers.read_csv("foo.csv", index_col=['fruit','size'], na_filter=False)
print df
=>             sugar
fruit  size
apples medium      2
pear   medium      3
grape  small       4
durian NaN         1
Because it unexpectedly includes NaNs, I've been fighting with issue 4862 in unstack for hours :-(.
To get the desired behavior (a DataFrame with no NaNs in the index), I have to read the data without a multi-index, then call set_index afterwards:
df = pd.io.parsers.read_csv("foo.csv", na_filter=False)
df = df.set_index(['fruit','size'])
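A minimal, self-contained sketch of this workaround, for anyone who wants to try it without creating foo.csv. It assumes a pandas version that exposes `pd.read_csv` and `MultiIndex.to_frame`. The `io.StringIO` wrapper and the names `CSV_TEXT`, `indexed`, and `has_nan_in_index` are only for illustration and are not part of the original report:

```python
import io

import pandas as pd

# Same contents as foo.csv above, inlined so the example is self-contained.
CSV_TEXT = """fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
"""

# Read without index_col so na_filter=False applies to every column.
df = pd.read_csv(io.StringIO(CSV_TEXT), na_filter=False)

# set_index returns a new DataFrame, so keep the result.
indexed = df.set_index(["fruit", "size"])

# Check that no level of the resulting MultiIndex contains a NaN.
has_nan_in_index = bool(indexed.index.to_frame().isna().any().any())
print(has_nan_in_index)
print(indexed.index.tolist())
```

The empty size for durian stays an empty string in the index, which is what unstack needs here.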
As a temporary fix, perhaps the documentation ought to clarify the behavior of na_filter with respect to index_col.
Comment From: jreback
I'll mark it as a bug, but the 2nd soln looks fine to me. Trying to have the parser do too much is in general a problem IMHO.
Comment From: dlenski
@jreback, the parser already knows how to distinguish NaNs, or not to distinguish them, right? Isn't that what na_filter is for?
The obvious user expectation is that index_col should have the same effect as calling set_index afterwards. The fact that it interacts with the behavior of na_filter is both surprising (at odds with the reasonable expected behavior) and unmentioned in the docs.
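That expectation can be written as a quick check. This is only a sketch: the CSV is inlined through `io.StringIO`, and the names `keys`, `via_index_col`, `via_set_index`, and `same_index` are made up for illustration. The result depends on the installed pandas version, so no expected output is shown:

```python
import io

import pandas as pd

# Same contents as foo.csv from the report, inlined for convenience.
CSV_TEXT = """fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
"""

keys = ["fruit", "size"]

# Path 1: let the parser build the index directly.
via_index_col = pd.read_csv(io.StringIO(CSV_TEXT), index_col=keys, na_filter=False)

# Path 2: parse plain columns first, then build the index afterwards.
via_set_index = pd.read_csv(io.StringIO(CSV_TEXT), na_filter=False).set_index(keys)

# If index_col honors na_filter=False, both indexes should be identical.
same_index = bool(via_index_col.index.equals(via_set_index.index))
print("indexes match:", same_index)
```

On a version affected by this report, the first path puts NaN in the size level for durian, so the two indexes would not match.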
Comment From: jreback
I marked it as a bug. You are welcome to do a pull-request. My point was that there are close to 50 options for the parser, so there are obviously some untested paths.
Comment From: tommycarstensen
This bug has been fixed and the issue can be closed.
Comment From: jreback
@gfyoung do we have a test for this?
Comment From: gfyoung
This is a dupe of #5239. Closed by #18127 (so yes, there is a test).