Using pandas 0.14.0. pandas.io.parsers.read_csv is supposed to leave blank-looking values alone (rather than converting them to NaN) if na_filter=False, but it does not do this for index_col columns.

foo.csv:

fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1

The default behavior gives a dataframe with a NaN in place of the empty value from this last row:

df = pd.io.parsers.read_csv("foo.csv")

This gives the same dataframe with a blank string instead of a NaN. So far so good:

df = pd.io.parsers.read_csv("foo.csv", na_filter=False)

My expectation was that this next version would give a dataframe with no NaN values in the index, but it does not:

df = pd.io.parsers.read_csv("foo.csv", index_col=['fruit','size'], na_filter=False)
print(df)
=>                sugar
   fruit  size         
   apples medium      2
   pear   medium      3
   grape  small       4
   durian NaN         1

Because it unexpectedly includes NaNs, I've been fighting with issue 4862 in unstack for hours :-(.

In order to get the desired behavior, a DF with no NaNs in the index, I have to read the data without a multi-index, then set_index afterwards:

df = pd.io.parsers.read_csv("foo.csv", na_filter=False)
df = df.set_index(['fruit','size'])  # set_index returns a new DataFrame, so keep the result
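For anyone who wants to reproduce the workaround without creating foo.csv, here is a minimal self-contained sketch. It assumes a pandas version where `pd.read_csv` accepts a file-like object; the `FOO_CSV` and `sizes` names are just for illustration.

```python
import io

import pandas as pd

# Same contents as foo.csv above, held in memory (illustrative name).
FOO_CSV = """fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
"""

# Read without index_col so na_filter=False applies to every column,
# then build the MultiIndex afterwards.
df = pd.read_csv(io.StringIO(FOO_CSV), na_filter=False)
df = df.set_index(["fruit", "size"])

# The empty field for durian stays an empty string in the index.
sizes = list(df.index.get_level_values("size"))
print(sizes)
```

The empty `size` for durian should come through as `''` rather than NaN, which is the index I wanted in the first place.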

As a temporary fix, perhaps the documentation ought to clarify the behavior of na_filter with respect to index_col.

Comment From: jreback

I'll mark it as a bug, but the 2nd soln looks fine to me. Trying to have the parser do too much is in general a problem IMHO.

Comment From: dlenski

@jreback, the parser already knows how to distinguish NaNs, or not to distinguish them, right? Isn't that what na_filter is for?

The obvious user expectation is that index_col should have the same effect as calling set_index afterwards. The fact that it interacts with the behavior of na_filter is both surprising (at odds with the reasonable expected behavior) and unmentioned in the docs.
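A rough way to check that expectation on a given pandas version is to build the frame both ways and compare the indexes. This is only a sketch: `index_has_nan` is a helper made up for this comment, and it assumes `Index.isna` is available (newer pandas). I'm deliberately not stating what it prints, since that depends on whether the installed version has the bug.

```python
import io

import pandas as pd

CSV = "fruit,size,sugar\napples,medium,2\npear,medium,3\ngrape,small,4\ndurian,,1\n"


def index_has_nan(df):
    """Return True if any level of df's index contains a missing value (hypothetical helper)."""
    return any(
        df.index.get_level_values(level).isna().any()
        for level in range(df.index.nlevels)
    )


# Path 1: let the parser build the MultiIndex.
via_index_col = pd.read_csv(
    io.StringIO(CSV), index_col=["fruit", "size"], na_filter=False
)

# Path 2: parse plain columns first, then set the index.
via_set_index = pd.read_csv(io.StringIO(CSV), na_filter=False).set_index(
    ["fruit", "size"]
)

print("index_col path has NaN:", index_has_nan(via_index_col))
print("set_index path has NaN:", index_has_nan(via_set_index))
print("indexes equal:", via_index_col.index.equals(via_set_index.index))
```

If the two paths disagree, the installed version is affected by the behavior described in this issue.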

Comment From: jreback

I marked it as a bug. You are welcome to do a pull-request. My point was that there are close to 50 options for the parser, so there are obviously some untested paths.

Comment From: tommycarstensen

This bug has been fixed and the issue can be closed.

Comment From: jreback

@gfyoung do we have a test for this?

Comment From: gfyoung

This is a dupe of #5239. Closed by #18127 (so yes, there is a test).