Code Sample, a copy-pastable example if possible

from io import StringIO  # pandas.compat.StringIO was removed in later pandas releases
import pandas as pd

t1 = """float
1
"""
t2 = """float
NaN
"""

for t in t1, t2:
    df = pd.read_csv(StringIO(t), dtype={'float': 'str'})
    print(type(df['float'][0]))

Problem description

Even though dtype is explicitly specified above, read_csv still converts the value in the float column to a float when the string is "NaN". This behavior appears to be limited to "NaN", as it doesn't happen for regular numbers. Still, I unexpectedly ran across a "NaN" string in my application, so it's blocking me.

Comment From: gfyoung

So at the surface, pandas is actually respecting your wishes, as the dtype of the column is object. However, it just so happens that the element contained in that column is not converted to a str. That being said, I think it should convert to a string (the NaN itself). PR to patch is welcome!
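To make the distinction between the column dtype and the element type concrete, here is a minimal sketch that inspects both for the "NaN" input from the report. The variable names are illustrative, and the exact column dtype printed may vary across pandas versions, so no output is asserted in the comments.

```python
from io import StringIO

import pandas as pd

# Same one-column CSV as in the report, with "NaN" as the only value.
csv_text = "float\nNaN\n"
df = pd.read_csv(StringIO(csv_text), dtype={"float": "str"})

value = df["float"][0]
print(df["float"].dtype)  # dtype of the column as a whole
print(type(value))        # type of the single element in it
print(pd.isna(value))     # whether pandas treats the element as missing
```

If the element is reported as missing, the requested dtype was applied to the column but the "NaN" token itself was still parsed as a missing value rather than kept as text.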

Comment From: giba0

@gfyoung A newbie here! I would like to help with this problem! Any tips on where to start?

Comment From: gfyoung

@gilbertoolimpio : Thanks for volunteering! So read_csv is a somewhat tricky / convoluted function, but it's do-able to understand once you've worked with it for long enough.

Head over to parsers.pyx. That's where the bulk of the processing for the C engine is done. Look at the _read_rows method. There, you should be able to trace when we have finished reading the data and when we begin to transform data types of columns.

Comment From: giba0

Ok @gfyoung! Thanks!

Comment From: giba0

@jamesqo and @gfyoung, the expected result would be <class 'str'> and not <class 'float'>, ok?

Comment From: gfyoung

Yes, that's correct!

Comment From: giba0

For now, you may want to use the workaround below until a fix is found.

df = pd.read_csv(StringIO(t), dtype={'float': 'str'}, engine='python')

@gfyoung I'm having trouble debugging the pyx files. Could you give me a hint?

Comment From: gfyoung

Debugging the files pulls? What do you mean by that?

Comment From: giba0

@gfyoung I'm sorry, I meant pyx. My spell-checker made me fail!

Comment From: gfyoung

@gilbertoolimpio : Hahaha, got it. Debugging pyx files is kind of annoying, so I generally just add a ton of print statements wherever I can.

Comment From: jreback

this is a duplicate of #15669 which needs discussion.

Comment From: jreback

Since NaN is by definition the missing value marker, this is correct. You can comment on #15669 if you would like.

Comment From: jamesqo

@jreback The value isn't missing, it's the string "NaN".
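For readers who need the literal text preserved, one possible approach (not proposed anywhere in this thread, so treat it as an assumption about what fits your data) is to switch off the default missing-value markers with read_csv's keep_default_na parameter. A minimal sketch, with illustrative variable names:

```python
from io import StringIO

import pandas as pd

csv_text = "float\nNaN\n"

# keep_default_na=False tells read_csv not to treat its built-in
# markers (such as "NaN") as missing values.
df = pd.read_csv(
    StringIO(csv_text),
    dtype={"float": "str"},
    keep_default_na=False,
)

kept = df["float"][0]
print(repr(kept))
print(type(kept))
```

Note that this setting applies to every column in the file, and other default markers (empty fields, for example) should then also come through as plain text, so it may not suit files that mix real missing values with literal "NaN" strings.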

Comment From: gfyoung

If you read the first part of the issue in #15669, I can see why it's considered a duplicate.