Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
What the bug looks like :
import io
import pandas as pd
data = """idx;pa
1;0
2;3
3;''
"""
print('default engine\n', pd.read_csv(io.StringIO(data), sep=";", decimal=",", encoding="utf-8"))
returns :
default engine
   idx  pa
0    1   0
1    2   3
2    3  ''
What I would expect :
data = """idx;pa
1;0
2;3
3;""
"""
print('default engine\n', pd.read_csv(io.StringIO(data), sep=";", decimal=",", encoding="utf-8"))
returns :
default engine
   idx   pa
0    1  0.0
1    2  3.0
2    3  NaN
Issue Description
I would expect the double-quote (") and single-quote (') characters to be handled the same way. I confirmed the behavior with both the pyarrow engine and the default engine.
Expected Behavior
Both "" and '' should be read as NaN
   idx   pa
0    1  0.0
1    2  3.0
2    3  NaN
Installed Versions
Comment From: NumanIjaz
pandas.read_csv takes a quotechar parameter, which defaults to the double quote ("). You can change the behavior by setting this parameter.
Python itself doesn't differentiate between single and double quotes. However, the data inside a CSV file isn't Python code, so the parser has to differentiate between them there.
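To illustrate the point about quotechar, here is a minimal sketch that reads the data from the report twice: once with the default quote character and once with quotechar="'". The variable names are only for illustration.

```python
import io
import pandas as pd

data = "idx;pa\n1;0\n2;3\n3;''\n"

# Default quotechar ('"'): the two apostrophes are ordinary characters,
# so the whole column is read as strings.
df_default = pd.read_csv(io.StringIO(data), sep=";")

# quotechar="'": the two apostrophes form an empty quoted field,
# which the default NA handling treats as a missing value.
df_single = pd.read_csv(io.StringIO(data), sep=";", quotechar="'")

print(df_default["pa"].tolist())
print(df_single["pa"].isna().tolist())
```

With quotechar="'", the last row of the pa column should come back as missing, which matches the expected behavior described in the report.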
Comment From: MCRE-BE
Hmm, indeed. The quotechar was probably switched somehow in my file, causing the issue.
I didn't think to check the quotechar parameter. The file is generated by pandas, so I assumed the quotes were as generated. The issue is probably there.
Thanks
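For anyone landing here with a similar file, a rough sketch of how a mismatch between the writer and the reader could arise. The to_csv settings below (quotechar="'" with csv.QUOTE_ALL) are an assumption chosen for illustration, not the settings that produced the reporter's file.

```python
import csv
import io
import pandas as pd

df = pd.DataFrame({"idx": [1, 2, 3], "pa": [0.0, 3.0, float("nan")]})

# Hypothetical writer settings: quote every field with a single quote.
# The missing value is written as an empty quoted field.
text = df.to_csv(sep=";", index=False, quotechar="'", quoting=csv.QUOTE_ALL)

# Reading back with the same quotechar recovers the missing value.
round_trip = pd.read_csv(io.StringIO(text), sep=";", quotechar="'")

# Reading back with the default quotechar keeps the apostrophes as data,
# including in the column names.
mismatch = pd.read_csv(io.StringIO(text), sep=";")

print(round_trip["pa"].isna().tolist())
print(mismatch.columns.tolist())
```

Passing the same quotechar to both to_csv and read_csv keeps the two sides consistent.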