Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
What the bug looks like :
import io
import pandas as pd
data = """idx;pa
1;0
2;3
3;''
"""
print('default engine\n', pd.read_csv(io.StringIO(data), sep=";", decimal=",", encoding="utf-8"))
returns :
default engine
   idx  pa
0    1   0
1    2   3
2    3  ''
What I would expect :
data = """idx;pa
1;0
2;3
3;""
"""
print('default engine\n', pd.read_csv(io.StringIO(data), sep=";", decimal=",", encoding="utf-8"))
returns :
default engine
   idx   pa
0    1  0.0
1    2  3.0
2    3  NaN
Issue Description
I would expect the " and ' quote characters to be handled the same way. The behavior is confirmed with both the pyarrow engine and the default csv engine.
Expected Behavior
Both "" and '' should be read as NaN:
   idx   pa
0    1  0.0
1    2  3.0
2    3  NaN
Installed Versions
Comment From: NumanIjaz
pandas.read_csv takes quotechar as a parameter (it defaults to the double quote, "). You can change the behavior with this parameter.
Python doesn't differentiate between single and double quotes. However, the data inside a csv file isn't Python code, so the parser has to differentiate between single and double quotes there.
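As a minimal sketch of the point above, the snippet below reads the data from the report twice, once with the default quotechar and once with quotechar="'". The variable names default_df and single_df are only illustrative. With the default, the two apostrophes should be kept as a literal string; with quotechar="'", they should be treated as an empty quoted field and come back as a missing value.

```python
import io

import pandas as pd

# Same data as in the report: the last row holds an empty field
# wrapped in single quotes.
data = "idx;pa\n1;0\n2;3\n3;''\n"

# Default quotechar ('"'): the apostrophes are ordinary characters,
# so the field is read as a two-character string.
default_df = pd.read_csv(io.StringIO(data), sep=";")

# quotechar="'": the apostrophes now delimit an empty quoted field,
# which read_csv treats as missing under its default NA handling.
single_df = pd.read_csv(io.StringIO(data), sep=";", quotechar="'")

print(default_df["pa"].tolist())
print(single_df["pa"].isna().tolist())
```

If a file mixes both quote styles, a single quotechar cannot cover both, so normalizing the file first is probably the simpler route.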
Comment From: MCRE-BE
Hmm, indeed. The quotechar was probably switched somehow in my file, causing the issue.
I didn't think to check the quotechar parameter. The file is generated by pandas, so I assumed the quotes were as generated. The issue is probably there.
Thanks