Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

What the bug looks like:

import io

import pandas as pd
data = """idx;pa
1;0
2;3
3;''
"""

print('default engine\n', pd.read_csv(io.StringIO(data), sep=";", decimal=",", encoding="utf-8"))

returns :

default engine
    idx  pa
0    1   0
1    2   3
2    3  ''

What I would expect (this is what the same data produces with double quotes):

data = """idx;pa
1;0
2;3
3;""
"""
print('default engine\n', pd.read_csv(io.StringIO(data), sep=";", decimal=",", encoding="utf-8"))

returns :

default engine
    idx   pa
0    1  0.0
1    2  3.0
2    3  NaN

Issue Description

I would expect the " and ' quote characters to be handled the same way. I confirmed the behavior with both the pyarrow engine and the default csv engine.

Expected Behavior

Both "" and '' should be read as NaN:

    idx   pa
0    1  0.0
1    2  3.0
2    3  NaN

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.10.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Netherlands.1252
pandas : 1.5.3
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.0.1
Cython : None
pytest : 7.2.2
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.9
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : 4.12.1
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : 2.0.9
tables : None
tabulate : 0.9.0
xarray : 2023.3.0
xlrd : 2.0.1
xlwt : None
zstandard : 0.19.0
tzdata : 2022.7

Comment From: NumanIjaz

pandas.read_csv takes a quotechar parameter, which defaults to the double quote ("). With the default, a single quote is treated as ordinary data, so '' is read as a literal two-character string. Pass quotechar="'" if your file uses single quotes.

Python itself doesn't differentiate between single and double quotes. However, the data inside a csv file isn't Python code, so the parser has to be told which character is the quote character.
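
As a minimal sketch of the point above, the snippet below reads the data from the report twice: once with the default quotechar and once with quotechar="'". The variable names df_default and df_single are only illustrative, and the decimal and encoding arguments from the report are left out because they do not affect the result here.

```python
import io

import pandas as pd

# Same data as in the report: the last value is two single quotes.
data = "idx;pa\n1;0\n2;3\n3;''\n"

# Default quotechar is '"', so '' is kept as a literal two-character string
# and the whole column stays non-numeric.
df_default = pd.read_csv(io.StringIO(data), sep=";")

# With the single quote declared as quotechar, '' is an empty quoted field,
# which the default NA handling reads as NaN.
df_single = pd.read_csv(io.StringIO(data), sep=";", quotechar="'")

print(df_default["pa"].tolist())
print(df_single["pa"].tolist())
```

With quotechar="'" the pa column should come back numeric with a missing value in the last row, matching the expected behavior in the report.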

Comment From: MCRE-BE

Hmm, indeed. The quotechar was probably switched somehow in my file, which caused the issue. I didn't think about checking the quotechar parameter. The file is generated by pandas, so I assumed the quotes were the default ones. The issue is probably there.

Thanks
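
For illustration only, here is a sketch of how a pandas-generated file could end up with single quotes, assuming it was written with quotechar="'" and quoting=csv.QUOTE_ALL. The thread does not say how the file was actually written, so these to_csv arguments are a guess. Reading it back with the same quotechar restores the missing value.

```python
import csv
import io

import pandas as pd

df = pd.DataFrame({"idx": [1, 2, 3], "pa": [0.0, 3.0, None]})

# Hypothetical writer settings: every field is wrapped in single quotes,
# so the missing value is written as an empty quoted field.
text = df.to_csv(sep=";", index=False, quotechar="'", quoting=csv.QUOTE_ALL)
print(text)

# Reading back with the matching quotechar turns the empty field into NaN again.
back = pd.read_csv(io.StringIO(text), sep=";", quotechar="'")
print(back)
```

The design point is simply that the reader and the writer must agree on the quote character; read_csv does not detect it automatically.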