Code Sample, a copy-pastable example if possible
import pandas
col_a = ["Zenchef (formerly 1001menus) is the , éé eas"]
frame = pandas.DataFrame({'a':pandas.Series(col_a, index=['s1'])})
frame.to_csv("test.csv", index_label="ICOL")
frame2 = pandas.read_csv("test.csv")
print(frame2)
Problem description
pandas cannot read back a CSV file it has just written: read_csv raises a UnicodeDecodeError on the output of to_csv. That can't be good.
Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1162, in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14858)
  File "pandas\_libs\parsers.pyx", line 1273, in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)
  File "pandas\_libs\parsers.pyx", line 1289, in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)
  File "pandas\_libs\parsers.pyx", line 1524, in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 38: invalid continuation byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\data\progetti_miei\python\pandas_csv_bug\main.py", line 8, in <module>
    frame2 = pandas.read_csv("test.csv")
  File "C:\IntelPython35\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\IntelPython35\lib\site-packages\pandas\io\parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "C:\IntelPython35\lib\site-packages\pandas\io\parsers.py", line 982, in read
    ret = self._engine.read(nrows)
  File "C:\IntelPython35\lib\site-packages\pandas\io\parsers.py", line 1719, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)
  File "pandas\_libs\parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)
  File "pandas\_libs\parsers.pyx", line 989, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:12175)
  File "pandas\_libs\parsers.pyx", line 1117, in pandas._libs.parsers.TextReader._convert_column_data (pandas\_libs\parsers.c:14136)
  File "pandas\_libs\parsers.pyx", line 1169, in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14972)
  File "pandas\_libs\parsers.pyx", line 1273, in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)
  File "pandas\_libs\parsers.pyx", line 1289, in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)
  File "pandas\_libs\parsers.pyx", line 1524, in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 38: invalid continuation byte
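For reference, the decode failure can be reproduced without pandas. This is only a sketch under an assumption: the report does not say which encoding the file was actually written in, so the snippet assumes a single-byte Windows code page such as cp1252, where "é" is stored as the lone byte 0xe9. That byte is not valid UTF-8 when followed by another 0xe9.

```python
text = "Zenchef (formerly 1001menus) is the , éé eas"

# Assumption (not confirmed by the report): the file was written using a
# single-byte code page such as cp1252, where "é" becomes the byte 0xe9.
raw = text.encode("cp1252")

error = None
try:
    # Reading the same bytes back as UTF-8 fails on the first 0xe9 byte.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the codec that was used for writing succeeds.
print(raw.decode("cp1252") == text)
```

If the assumption holds, the offset of the failing byte within the field should line up with the position reported in the traceback, since "é" is the first non-ASCII character in the string.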
Expected Output
Output of pd.show_versions()
APPENDIX: Temporary fix
frame.to_csv("test.csv", index_label="ICOL", encoding="utf-8")
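A slightly more defensive variant of this workaround, as a sketch: pass the same encoding to both to_csv and read_csv so that neither call depends on a platform or locale default. The temporary directory is only there to keep the example self-contained; it is not part of the original report.

```python
import os
import tempfile

import pandas

col_a = ["Zenchef (formerly 1001menus) is the , éé eas"]
frame = pandas.DataFrame({"a": pandas.Series(col_a, index=["s1"])})

# Write into a scratch directory so the example does not touch the cwd.
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# State the encoding explicitly on both sides of the round trip.
frame.to_csv(path, index_label="ICOL", encoding="utf-8")
frame2 = pandas.read_csv(path, encoding="utf-8")
print(frame2)
```

Naming the encoding in both calls also documents the file format for anyone else who has to read the CSV later.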
Comment From: jreback
In [1]: import pandas
...:
...: col_a = ["Zenchef (formerly 1001menus) is the , éé eas"]
...:
...: frame = pandas.DataFrame({'a':pandas.Series(col_a, index=['s1'])})
...: frame.to_csv("test.csv", index_label="ICOL")
...:
...: frame2 = pandas.read_csv("test.csv")
...: print(frame2)
...:
  ICOL                                             a
0   s1  Zenchef (formerly 1001menus) is the , éé eas
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.35-pv-ts1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
You might need to set your locale (it's None); see https://stackoverflow.com/questions/31469707/changing-the-locale-preferred-encoding-in-python-3-in-windows?rq=1
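To check what a given machine is using, something like the following may help. It is a sketch that uses only the standard library and prints the encoding Python prefers for text files when none is specified; whether pandas consults this value depends on the pandas and Python versions, which this thread does not settle.

```python
import locale
import sys

# Encoding Python uses by default for text I/O when none is given.
preferred = locale.getpreferredencoding(False)

print("preferred encoding:", preferred)
print("filesystem encoding:", sys.getfilesystemencoding())
print("current locale:", locale.getlocale())
```

On a machine where the preferred encoding is not UTF-8, a file written with the default encoding and read back as UTF-8 can fail on non-ASCII characters.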
Comment From: fanguoguo
  File "pandas/_libs/parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source