Code:
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
>>> pd.__version__
'0.20.1'
>>> import platform
>>> platform.platform()
'Windows-7-6.1.7601-SP1'
>>> import pandas as pd
>>> df = pd.read_csv(r'c:\tmp\中文.csv')
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-0cd6317422e5>", line 1, in <module>
    df = pd.read_csv(r'c:\tmp\中文.csv')
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 405, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 762, in __init__
    self._make_engine(self.engine)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 966, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1582, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 394, in pandas._libs.parsers.TextReader.__cinit__ (pandas\_libs\parsers.c:4209)
  File "pandas\_libs\parsers.pyx", line 712, in pandas._libs.parsers.TextReader._setup_parser_source (pandas\_libs\parsers.c:8895)
OSError: Initializing from file failed
Problem description
On Windows, Python 3.6 changed sys.getfilesystemencoding() to return "utf-8" instead of "mbcs"; see PEP 529.
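To make the consequence concrete, here is a small illustration of what happens when a filename is encoded as UTF-8 but the bytes are later read as if they were in the ANSI code page. It is only a sketch: the real "mbcs" codec exists only on Windows, so "gbk" is used here as an assumed stand-in for the ANSI code page of a Chinese-locale machine.

```python
# Illustration only: 'gbk' stands in for the Windows ANSI code page ("mbcs")
# on a Chinese-locale machine. This is an assumption made so the example
# also runs on non-Windows systems.
ANSI_CODE_PAGE = "gbk"

filename = "中文.csv"

# What Python 3.6 produces when the filename is encoded with "utf-8".
utf8_bytes = filename.encode("utf-8")
# What a C-level open() on Windows expects: bytes in the ANSI code page.
ansi_bytes = filename.encode(ANSI_CODE_PAGE)

# Reading the UTF-8 bytes as if they were ANSI bytes yields a different name,
# so the file that actually exists on disk is never found.
name_seen_by_c_open = utf8_bytes.decode(ANSI_CODE_PAGE, errors="replace")

print("byte sequences differ:", utf8_bytes != ansi_bytes)
print("name survives the round trip:", name_seen_by_c_open == filename)
```

An ASCII-only filename is unaffected, because its bytes are the same in both encodings.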
How to fix
The problem is in parsers.pyx:
    if isinstance(source, basestring):
        if not isinstance(source, bytes):
            source = source.encode(sys.getfilesystemencoding() or 'utf-8')
The source parameter is our filename. In Python 3.6 it is encoded as 'utf-8' rather than the legacy 'mbcs', and is then passed to open() in io.c:new_file_source, where it is interpreted as an mbcs string. So the "file not found" failure is not surprising.
Perhaps it should be Cython's responsibility to handle this for Python 3.6 by using the Unicode version of the Windows API, but for now we can just replace sys.getfilesystemencoding() with "mbcs".
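The proposed replacement could look roughly like the sketch below. This is plain Python written for illustration, not the actual patch to parsers.pyx; the helper name `encode_filename` and its `platform` parameter are hypothetical and exist only so the logic can be tried outside Windows.

```python
import sys


def encode_filename(source, platform=sys.platform):
    """Encode a filename for a C-level open() call.

    Hypothetical helper: mirrors the snippet from parsers.pyx, but forces
    the legacy 'mbcs' encoding on Windows instead of trusting
    sys.getfilesystemencoding(), which returns 'utf-8' on Python 3.6.
    """
    if isinstance(source, bytes):
        # Already encoded; pass through unchanged.
        return source
    if platform == "win32":
        # The 'mbcs' codec is only available on Windows.
        encoding = "mbcs"
    else:
        encoding = sys.getfilesystemencoding() or "utf-8"
    return source.encode(encoding)
```

Note that this only helps when the filename can be represented in the machine's ANSI code page; names outside that code page would still need the wide-character Windows API.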
Comment From: mfmain
There is a workaround, at the cost of speed:
df = pd.read_csv(r'c:\tmp\中文.csv', engine='python')
But it is tedious to modify every single call to read_csv in all your projects.
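Another option that avoids editing every call site is to wrap the reader once so that it receives an already-open file object instead of a path string. The sketch below assumes that read_csv accepts a binary file object, which should keep the default C engine in use; `open_then_read` is a hypothetical helper name, and this has not been verified against pandas 0.20.1 specifically.

```python
import functools


def open_then_read(reader):
    """Wrap a reader so it is handed an open file object, not a path.

    Hypothetical helper: Python's own open() handles non-ASCII paths on
    Windows, so the wrapped reader never has to encode the filename.
    """
    @functools.wraps(reader)
    def wrapper(path, *args, **kwargs):
        with open(path, "rb") as handle:
            return reader(handle, *args, **kwargs)
    return wrapper


# Usage (assumes pandas is installed):
# import pandas as pd
# read_csv = open_then_read(pd.read_csv)
# df = read_csv(r'c:\tmp\中文.csv')
```

The wrapper only covers calls that pass a local path; calls that pass URLs or existing file objects would need to keep using the unwrapped function.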
Comment From: jreback
This is a duplicate of https://github.com/pandas-dev/pandas/issues/15086
There is a PR attached, but unfortunately it was blown away.
We would certainly take a fix for this.
Comment From: yuquant
Do not use Chinese characters in the file name; change it to English.