The code:
import pandas
df = pandas.read_csv(u"C:/成功例Q309~Metadata.tsv")
does not work, and gives the output:
IOError: File C:/???Q309.ppt~Metadata.tsv does not exist
It seems similar in nature to this issue: https://github.com/pydata/pandas/issues/9315. However, #9315 was reportedly fixed in 14.2 with 3.3.5, whereas I am using 15.1 and 2.7.7.
Here is the output of pd.show_versions():
commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.15.2
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.1
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.2
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
Thanks, Justin
Comment From: jmatejka
For what it's worth, the following is a workaround which seems to be doing the trick:
f = open(u"C:/成功例Q309~Metadata.tsv")
df = pd.read_csv(f)
f.close()
Comment From: jreback
see this issue here: https://github.com/pydata/pandas/issues/6770
This is already in 0.15.2 (e.g. it will decode with the system encoding). So I think you may need to set it.
Comment From: jmatejka
My mistake, I am using 0.15.2 (not 15.1).
But I'm still not clear, what are you suggesting that I "set"? The system encoding? This is something that I would need to do before loading the file?
Thanks, Justin
Comment From: jreback
I think the system encoding might be set to something odd
you can try setting to utf-8 and see if it works
Comment From: jmatejka
The file system encoding and the default encoding are 'mbcs' and 'cp1252' respectively:
sys.getfilesystemencoding()
Out[12]: 'mbcs'
sys.getdefaultencoding()
Out[13]: 'cp1252'
These options all fail in a similar way though:
df = pandas.read_csv(u"C:/成功例Q309~Metadata.tsv", encoding='utf-8')
df = pandas.read_csv(u"C:/成功例Q309~Metadata.tsv", encoding='mbcs')
df = pandas.read_csv(u"C:/成功例Q309~Metadata.tsv", encoding='cp1252')
Should I be setting the encoding in a different way?
Comment From: jreback
Those have to do with the encoding of the file itself, not the filename. Try decoding the filename before passing it, e.g.
the_filename.decode('utf-8')
and then pass the decoded filename.
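As a minimal sketch of what decoding the filename means: it applies when the filename is held as a byte string, in which case decoding turns it into a unicode string. The variable names and the assumption that the bytes are UTF-8 encoded are hypothetical and only for illustration.

```python
# -*- coding: utf-8 -*-
# Hypothetical byte-string filename, assumed to be UTF-8 encoded
# (for example, one read from a UTF-8 text source).
raw_filename = u"C:/成功例Q309~Metadata.tsv".encode("utf-8")

# Decoding converts the bytes back into a unicode string, which is
# the form that should be handed to the file-opening call.
decoded_filename = raw_filename.decode("utf-8")
```

A filename written as a u"..." literal is already unicode, so this step would only be needed for byte-string filenames.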
Comment From: jmatejka
Using filename.decode('utf-8')
gives this error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-5: character maps to <undefined>
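A likely explanation, assuming standard Python 2 behaviour: the filename is already a unicode string, so calling decode on it first implicitly encodes it with the default encoding (reported above as 'cp1252'), and that encoding has no mapping for the three CJK characters. The sketch below reproduces only the encoding step, not the pandas call.

```python
# -*- coding: utf-8 -*-
filename = u"C:/成功例Q309~Metadata.tsv"

# cp1252 cannot represent the CJK characters, so encoding fails.
# This mirrors the implicit encode that Python 2 performs when
# .decode() is called on a string that is already unicode.
try:
    filename.encode("cp1252")
    error = None
except UnicodeEncodeError as exc:
    error = exc

if error is not None:
    # error.start and error.end delimit the unencodable characters
    # (end is exclusive), matching "position 3-5" in the message.
    offending = filename[error.start:error.end]
```

The "C:/" prefix occupies positions 0 to 2, so positions 3 to 5 are exactly the non-ASCII part of the name.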
Comment From: TomAugspurger
I think this is an issue with the filesystem / encoding. Let me know if it's still a problem, and if Python's builtin open(filename) works, but pandas read_csv does not.
Comment From: Masterxilo
This still happens for me. The worst part is that if I use the workaround using open, read_csv does not parse the utf-8 in the file correctly anymore. Any help?
Comment From: Rajasivaranjan
Try using the open command as below. It worked for me.
df = pd.read_csv(open(filename, 'r'))
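If the file contents are UTF-8 and come out garbled with a plain open, one variant that may help is opening the file with an explicit encoding before handing it to read_csv. This is a sketch under assumptions: the sample file, its contents, and the temporary directory are made up for illustration, and it was written with Python 3 and a recent pandas in mind, not verified against 0.15.2.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

import pandas as pd

# Hypothetical sample file: non-ASCII name, UTF-8 encoded contents.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, u"成功例Q309~Metadata.tsv")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"name\tvalue\n成功\t1\n")

# Python opens the file by its unicode name and decodes the contents
# as UTF-8; read_csv then parses the already-decoded text.
with io.open(path, "r", encoding="utf-8") as f:
    df = pd.read_csv(f, sep="\t")
```

The with block also closes the file handle once parsing is done, which the one-line form above leaves to the garbage collector.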