Pandas python engine for read_csv does not support unicode

Code: pandas.read_csv(io.StringIO(data), engine="python")

If 'data' contains a non-ASCII character, this code raises an exception.

Stack trace:

  File "/home/jake/Desktop/WiderNSF/physics/service/ExcelTable.py", line 63, in __init__
    frames = {"default":pandas.read_csv(io.StringIO(data), engine="python")}
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 198, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 479, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 592, in _make_engine
    self._engine = klass(self.f, self.options)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1240, in __init__
    self.columns = self._infer_columns()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1406, in _infer_columns
    line = self._next_line()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1473, in _next_line
    line = next(self.data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03bb' in position 20: ordinal not in range(128)

I have experienced this issue on multiple systems.
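A minimal, self-contained reproducer along the lines of the report might look like the sketch below. The CSV text is a hypothetical stand-in (the reporter's pastebin sample is not reproduced here), and the sketch assumes Python 3 with a recent pandas, where both engines are expected to accept text input and produce the same frame; the failure in this thread was seen on Python 2.7 with pandas 0.12.0 / 0.14.0.dev, so results there may differ.

```python
import io

import pandas as pd

# Hypothetical sample: a tiny CSV whose body contains a non-ASCII
# character (U+03BB, the lambda from the traceback). Not the reporter's file.
data = u"name,value\nwavelength \u03bb,1.5\nalpha,2.0\n"

frames = {}
for engine in ("c", "python"):
    # Wrap the text in a fresh StringIO per engine; a buffer is consumed once read.
    frames[engine] = pd.read_csv(io.StringIO(data), engine=engine)
    print(engine, frames[engine].iloc[0, 0])

# Raises AssertionError if the two engines disagree on values or dtypes.
pd.testing.assert_frame_equal(frames["c"], frames["python"])
```

Running the same script under the affected Python 2.7 setup (with `print` adjusted) is one way to confirm whether only the python engine fails.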

Comment From: jreback

Decode it first.

Please always show pd.show_versions() and a data sample when reporting CSV issues.
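"Decode it first" can be read as: make sure pandas receives text, not undecoded bytes. A minimal sketch of the two usual ways to do that, assuming the raw payload is UTF-8 encoded bytes (`raw` is a hypothetical stand-in) and assuming Python 3 with a recent pandas:

```python
import io

import pandas as pd

# Assumption: the CSV arrives as UTF-8 encoded bytes (e.g. a file upload).
raw = u"name,value\nwavelength \u03bb,1.5\n".encode("utf-8")

# Option 1: decode to text yourself, then give pandas a text buffer.
text = raw.decode("utf-8")
df_text = pd.read_csv(io.StringIO(text), engine="python")

# Option 2: give pandas the bytes and tell it which encoding to use.
df_bytes = pd.read_csv(io.BytesIO(raw), encoding="utf-8", engine="python")

print(df_text.equals(df_bytes))
```

Option 2 leaves the decoding to pandas; Option 1 keeps it explicit, which is what the reporter says they already do.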

Comment From: jreback

Sure it does; I think your encoding is wrong.

Comment From: JakeEhrlich

But it isn't wrong with the C engine? That doesn't make any sense to me, unless there is a double standard between the engines, in which case that should be documented and probably fixed.

Comment From: JakeEhrlich

I suppose I didn't say: this all works perfectly with the C engine on my local machine. But, as you may have seen in my other bug report, the C engine hangs on the server, so I have a bit of a dilemma.

Comment From: JakeEhrlich

Also, 'data' is already decoded.

Comment From: jreback

You have some weird data/encoding; no idea. If it works for you locally, then examine the differences with the server (versions, usage and such).

Comment From: JakeEhrlich

Same version of pandas, same version of Python, even the same version of Django (although that hardly matters here).

Comment From: jreback

Then it's clearly not a pandas issue. Check the numpy version too. I have no idea which version you are even using; please always show pd.show_versions().

Comment From: JakeEhrlich

How can you say that? It clearly is. I'm not changing anything but the engine, and the behavior changes! This issue occurs on both 0.12.0 and 0.14.0.dev.

Comment From: jreback

Without a sample file, this discussion is silly. How is one supposed to debug anything?

Comment From: JakeEhrlich

Here is a sample file: http://pastebin.com/G0UTrnMK (the same one as in the other thread).

Comment From: JakeEhrlich

But I also get this issue with ANY string that contains non-ASCII characters.

Comment From: jreback

Please post the output of pd.show_versions().

Comment From: JakeEhrlich

I can only get it for my server; it seems show_versions() requires a pandas version newer than 0.12.0.

Here is the output for my server:

commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Linux
OS-release: 3.14-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.0.dev
nose: 1.3.3
Cython: None
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.4.2
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None
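Since pd.show_versions() was not available on the 0.12.0 install, a rough stand-in can gather the core facts with the standard library. This is a sketch under stated assumptions: `collect_versions` is a hypothetical helper (not a pandas function), it only reads each module's `__version__` attribute, and it covers just a subset of what show_versions() prints, including the locale settings that matter for non-ASCII CSV problems.

```python
import importlib
import locale
import os
import platform
import sys


def collect_versions(modules=("pandas", "numpy", "dateutil", "pytz")):
    """Hypothetical helper (not part of pandas): a small show_versions() substitute.

    Returns a dict; modules that cannot be imported map to None.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # Locale settings are relevant when non-ASCII text is involved.
        "LC_ALL": os.environ.get("LC_ALL"),
        "LANG": os.environ.get("LANG"),
        "preferred-encoding": locale.getpreferredencoding(False),
    }
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            info[name] = None
        else:
            # Some modules do not expose __version__; report that explicitly.
            info[name] = getattr(mod, "__version__", "unknown")
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_versions().items()):
        print("%s: %s" % (key, value))
```

Running it on both the local machine and the server gives a like-for-like comparison even where show_versions() is missing.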