code: pandas.read_csv(io.StringIO(data), engine="python")
If 'data' contains a non-ASCII character, this code throws an exception.
stack trace:
  File "/home/jake/Desktop/WiderNSF/physics/service/ExcelTable.py", line 63, in __init__
    frames = {"default":pandas.read_csv(io.StringIO(data), engine="python")}
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 198, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 479, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 592, in _make_engine
    self._engine = klass(self.f, self.options)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1240, in __init__
    self.columns = self._infer_columns()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1406, in _infer_columns
    line = self._next_line()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1473, in _next_line
    line = next(self.data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03bb' in position 20: ordinal not in range(128)
I have experienced this issue on multiple systems.
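For readers unfamiliar with this error class, the short sketch below shows what the last line of the trace means: some step tried to encode text containing u'\u03bb' with the ASCII codec. It does not reproduce the pandas code path itself. The sample string and the helper name `try_ascii_encode` are made up for illustration; only the character comes from the report.

```python
# Hypothetical sample text; only the character u'\u03bb' comes from the report.
text = u"wavelength \u03bb"


def try_ascii_encode(value):
    """Return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = try_ascii_encode(text)
# The exception records which codec failed and where the offending character sits.
print(err)
```

On Python 2 the same encode can happen implicitly, for example when a unicode object is passed to something that expects a byte string, which is why no explicit `.encode` call appears in the trace.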
Comment From: jreback
decode it first
pls always show pd.show_versions(), and a data sample when reporting about csv
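A minimal sketch of what "decode it first" could look like, assuming the content arrives as UTF-8 bytes. The helper name `as_text_buffer`, the UTF-8 default, and the sample CSV are assumptions for illustration, not something from the report. Text that is already decoded passes through unchanged.

```python
import io


def as_text_buffer(data, encoding="utf-8"):
    """Wrap CSV content in io.StringIO, decoding byte strings first."""
    if isinstance(data, bytes):
        # io.StringIO only accepts text, so bytes must be decoded explicitly.
        data = data.decode(encoding)
    return io.StringIO(data)


# Hypothetical sample: a two-line CSV encoded as UTF-8 bytes.
raw = u"name,symbol\nwavelength,\u03bb\n".encode("utf-8")
buf = as_text_buffer(raw)
first_line = buf.readline()
```

The resulting buffer is what would be handed to pandas.read_csv(buf, engine="python"). Whether this avoids the error above depends on where the implicit ASCII encode happens, which this sketch does not settle.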
Comment From: jreback
sure it does; I think your encoding is wrong.
Comment From: JakeEhrlich
but it isn't wrong with the C version? that doesn't make any sense to me unless there is a double standard for the engines, in which case that should be documented and probably fixed.
Comment From: JakeEhrlich
I suppose I didn't say. This all works perfectly with the C engine on my local machine. but as you may have seen in my other bug report the C engine hangs on the server so I have a bit of a dilemma.
Comment From: JakeEhrlich
and also 'data' is already decoded
Comment From: jreback
you have some weird data/encoding. no idea. If it works for you locally, then examine the difference with the server (version and usage and such).
Comment From: JakeEhrlich
same versions of pandas, same version of python, same version of django even (although that hardly matters here).
Comment From: jreback
then it's clearly not a pandas issue. check numpy version too. and I have no idea which version you are even using. pls always show pd.show_versions()
Comment From: JakeEhrlich
How do you say that? It clearly is. I'm not changing anything but the engine and the behavior changes! This issue is occurring on both 0.12.0 and 0.14.0.dev
Comment From: jreback
w/o a sample file, this discussion is silly. how is one supposed to debug anything????
Comment From: JakeEhrlich
here is a sample file: http://pastebin.com/G0UTrnMK (same one from other thread)
Comment From: JakeEhrlich
but also I am getting this issue with ANY string that contains unicode characters.
Comment From: jreback
pls post pd.show_versions()
Comment From: JakeEhrlich
I can only get it for my server. It seems I need a version greater than 0.12.0 to have show_versions()
here is for my server:
commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Linux
OS-release: 3.14-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.0.dev
nose: 1.3.3
Cython: None
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.4.2
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None
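
On an install where show_versions() is not available, roughly the same information could be gathered by hand. The sketch below is one way to do that with the standard library only. The helper name `collect_versions` and the list of modules queried are assumptions, and it relies on each package exposing a `__version__` attribute, which not every package does.

```python
import importlib
import platform


def collect_versions(module_names):
    """Map each module name to its __version__, or None if unavailable."""
    versions = {"python": platform.python_version()}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed in this environment.
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", None)
    return versions


# Print one "name: version" line per entry, similar to the listing above.
for name, version in sorted(collect_versions(["pandas", "numpy", "dateutil"]).items()):
    print("%s: %s" % (name, version))
```

Running the same snippet on both machines gives two listings that can be compared line by line.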