I am experiencing an issue with read_csv. When setting both index_col and a converters dict, the output with engine='c' is correct and the output with engine='python' is wrong.
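import pandas as pd
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
def first(s):
    return s[0]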
io = StringIO('col1,col2\n1_,a\n2_,b\n')
print pd.read_csv(io, index_col=0, converters = {0 : first})
     col2
col1
1       a
2       b
io = StringIO('col1,col2\n1_,a\n2_,b\n')
print pd.read_csv(io, index_col=0, converters = {0 : first}, engine='python')
     col2
col1
1_      a
2_      b
Expected output: the second example should match the first.
Note: if I leave out index_col=0, the converter works as expected with both engines.
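Going by the note above, one possible workaround is to let the converter run while col1 is still an ordinary column and only then promote it to the index. This is a minimal sketch in Python 3 syntax, not a fix for the parser itself. The helper name read_with_converted_index is hypothetical and not part of pandas.

```python
from io import StringIO

import pandas as pd


def first(s):
    # Keep only the first character of the raw field.
    return s[0]


def read_with_converted_index(text, engine):
    # Hypothetical helper: omit index_col so the converter is applied to a
    # regular column, then set that column as the index in a second step.
    df = pd.read_csv(StringIO(text), converters={0: first}, engine=engine)
    return df.set_index("col1")


data = "col1,col2\n1_,a\n2_,b\n"
c_result = read_with_converted_index(data, "c")
python_result = read_with_converted_index(data, "python")
print(python_result)
```

Because index_col is never passed to read_csv, this avoids the code path where the two engines disagree, at the cost of an extra set_index call.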
Output of pd.show_versions():
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.11-23.53.amzn1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 12.2
Cython: None
numpy: 1.11.1
Comment From: jreback
This is detailed in #9435 (last example, by @gfyoung); the logic for handling converters is a bit convoluted ATM. Pull requests to fix it would help.