I'm working on a little pandas lesson. The CSV has a column whose values have a leading '0' and should be treated as text. When I read the CSV and specify that column as the index, the dtype argument is ignored for it. I can use 'S7' as the dtype instead of object, but @wesm suggests using 'object'.
Code Sample, a copy-pastable example if possible
from StringIO import StringIO
import pandas as pd

data = """
fips,name,popest2014
0800760,"Aguilar town, Colorado",479
0800925,"Akron town, Colorado",1694
0801090,"Alamosa city, Colorado",9531
"""

error_case = pd.read_csv(StringIO(data), index_col=0, dtype={'fips': object})
print "Error case, dtype should be object "
print error_case.index.dtype  # 'int64'

expected1 = pd.read_csv(StringIO(data), dtype={'fips': object})
print "Works with object, not an index column"
print expected1.fips.dtype  # 'O'

expected2 = pd.read_csv(StringIO(data), index_col=0, dtype={'fips': 'S7'})
print "Works with 'S7', as an index column"
print expected2.index.dtype  # 'O'
Expected Output
The dtype of the fips column, when used as the index, should be object, so the leading zeros are preserved.
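A possible workaround, offered only as a sketch: read the column as object without index_col, then promote it to the index afterwards with set_index. This relies on the behaviour reported in the sample above, where dtype={'fips': object} is respected when the column is not the index. Unlike the Python 2 sample above, the sketch is written for Python 3 and uses io.StringIO; the variable name workaround is just illustrative.

```python
from io import StringIO

import pandas as pd

data = """
fips,name,popest2014
0800760,"Aguilar town, Colorado",479
0800925,"Akron town, Colorado",1694
0801090,"Alamosa city, Colorado",9531
"""

# Step 1: read fips as text. No index_col here, so the dtype mapping
# applies to an ordinary column.
workaround = pd.read_csv(StringIO(data), dtype={"fips": object})

# Step 2: make fips the index after parsing, so the values are never
# converted to integers along the way.
workaround = workaround.set_index("fips")

print(workaround.index.dtype)
print(list(workaround.index))
```

The cost is one extra call after reading, but the leading zeros survive because the values stay strings the whole way through.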
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: None
pip: 8.0.3
setuptools: 20.1.1
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None
Comment From: jreback
Duplicate of #9435