Pandas Cannot specify internal dtype for categorical dtype when reading a CSV file

Code Sample, a copy-pastable example if possible

>>> from io import StringIO
>>> import pandas as pd
>>> import numpy as np
>>> data = """a,b,c
1,a,3.4
1,a,3.4
2,b,4.5"""
>>> frame = pd.read_csv(StringIO(data), dtype={0: 'category', 1: 'str', 2: 'float64'})
>>> np.asarray(frame.ix[:,0])
array(['1', '1', '2'], dtype=object)
>>> np.asarray(frame.ix[:,1])
array(['a', 'a', 'b'], dtype=object)
>>> np.asarray(frame.ix[:,2])
array([ 3.4,  3.4,  4.5])

Problem description

When loading CSV data, it does not seem possible to specify the internal dtype of a categorical column. I can specify that the column is categorical, but not that its values are integers.

Expected Output

>>> np.asarray(frame.ix[:,0])
array([1, 1, 2])

By contrast, if I build the Series directly, the integer values are preserved:

>>> series = pd.Series([1, 1, 2], dtype='category')
>>> np.asarray(series)
array([1, 1, 2])

It would be great if I could specify at CSV reading time both that the column is categorical and that its values are int.

(Using categorical and int is just for demo purposes.)

Or, alternatively, is it guaranteed that the dtype will always be object when a categorical column is read from a CSV file and converted to a numpy array?

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.27-moby
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: chris-b1

You are correct: specifying the dtype of the categories when parsing is not currently supported, and the categories are guaranteed to be object dtype. See the docs here: http://pandas.pydata.org/pandas-docs/stable/io.html#specifying-categorical-dtype
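A minimal sketch of how one might check this, reusing the CSV data from the report. It selects the column by its name `a` rather than by position, and the variable name `categories` is only illustrative:

```python
from io import StringIO

import pandas as pd

data = """a,b,c
1,a,3.4
1,a,3.4
2,b,4.5"""

# Ask the parser for a categorical column without saying anything
# about what its categories should look like.
frame = pd.read_csv(StringIO(data), dtype={"a": "category"})

# The categories come back as the raw text of each field, not as integers.
categories = list(frame["a"].cat.categories)
print(categories)
print(all(isinstance(c, str) for c in categories))
```

The check is written against the category values themselves rather than against a specific dtype name, since how string data is labelled may differ between pandas releases.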

You can convert the categories after parsing as in the doc example:

frame['a'].cat.categories = pd.to_numeric(frame['a'].cat.categories)
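For readers on newer pandas releases, here is a hedged sketch of two ways to end up with integer categories. It assumes a pandas version newer than the 0.20.1 reported above: the first option uses `rename_categories` instead of assigning to `.cat.categories`, and the second relies on `pd.CategoricalDtype`, which requires knowing the categories in advance. The names `int_category`, `frame_known`, and the two `values_*` variables are illustrative only.

```python
from io import StringIO

import numpy as np
import pandas as pd

data = """a,b,c
1,a,3.4
1,a,3.4
2,b,4.5"""

# Option 1: parse as category, then convert the string categories to numbers.
frame = pd.read_csv(StringIO(data), dtype={"a": "category"})
numeric_categories = pd.to_numeric(frame["a"].cat.categories)
frame["a"] = frame["a"].cat.rename_categories(numeric_categories)
values_after_rename = np.asarray(frame["a"])

# Option 2: declare the numeric categories up front so the parser converts
# the fields itself. Values missing from the list would become NaN.
int_category = pd.CategoricalDtype(categories=[1, 2])
frame_known = pd.read_csv(StringIO(data), dtype={"a": int_category})
values_with_known_categories = np.asarray(frame_known["a"])

print(values_after_rename)
print(values_with_known_categories)
```

Option 1 works without knowing the values ahead of time, at the cost of an extra pass over the categories. Option 2 does the conversion during parsing but only fits data whose set of categories is known beforehand.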

Comment From: mitar

Thanks for the reply. It makes sense.