Code Sample, a copy-pastable example if possible
import seaborn as sns
tips = sns.load_dataset("tips")
print(tips["day"].dtype)
counts = tips.groupby("size").day.value_counts().rename("count").reset_index()
print(counts["day"].dtype)
category
object
Problem description
When the categorical variable (day
) becomes an index in the groupby and then gets reset, it loses its categorical metadata (and dtype). In the case of the tips dataset, this includes the proper ordering of days.
Interestingly, this isn't the case with the Series.value_counts
method, which properly creates a CategoricalIndex
with the order metadata.
I don't see an existing issue about this but maybe https://github.com/pandas-dev/pandas/issues/9748 is related?
Expected Output
category
category
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.1
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.1
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Comment From: jreback
thanksj duplicate of https://github.com/pandas-dev/pandas/issues/18502