Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
a = pd.DataFrame({'numbers': np.arange(10), 'categories': list('abcabcabcd')})
a['categories'] = a['categories'].astype('category')
b = pd.DataFrame({'numbers': np.arange(10)})

print a.dtypes
print a.categories.cat.categories
print

merged = pd.merge(a, b, left_index=True, right_index=True)

print merged.dtypes
print merged['categories'].cat.categories
print 'Merge ok!'
print

merged = pd.merge(a, b, on=['numbers'], how='left')
print merged.dtypes
try:
    print merged['categories'].cat.categories #crashes
except:
    print 'Merge not ok!'
print

Expected Output

The try block should print "categories" 's categories the same way as above, with: Index([u'a', u'b', u'c', u'd'], dtype='object')

However, the data type is replaced to object/string.

This is not fixed by the v0.18.2 release, which fixes some of the merge issues where int's would get casted to floats when merging.

output of pd.show_versions()

INSTALLED VERSIONS

commit: None python: 2.7.11.final.0 python-bits: 64 OS: Linux OS-release: 2.6.32-431.23.3.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8

pandas: 0.17.1 nose: 1.3.7 pip: 7.1.2 setuptools: 20.3.1 Cython: 0.23.4 numpy: 1.10.4 scipy: 0.16.0 statsmodels: 0.6.1 IPython: 4.0.1 sphinx: 1.3.1 patsy: 0.4.0 dateutil: 2.4.2 pytz: 2015.7 blosc: None bottleneck: 1.0.0 tables: 3.2.2 numexpr: 2.4.4 matplotlib: 1.5.0 openpyxl: 2.2.6 xlrd: 0.9.4 xlwt: 1.0.0 xlsxwriter: 0.7.7 lxml: 3.4.4 bs4: 4.4.1 html5lib: None httplib2: None apiclient: None sqlalchemy: 1.0.9 pymysql: None psycopg2: None Jinja2: None

Comment From: laufere

@chrish42

Comment From: jreback

duplicate of #10409

which is still open (and 0.18.2 is not released as of yet)