Pandas performance issue with aggregation last/first of resampler when df contains categoricals

Code Sample, a copy-pastable example if possible

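import time

import numpy as np
import pandas as pd

# 1000 rows of random integers spread evenly over one day
df_time = pd.DataFrame(
    np.random.randint(low=0, high=20, size=(1000, 4)),
    index=pd.date_range(start='2000-01-01', end='2000-01-02', periods=1000),
    columns=['a', 'b', 'c', 'd'],
)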

time0 = time.time()
df_time.resample('1s').last()
print('resampling the df containing integers took {} seconds'.format(time.time()-time0))

# Plain column assignment, so the column dtype really becomes category
df_time['a'] = df_time['a'].astype('category')

time0 = time.time()
df_time.resample('1s').last()
print('resampling the df containing categoricals took {} seconds'.format(time.time()-time0))

resampling the df containing integers took 0.007719516754150391 seconds
resampling the df containing categoricals took 38.36969518661499 seconds

Problem description

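As the code above shows, resampling a DataFrame that contains a categorical column is extremely slow for the `.last()` aggregation: about 38 seconds, versus about 0.008 seconds for the same frame with only integer columns. The same behaviour occurs for `.first()`, whereas `.mean()` performs fine.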

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-1052-aws
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8
pandas           : 0.25.3
numpy            : 1.17.4
pytz             : 2019.3
dateutil         : 2.8.0
pip              : 19.3.1
setuptools       : 41.6.0
Cython           : 0.29.14
pytest           : 5.2.4
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : 0.9.3
psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
jinja2           : 2.10.3
IPython          : 7.9.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.1
numexpr          : 2.7.0
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : 0.4.0
scipy            : 1.3.2
sqlalchemy       : 1.3.11
tables           : 3.6.1
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None

Comment From: jbrockmendel

Fixed by #52120, closing.
