• [x] I have checked that this issue has not already been reported.

• [x] I have confirmed this bug exists on the latest version of pandas.

• [ ] (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
pd.Series([pd.NaT, ''], dtype='category').isin({pd.NaT, ''})  # [True, True]
pd.Series([''], dtype='category').isin({''})  # [True] because '' is in series
pd.Series([''], dtype='category').isin({None, ''})  # [True] because '' is in series
pd.Series([''], dtype='category').isin({pd.NaT, ''})  # [False] even though '' is in series

Problem description

When calling isin on a categorical Series whose only values are empty strings, including pd.NaT among the search values prevents the other search values from matching.

(Perhaps that's because Pandas casts all input values to Timedelta?)

This behavior is unexpected. It makes me feel like I must be crazy. If I'm not meant to search for pd.NaT, I'd feel very happy with an error message when I try this.

Expected Output

pd.Series([''], dtype='category').isin({pd.NaT, ''})  # [True] because '' is in series

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 2a7d3326dee660824a8433ffd01065f8ac37f7d6
python           : 3.8.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.8.4-200.fc32.x86_64
Version          : #1 SMP Wed Aug 26 22:28:08 UTC 2020
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.2
numpy            : 1.19.2
pytz             : 2019.3
dateutil         : 2.8.0
pip              : 19.3.1
setuptools       : 45.2.0
Cython           : None
pytest           : 5.4.3
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.4.1
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : None
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Comment From: dsaxton

@adamhooper Thanks, it seems there is a bug here. The value doesn't have to be pd.NaT; other datetime-like values such as pd.Timedelta(0) trigger it too, so your suspicion may be right that the empty string is being converted somewhere (non-datetime-like values don't seem to have the same effect):

[ins] In [1]: import numpy as np
         ...: import pandas as pd
         ...:
         ...:
         ...: value = ""
         ...:
         ...: s = pd.Series([value], dtype="category")
         ...: print(s.isin([pd.Timedelta(0), value]))
         ...: print(pd.__version__)
         ...:
0    False
dtype: bool
1.2.0.dev0+469.gf04624104
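
On affected versions, one possible workaround is to keep null-like lookup values (None, NaN, pd.NaT) out of the list passed to isin and match missing entries separately with isna. The sketch below is only an illustration: `isin_with_nulls` is a hypothetical helper name, not a pandas API, and it assumes every lookup value is a scalar. It does not cover non-null datetime-like lookup values such as pd.Timedelta(0), which would still be passed to isin unchanged.

```python
import pandas as pd


def isin_with_nulls(series, values):
    """Hypothetical helper: membership test that handles null-like lookups separately."""
    values = list(values)
    # Split the lookup values into null-like ones (None, NaN, NaT) and the rest.
    # Assumes every lookup value is a scalar, so pd.isna returns a plain bool.
    non_null = [v for v in values if not pd.isna(v)]
    has_null = len(non_null) != len(values)
    # Only the non-null lookups go through isin, so NaT never reaches it.
    result = series.isin(non_null)
    if has_null:
        # A missing entry in the series matches any null-like lookup value.
        result = result | series.isna()
    return result


s = pd.Series([""], dtype="category")
print(isin_with_nulls(s, {pd.NaT, ""}))
```

This keeps the usual isin semantics, where a missing entry matches a missing lookup value, without mixing NaT and strings in a single lookup.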

Comment From: samilAyoub

take

Comment From: topper-123

This works as expected in main (v2.0rc1):

>>> import pandas as pd
>>>
>>> pd.Series([pd.NaT, ''], dtype='category').isin({pd.NaT, ''})
0    True
1    True
dtype: bool
>>> pd.Series([''], dtype='category').isin({''})
0    True
dtype: bool
>>> pd.Series([''], dtype='category').isin({None, ''})
0    True
dtype: bool
>>> pd.Series([''], dtype='category').isin({pd.NaT, ''})  # this gave [False] before
0    True
dtype: bool

I'll add a test for this so it doesn't pop up again.
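
A regression check along these lines might look roughly like the sketch below. The function name and layout are assumptions made for illustration, not the test that was actually added to pandas; it only uses the public `pandas.testing.assert_series_equal` helper and the case from the original report.

```python
import pandas as pd
from pandas.testing import assert_series_equal


def check_categorical_isin_mixed_nat():
    # Hypothetical regression check for the case reported in this issue.
    ser = pd.Series([""], dtype="category")
    expected = pd.Series([True])
    # Try both a list and a set, since the original report used sets.
    for values in ([pd.NaT, ""], {pd.NaT, ""}):
        result = ser.isin(values)
        # Raises AssertionError on versions that still have the bug.
        assert_series_equal(result, expected)
    return True


check_categorical_isin_mixed_nat()
```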