- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
pd.Series([pd.NaT, ''], dtype='category').isin({pd.NaT, ''}) # [True, True]
pd.Series([''], dtype='category').isin({''}) # [True] because '' is in series
pd.Series([''], dtype='category').isin({None, ''}) # [True] because '' is in series
pd.Series([''], dtype='category').isin({pd.NaT, ''}) # [False] even though '' is in series
Problem description
When searching for empty values in a categorical full of empty values, searching for pd.NaT will prevent other terms from being found.
(Perhaps that's because Pandas casts all input values to Timedelta?)
This behavior is unexpected. It makes me feel like I must be crazy. If I'm not meant to search for pd.NaT, I'd feel very happy with an error message when I try this.
Expected Output
pd.Series([''], dtype='category').isin({pd.NaT, ''}) # [True] because '' is in series
Output of pd.show_versions()
Comment From: dsaxton
@adamhooper Thanks, seems there is a bug here. The value doesn't have to be pd.NaT, it can be other datetime-like things, so maybe your suspicion is right that the empty string is being converted somewhere (other values don't seem to have the same effect):
In [1]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: value = ""
   ...:
   ...: s = pd.Series([value], dtype="category")
   ...: print(s.isin([pd.Timedelta(0), value]))
   ...: print(pd.__version__)
0 False
dtype: bool
1.2.0.dev0+469.gf04624104
Comment From: samilAyoub
take
Comment From: topper-123
This works as expected in main (v2.0rc1):
>>> import pandas as pd
>>>
>>> pd.Series([pd.NaT, ''], dtype='category').isin({pd.NaT, ''})
0 True
1 True
dtype: bool
>>> pd.Series([''], dtype='category').isin({''})
0 True
dtype: bool
>>> pd.Series([''], dtype='category').isin({None, ''})
0 True
dtype: bool
>>> pd.Series([''], dtype='category').isin({pd.NaT, ''}) # this gave [False] before
0 True
dtype: bool
I'll add a test for this so it doesn't pop up again.
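A regression test along these lines might look roughly like the sketch below. It is not the test that was actually added to pandas. The function name `check_categorical_isin_with_nat` is made up for illustration, and the cases are just the ones from the original report. The assertions should only pass on versions where the behavior is fixed (reported above for v2.0rc1); on affected versions the `{pd.NaT, ''}` case would fail.

```python
import pandas as pd
import pandas.testing as tm


def check_categorical_isin_with_nat():
    # Hypothetical helper name; cases are taken from the report in this issue.
    ser = pd.Series([""], dtype="category")
    expected = pd.Series([True])

    # '' is in the series, so every lookup that includes '' should match,
    # regardless of what else is in the set of values.
    for values in ({""}, {None, ""}, {pd.NaT, ""}):
        result = ser.isin(values)
        tm.assert_series_equal(result, expected)

    # A series holding both a missing value and '' should match both entries.
    both = pd.Series([pd.NaT, ""], dtype="category")
    tm.assert_series_equal(both.isin({pd.NaT, ""}), pd.Series([True, True]))


check_categorical_isin_with_nat()
```

In the pandas test suite this would more likely be written as a parametrized pytest test; the plain function here just keeps the sketch runnable on its own.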