Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [None, None, None, None]})
# The two reductions return different answers for the all-null groups
df.groupby("a").all()  # True for both keys
df.groupby("a").any()  # False for both keys
Issue Description
all and any return different answers when groups contain only null values. all returns True for both keys, whereas any returns False for both keys.
Expected Behavior
Unsure what the correct semantics are for this case. Since skipna defaults to True for both, both would be reducing empty sets of data. What's the expected behavior in such cases?
Installed Versions
Comment From: rhshadrach
Thanks for the report!
print(any([]), all([]))
# False True
any on an empty collection should be False, and all on an empty collection should be True. This follows from vacuous truth.
Comment From: rhshadrach
Just to maybe make it a bit more clear. Say you have a collection C:
all: For every x in C, x is truthy.
any: There exists an x in C such that x is truthy.
For an empty collection C, all evaluates to True vacuously. For any, on the other hand, there is no x in C at all, let alone an x in C that is truthy. So any is False.
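To spell out the two definitions above in code, here is a minimal pure-Python sketch. The helper names my_all and my_any are made up for illustration; they mirror what the built-in all and any do, not how pandas implements its groupby reductions.

```python
def my_all(collection):
    # "For every x in C, x is truthy": search for a counterexample.
    for x in collection:
        if not x:
            return False
    # No counterexample found. For an empty collection the loop never runs.
    return True


def my_any(collection):
    # "There exists an x in C such that x is truthy": search for a witness.
    for x in collection:
        if x:
            return True
    # No witness found. For an empty collection the loop never runs.
    return False


empty = []
print(my_all(empty), my_any(empty))  # True False
```

With skipna=True, an all-null group is effectively this empty case, which is why the two reductions disagree in the original example.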
Comment From: pyrito
Wow, I didn't know this, but I guess it makes sense. Thanks @rhshadrach !
Comment From: pyrito
@rhshadrach that being said, I still see similar behavior in this case:
import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,1,2,2], "b":[np.nan, 31.3,np.nan,2.4]})
Shouldn't df.groupby('a').all() return False here? I'm getting True. Is this just how the truthiness of np.nan values is handled?
Comment From: rhshadrach
Remember skipna=True by default. In addition, np.nan is also truthy. You can determine whether an object obj is truthy to Python by running bool(obj).
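As a rough illustration of both points, the sketch below uses plain Python floats; math.nan is the same kind of float value as np.nan. The reduce_all helper and its skipna flag are hypothetical stand-ins that only mimic the idea of dropping nulls before reducing. They are not the pandas implementation.

```python
import math

# NaN is a nonzero float, so Python treats it as truthy.
print(bool(math.nan))  # True


def reduce_all(values, skipna=True):
    # Hypothetical helper: optionally drop NaN values, then apply all().
    if skipna:
        values = [
            v for v in values
            if not (isinstance(v, float) and math.isnan(v))
        ]
    return all(values)


group = [math.nan, 31.3]
print(reduce_all(group, skipna=True))   # True: only 31.3 remains, and it is truthy
print(reduce_all(group, skipna=False))  # True: NaN is kept, but NaN is truthy too
```

Under these assumptions, a group mixing NaN with nonzero numbers gives True either way, for two different reasons.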
Comment From: pyrito
@rhshadrach sorry yes, I forgot to mention the skipna=False in my example. And thanks for the pointer!
Comment From: pyrito
@rhshadrach are null values in pandas always np.nans?
Comment From: rhshadrach
https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.isna.html
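For a quick check of what the linked pd.isna treats as missing, here is a small sketch. The candidates dict is just an illustrative selection of scalars, not an exhaustive list, and pd.NA assumes pandas 1.0 or later.

```python
import numpy as np
import pandas as pd

# A few scalars that pd.isna reports as missing.
candidates = {"None": None, "np.nan": np.nan, "pd.NaT": pd.NaT, "pd.NA": pd.NA}
for name, value in candidates.items():
    print(name, pd.isna(value))

# pd.isna also works element-wise on array-likes.
mask = pd.isna(pd.Series([1.0, np.nan, None]))
print(mask.tolist())
```

So np.nan is one of several null markers; which one shows up depends on the column's dtype.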