Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({"a": [1,1,2,2], "b":[None, None, None, None]})

# Both return differing answers
df.groupby('a').all()
df.groupby('a').any()

Issue Description

all and any return differing answers when a group contains only null values. all returns True for both keys, whereas any returns False for both keys.

Expected Behavior

Unsure what the correct semantics are for this case. Since skipna defaults to True for both, both end up reducing an empty set of values. What's the expected behavior in such cases?

Installed Versions

INSTALLED VERSIONS
------------------
commit : 8dab54d6573f7186ff0c3b6364d5e4dd635ff3e7
python : 3.11.0.final.0
python-bits : 64
OS : Darwin
OS-release : 21.3.0
Version : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_ARM64_T6000
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.2
numpy : 1.24.1
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Comment From: rhshadrach

Thanks for the report!

print(any([]), all([]))
# False True

any on an empty collection should be False, all on an empty collection should be True. This is part of Vacuous truth.

Comment From: rhshadrach

Just to maybe make it a bit more clear. Say you have a collection C:

all: For every x in C, x is truthy.
any: There exists an x in C such that x is truthy.

For an empty collection C, all evaluates to True vacuously. On the other hand for any, there is no x in C, let alone an x in C that is truthy. So any is False.
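To make the connection to the original report concrete, here is a minimal pure-Python sketch of a grouped reduction that drops nulls before reducing. The helper names `is_null` and `group_reduce` are made up for illustration and are not pandas APIs. The null check is a simplification of what pandas does. The sketch only shows why an all-null group ends up as an empty collection when skipna is on.

```python
import math

def is_null(value):
    """Treat None and float NaN as null (a simplification of pandas' rules)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def group_reduce(keys, values, reducer, skipna=True):
    """Group values by key, optionally drop nulls, then apply all() or any()."""
    groups = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    result = {}
    for key, members in groups.items():
        if skipna:
            # Dropping nulls can leave an empty list for an all-null group
            members = [v for v in members if not is_null(v)]
        result[key] = reducer(members)
    return result

keys = [1, 1, 2, 2]
values = [None, None, None, None]

print(group_reduce(keys, values, all))  # {1: True, 2: True}
print(group_reduce(keys, values, any))  # {1: False, 2: False}
```

Every group is empty after the nulls are dropped, so `all` is vacuously True and `any` is False, matching the results in the report.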

Comment From: pyrito

Wow, I didn't know this, but I guess it makes sense. Thanks @rhshadrach !

Comment From: pyrito

@rhshadrach that being said, I still see some similar behavior for when we have this case:

import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1,1,2,2], "b":[np.nan, 31.3,np.nan,2.4]})

Shouldn't df.groupby('a').all() return False here? I'm getting True. Is this just how the truthiness of np.nan is handled?

Comment From: rhshadrach

Remember skipna=True by default. In addition, np.nan is also truthy. You can determine whether an object obj is truthy to Python by running bool(obj).
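A quick way to check this is with plain Python floats, since np.nan is an ordinary Python float NaN. The snippet below is only an illustration using built-ins and is not the pandas implementation. It mimics the first group from the example above, `[nan, 31.3]`, with the NaN kept and with it dropped.

```python
import math

nan = float("nan")  # np.nan is the same kind of object: a Python float NaN

print(bool(nan))   # True: NaN is non-zero, so it is truthy
print(bool(0.0))   # False
print(bool(None))  # False

# Analogue of skipna=False: NaN stays in the group and counts as truthy
print(all([nan, 31.3]))  # True

# Analogue of skipna=True: NaN is dropped before reducing
kept = [v for v in [nan, 31.3] if not math.isnan(v)]
print(all(kept))  # True
```

Either way the reduction over that group is True, because the only values that could make `all` return False are falsy ones such as 0.0.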

Comment From: pyrito

@rhshadrach sorry yes, I forgot to mention the skipna=False in my example. And thanks for the pointer!

Comment From: pyrito

@rhshadrach are null values in pandas always np.nans?

Comment From: rhshadrach

https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.isna.html
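As a rough illustration of what that page describes, the sketch below passes a few different objects to `pd.isna`. It assumes pandas 1.0 or later (where `pd.NA` exists) and NumPy are installed. The selection of values is only an example, not an exhaustive list.

```python
import numpy as np
import pandas as pd

# Several distinct objects are treated as missing by pd.isna
for value in (None, np.nan, pd.NaT, pd.NA):
    print(repr(value), pd.isna(value))

# Zero and the empty string are falsy, but they are not missing values
print(pd.isna(0), pd.isna(""))
```

So null values are not always np.nan: None, pd.NaT and pd.NA are also reported as missing, while falsy values such as 0 are not.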