Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
print(f"{pd.isnull([1, 2, 3]) = }")
print(f"{pd.isnull((1, 2, 3)) = }")

Issue Description

It looks like pd.isnull treats a list as array-like but a tuple as a scalar.

Expected Behavior

I'd expect pd.isnull to treat lists and tuples the same way, as other parts of the API such as pd.Series do.
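A minimal sketch of the comparison behind this expectation, using only the public pd.Series constructor and pd.isnull. The variable names are illustrative, and the exact return types may differ between pandas versions.

```python
import pandas as pd

values_list = [1, 2, 3]
values_tuple = (1, 2, 3)

# pd.Series accepts both containers and builds the same three-element Series.
from_list = pd.Series(values_list)
from_tuple = pd.Series(values_tuple)
print(from_list.equals(from_tuple))

# pd.isnull is called the same way on both; compare the kind of result returned.
null_from_list = pd.isnull(values_list)
null_from_tuple = pd.isnull(values_tuple)
print(type(null_from_list))
print(type(null_from_tuple))
```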

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 2e218d10984e9919f0296931d92ea851c6a6faf5
python           : 3.9.15.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 22.3.0
Version          : Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.3
numpy            : 1.24.0
pytz             : 2022.6
dateutil         : 2.8.2
setuptools       : 59.8.0
pip              : 22.3.1
Cython           : None
pytest           : 7.2.0
hypothesis       : None
sphinx           : 4.5.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.7.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : None
brotli           :
fastparquet      : 2023.2.0
fsspec           : 2022.11.0
gcsfs            : None
matplotlib       : 3.6.2
numba            : None
numexpr          : 2.8.3
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : 2022.11.0
scipy            : 1.9.3
snappy           :
sqlalchemy       : 1.4.46
tables           : 3.7.0
tabulate         : None
xarray           : 2022.9.0
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

Comment From: DeaMariaLeon

This function returns a boolean or array-like of bool. I'll keep the "bug" label just in case, but I don't think it is one.

https://pandas.pydata.org/docs/reference/api/pandas.isnull.html

Comment From: jrbourbeau

Thanks @DeaMariaLeon. The thing that seems off to me is that pd.isnull treats lists as array-like (returning an array-like of bools) and tuples as scalars (returning a bool):

In [1]: import pandas as pd

In [2]: pd.isnull([1, pd.NA, 3])
Out[2]: array([False,  True, False])

In [3]: pd.isnull((1, pd.NA, 3))
Out[3]: False

My expectation is that both lists and tuples should be treated as array-like, though feel free to let me know if that expectation is incorrect.
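In the meantime, a possible workaround is to convert the tuple to a list before the call. This is only a sketch: isnull_arraylike is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def isnull_arraylike(obj):
    """Hypothetical helper: treat a tuple like a list before calling pd.isnull."""
    if isinstance(obj, tuple):
        obj = list(obj)
    return pd.isnull(obj)


# The tuple now takes the same path as the list in Out[2] above.
result = isnull_arraylike((1, pd.NA, 3))
print(result)
```

Anything that is not a tuple is passed through unchanged, so scalars, lists, and Series behave as they do with pd.isnull itself.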

Comment From: DeaMariaLeon

Oh, I see! Thank you for opening an issue. :)

Comment From: phofl

This is an edge case I think.

You can end up with tuples from a MultiIndex for example. In this scenario we want to treat the tuple as a single element, e.g.

df.drop(columns=(1, 2))

treats the tuple as a single element. I think this is similar here although it does not really look intuitive to me either.
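A small sketch of that drop case, assuming a DataFrame whose columns are a two-level MultiIndex (the frame and its values are made up for illustration):

```python
import pandas as pd

# Columns are the labels (1, 1), (1, 2), (2, 1), (2, 2).
columns = pd.MultiIndex.from_product([[1, 2], [1, 2]])
df = pd.DataFrame([[10, 20, 30, 40]], columns=columns)

# The tuple names exactly one column, the one labelled (1, 2).
dropped = df.drop(columns=(1, 2))
print(list(dropped.columns))
```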

Comment From: rhshadrach

Interestingly in a list, tuples are treated as array-like:

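import numpy as np
import pandas as pd

# A list of tuples: each tuple becomes one row of a 2-D boolean result.
obj = [(1.0, 2.0), (1.0, np.nan), (np.nan, 2.0), (np.nan, np.nan)]
print(pd.isnull(obj))
# [[False False]
#  [False  True]
#  [ True False]
#  [ True  True]]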
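For contrast, here is a sketch of one way to keep each tuple as a single element: store the tuples in a 1-D object array before calling pd.isnull. The names pairs and labels are illustrative, and the result should hold one entry per tuple rather than one per tuple member.

```python
import numpy as np
import pandas as pd

pairs = [(1.0, 2.0), (1.0, np.nan), (np.nan, 2.0), (np.nan, np.nan)]

# Fill a 1-D object array element by element so NumPy does not expand
# the tuples into a second dimension.
labels = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    labels[i] = pair

per_label = pd.isnull(labels)
print(per_label.shape)
print(per_label)
```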

Comment From: jorisvandenbossche

> treating lists as array-like (returning an array-like of bools) and tuples as scalar (returning a bool)

As far as I remember, in the past we made this distinction (in certain places) because tuples can be labels, as Patrick mentioned.

But it's indeed a tricky situation, with easy confusion and corner cases (a quick search for "tuple list label" turns up quite a few related issues). For example, see https://github.com/pandas-dev/pandas/issues/43978 for the drop example.

Another example in indexing where the two are distinguished and have different behaviour:

>>> s = pd.Series(range(6), index=pd.MultiIndex.from_product([[1, 2, 3], [1, 2]]))
>>> s.loc[(1, 2)]  # tuple is a single label
1
>>> s.loc[[1, 2]]  # list is an indexer (in this case for the first level of the MultiIndex)
1  1    0
   2    1
2  1    2
   2    3
dtype: int64