Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
from pandas.testing import assert_series_equal
# non-empty
df = pd.DataFrame([1], columns=["col1"])
res1 = df.apply(lambda r: r.col1, axis=1) # series
res2 = df.apply(lambda r: int(r.col1), axis=1) # series
assert_series_equal(res1, res2)
# -> OK!
# empty
df_empty = pd.DataFrame(columns=["col1"])
res3 = df_empty.apply(lambda r: r.col1, axis=1) # empty series
res4 = df_empty.apply(lambda r: int(r.col1), axis=1) # empty dataframe
assert_series_equal(res3, res4)
# -> AssertionError: Series Expected type <class 'pandas.core.series.Series'>,
# found <class 'pandas.core.frame.DataFrame'> instead
# Some use-case where you need to create a new series
pd.Series(res3) # Works as expected
pd.Series(res4) # Raises
# -> ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Issue Description
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../python3.9/site-packages/pandas/core/series.py", line 386, in __init__
    if is_empty_data(data) and dtype is None:
  File "/../python3.9/site-packages/pandas/core/construction.py", line 877, in is_empty_data
    is_simple_empty = is_list_like_without_dtype and not data
  File "/.../python3.9/site-packages/pandas/core/generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Expected Behavior
I would expect the result to be consistent, i.e. always return a series.
Installed Versions
Comment From: rhshadrach
Thanks for the report. This particular code path tries to work out what shape to return by calling the passed function (your lambda, in this case) and inspecting the result. Here, int(r.col1) fails when the DataFrame is empty. With no information about what the function does to its input, we have no way of determining what shape (e.g. Series vs. DataFrame) to return, so in that scenario we return the input.
I think this is a duplicate of #47959.
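A minimal sketch of why the probe call fails, assuming (this is an assumption about pandas internals and may differ between versions) that the function is called with a placeholder row indexed by the frame's columns and filled with NaN:

```python
import numpy as np
import pandas as pd

# Assumption: for an empty frame with axis=1, pandas probes the function with
# a placeholder row indexed by the frame's columns and filled with NaN.
probe = pd.Series(index=["col1"], dtype=np.float64)

# Plain attribute access succeeds and returns a scalar, so a Series result
# can be inferred from it.
scalar = probe.col1

# int() cannot convert NaN, so the probe raises and the shape stays unknown.
try:
    int(probe.col1)
    probe_error = None
except ValueError as exc:
    probe_error = exc

print(type(scalar).__name__, repr(probe_error))
```

Under that assumption, the first lambda from the report yields a scalar during the probe, while the second raises, which would match the Series vs. DataFrame difference seen above.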
Comment From: adriantre
I see, that makes sense. The easiest fix will then be to always check for empty frames, and specify a desired return value/type with an early return. I find that we do this a lot, but that is maybe too much to ask of the general user.
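The early-return check described here could look roughly like the following. The helper name `row_values_as_int` and the choice of `int64` for the empty result are illustrative assumptions, not anything pandas provides:

```python
import pandas as pd


def row_values_as_int(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: always return a Series, even for an empty frame."""
    if len(df.index) == 0:
        # Early return: state the desired type explicitly instead of letting
        # apply() guess it from a call to the function.
        return pd.Series([], index=df.index, dtype="int64")
    return df.apply(lambda r: int(r.col1), axis=1)


non_empty = row_values_as_int(pd.DataFrame([1], columns=["col1"]))
empty = row_values_as_int(pd.DataFrame(columns=["col1"]))

# Both results are Series, so they can be passed to pd.Series() safely.
print(type(non_empty).__name__, type(empty).__name__)
```

The guard tests `len(df.index)` rather than `df.empty`, because `df.empty` is also true for a frame that has rows but no columns.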
Could this be a kwarg to the apply function? df.apply(my_func, axis=1, result_when_empty=pd.Series(...))
The result_when_empty should allow for templating columns when it is a pd.DataFrame.
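Until something like this exists, the idea can be emulated in user code. The sketch below is only an illustration: `apply_rows`, `expand`, and the `result_when_empty` parameter are hypothetical names, and none of them is part of the pandas API.

```python
import pandas as pd


def apply_rows(df, func, result_when_empty):
    """Hypothetical stand-in for the proposed keyword; not part of pandas."""
    if len(df.index) == 0:
        # Return a copy so callers cannot mutate the shared template.
        return result_when_empty.copy()
    return df.apply(func, axis=1)


def expand(row):
    # Returning a Series per row makes apply() build a DataFrame.
    return pd.Series({"double": row.col1 * 2, "square": row.col1 ** 2})


# A DataFrame template fixes the columns of the empty result.
template = pd.DataFrame(columns=["double", "square"])

full = apply_rows(pd.DataFrame([3], columns=["col1"]), expand, template)
empty = apply_rows(pd.DataFrame(columns=["col1"]), expand, template)

print(list(full.columns), list(empty.columns))
```

With a DataFrame template, the empty and non-empty results share the same columns, which is the templating behaviour asked for above.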
Comment From: rhshadrach
I added some thoughts on this in https://github.com/pandas-dev/pandas/issues/47959#issuecomment-1382745734; closing here to keep the discussion consolidated. Please join the discussion there!