• [X] I have checked that this issue has not already been reported.

• [X] I have confirmed this bug exists on the latest version of pandas.

• [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

def calcscore(df):
    return pd.Series([[','.join(df.item), df.score.sum()]], index=pd.Index([df.index[0]]))

df = pd.DataFrame(columns=['time', 'item', 'param', 'score'],
        data=[[1, 'N1', 10, 100],
              [1, 'N2', 20, 150],
              [2, 'N3', 70, 300]])

for p in [5, 50]:
    s = df[df.param > p].groupby('time', group_keys=False).apply(calcscore)
    print(type(s))
    print(s.to_list())

Problem description

I'm looping over a DataFrame, filtering on varying thresholds for the param column and returning aggregated data for each filtered subset (the actual code is more complex). I'm intentionally returning a Series from the applied function, since I've benchmarked it to be faster than returning a DataFrame. The problem is that groupby(...).apply sometimes returns a DataFrame, even though the applied function always returns a Series. This seems to happen when the result would have only one row. Here is the unexpected output that I'm seeing:

<class 'pandas.core.series.Series'>
[['N1,N2', 250], ['N3', 300]]
<class 'pandas.core.frame.DataFrame'>
Traceback (most recent call last):

  File "/tmp/applytest.py", line 22, in <module>
    print(s.to_list())

  File "/usr/lib64/python3.8/site-packages/pandas/core/generic.py", line 5130, in __getattr__
    return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'to_list'

Expected Output

<class 'pandas.core.series.Series'>
[['N1,N2', 250], ['N3', 300]]
<class 'pandas.core.series.Series'>
[['N3', 300]]

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : d9fff2792bf16178d4e450fe7384244e50635733
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.7.11-200.fc32.x86_64
Version : #1 SMP Wed Jul 29 17:15:52 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.6.0
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.12.0
pandas_datareader: 0.9.0
bs4 : 4.9.1
bottleneck : None
fsspec : None
fastparquet : 0.4.1
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : None
xlrd : None
xlwt : None
numba : 0.50.1

Comment From: andrewkoo

I took a look at the source and found that the error may be related to the call to unstack in _wrap_applied_output (pandas/core/groupby/generic.py, DataFrameGroupBy._wrap_applied_output). concat does indeed return a Series, but unstack then converts that Series to a DataFrame.

Removing the call to unstack seems to result in the expected output for this case, but I'm not sure if that would break other cases...

def _wrap_applied_output(self, keys, values, not_indexed_same=False):
            ...
                if self.axis == 0 and isinstance(v, ABCSeries):
                    if (
                        isinstance(v.index, MultiIndex)
                        or key_index is None
                        or isinstance(key_index, MultiIndex)
                    ):
                        ...
                    else:
                        # GH5788 instead of stacking; concat gets the
                        # dtypes correct
                        from pandas.core.reshape.concat import concat
                        result = concat(
                            values,
                            keys=key_index,
                            names=key_index.names,
                            axis=self.axis,
                        ).unstack()
                        result.columns = index
            ...
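
The effect of the trailing unstack() can be reproduced outside of groupby. The following is only a minimal sketch, assuming a single group keyed by time=2 and a scalar value for simplicity; the names piece, stacked and wide are illustrative and do not come from the pandas source.

```python
import pandas as pd

# One group's result, as the applied function would return it.
piece = pd.Series([300], index=pd.Index([2]))

# Mirror the concat call from the excerpt: the group key becomes
# an outer index level on top of the Series' own index.
key_index = pd.Index([2], name="time")
stacked = pd.concat([piece], keys=key_index, names=key_index.names)

# unstack() pivots the inner index level into columns, so the
# result is two-dimensional even when there is a single row.
wide = stacked.unstack()

print(type(stacked).__name__)  # Series
print(type(wide).__name__)     # DataFrame
```

This only illustrates the concat/unstack step in isolation; it does not show which branch of _wrap_applied_output a given input takes.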

I'm not sure how we want to fix the bug (additional checks, etc.), but this is my first time contributing and I would love to help in any way I can!

Comment From: rhshadrach

I believe this is a duplicate of #31063.

The issue is that apply transforms the result differently depending on whether the returned objects all share the same index or some of them differ. This leads to different behavior when there is a single row versus multiple rows. This certainly needs to be improved. PRs are always welcome!
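
Until apply handles both cases uniformly, one possible way to avoid depending on its result wrapping is to call the function on each group manually and concatenate the pieces. This is a sketch rather than a recommended fix: the helper name apply_as_series is hypothetical, it assumes at least one group survives the filter (pd.concat raises on an empty list), and whether it is fast enough for the original use case has not been measured.

```python
import pandas as pd

def calcscore(group):
    # Same aggregation as in the report: one list-valued row per group.
    return pd.Series(
        [[",".join(group["item"]), int(group["score"].sum())]],
        index=pd.Index([group.index[0]]),
    )

def apply_as_series(frame, by, func):
    # Hypothetical helper: bypasses groupby.apply so the result is
    # simply the concatenation of the per-group Series.
    pieces = [func(group) for _, group in frame.groupby(by)]
    return pd.concat(pieces)

df = pd.DataFrame(
    columns=["time", "item", "param", "score"],
    data=[[1, "N1", 10, 100], [1, "N2", 20, 150], [2, "N3", 70, 300]],
)

results = {p: apply_as_series(df[df.param > p], "time", calcscore) for p in [5, 50]}
for p, s in results.items():
    print(p, type(s).__name__, s.to_list())
```

Because pd.concat of Series along the default axis returns a Series, the result type here does not depend on how many groups remain after filtering.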

Comment From: rhshadrach

Closing as a duplicate of #31063.