Pandas groupby sum misbehaves when one of the columns has string objects.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': 'a b c'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})
df['B'] = np.nan  # overwrite B so that every value is missing
df.groupby(lambda x: x, axis=1).sum(min_count=1)

Problem description

With min_count=1, the code above should return NaN for column 'B', as described in the DataFrame.sum documentation, but it returns 0 instead.
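For reference, here is a minimal sketch of the documented min_count behaviour on a plain Series, with no groupby involved. The variable names are illustrative only and do not come from the report.

```python
import numpy as np
import pandas as pd

# A float Series in which every value is missing.
all_missing = pd.Series([np.nan, np.nan])

# min_count defaults to 0, so the sum of no valid values is 0.0.
default_total = all_missing.sum()

# min_count=1 requires at least one non-NA value, otherwise the result is NaN.
strict_total = all_missing.sum(min_count=1)

print(default_total)  # 0.0
print(strict_total)   # nan
```

This is the behaviour column 'B' is expected to show inside the groupby as well.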

Actual Output

    A   B   C
0   a   0.0 4
1   b   0.0 6
2   c   0.0 5

Expected Output

    A   B    C
0   a   NaN  4
1   b   NaN  6
2   c   NaN  5

When I remove the column with strings, the result is NaN for column 'B'. However, I think the columns in a DataFrame should be independent of each other, so this looks like a bug.

del df['A']
df.groupby(lambda x:x, axis=1).sum(min_count=1)

Output

    B   C
0   NaN 4.0
1   NaN 6.0
2   NaN 5.0

Output of `pd.show_versions()`

pandas: 0.23.4
pytest: 4.0.1
pip: 8.1.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: 0.11.0
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: 1.6.2
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 0.999
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.2.0
pandas_gbq: None
pandas_datareader: None

Comment From: WillAyd

Hmm OK that is strange. Definitely some unwanted casting going on with the presence of column A.

Investigation and PRs are always welcome if you care to take a look.

Comment From: Koustav-Samaddar

This is the most stripped down case where I am still able to reproduce the bug.

import numpy as np
import pandas as pd

# bug_var = 1
bug_var = 'a'

df = pd.DataFrame({
    'A': [bug_var, np.nan]
})

gresult = df.groupby(lambda x: x)
result = gresult.sum(min_count=1)

This yields,

result
   A
0  a
1  0

but,

expected
     A
0    a
1  NaN

Because the column is non-numeric, self._cython_agg_general at groupby.py:1256 raises an error (DataError if numeric_only=True, otherwise AttributeError), so the code falls back to self.aggregate (groupby.py:1262), which is where the bug stems from.

I'll continue to look at this tomorrow, but in the meantime I'm posting this here in case anyone finds it useful.

Comment From: ikramersh

This issue no longer occurs when tested with pandas version 1.3.4

Comment From: gmaiwald

take