Code Sample, a copy-pastable example if possible
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df1 = pd.DataFrame({c : np.random.random(4) for c in ['a', 'b']})
In [4]: date = pd.DataFrame({'day' : 15, 'month' : df1.index + 1, 'year' : 2009})
In [5]: df1['date'] = pd.to_datetime(date)
In [6]: df1['month'] = df1['date'].dt.month
In [7]: df2 = df1.groupby('month').max()
In [8]: df2.transpose().mean()
Out[8]:
Series([], dtype: float64)
Problem description
df2 has strictly positive dimensions, so the mean of its transpose cannot be an empty Series.
Expected Output
The same as
In [9]: df2.mean(axis=1)
Out[9]:
month
1 0.649630
2 0.392446
3 0.693638
12 0.381830
dtype: float64
Comment From: jreback
this is as expected: you have an all-object array
In [5]: df2.transpose()
Out[5]:
month 1 2 3 4
a 0.392964 0.0827719 0.129667 0.333425
b 0.661218 0.342849 0.503337 0.409731
date 2009-01-15 00:00:00 2009-02-15 00:00:00 2009-03-15 00:00:00 2009-04-15 00:00:00
In [7]: df2.transpose().mean(numeric_only=False)
TypeError: unsupported operand type(s) for +: 'Timestamp' and 'float'
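A minimal sketch of why the transpose is all-object, assuming a small hypothetical frame that mixes float and datetime columns the way df2 does. Each column of the transpose then holds floats and a Timestamp side by side, and pandas can only store such a column with object dtype, which is consistent with mean() finding no numeric columns to reduce.

```python
import pandas as pd

# Hypothetical two-row frame shaped like df2: two float columns plus a datetime column
df = pd.DataFrame({
    "a": [0.39, 0.08],
    "b": [0.66, 0.34],
    "date": pd.to_datetime(["2009-01-15", "2009-02-15"]),
})

transposed = df.T
# Every column of the transpose mixes floats with a Timestamp,
# so the only common dtype pandas can use is object.
print(transposed.dtypes)
```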
Comment From: jreback
df2.mean(axis=1) skips the non-numeric values by default, so it gives the same result as selecting the float columns first:
In [9]: df2.mean(axis=1)
Out[9]:
month
1 0.527091
2 0.212811
3 0.316502
4 0.371578
dtype: float64
In [12]: df2.select_dtypes(include=['float']).T.mean()
Out[12]:
month
1 0.527091
2 0.212811
3 0.316502
4 0.371578
dtype: float64
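A sketch of the two equivalent spellings above, on a hypothetical frame built like df1. It passes numeric_only=True explicitly rather than relying on the default, because later pandas releases (2.0 onward) changed the numeric_only default to False, so non-numeric columns are no longer dropped silently.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two float columns plus a datetime column, as in df1
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(4), "b": rng.random(4)})
df["date"] = pd.to_datetime(["2009-01-15", "2009-02-15", "2009-03-15", "2009-04-15"])

# Row-wise mean over the numeric columns only
row_means = df.mean(axis=1, numeric_only=True)

# Same thing: keep the float columns, transpose (still float64), then take column means
via_select = df.select_dtypes(include=["float"]).T.mean()

pd.testing.assert_series_equal(row_means, via_select)
print(row_means)
```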
Comment From: toobaz
Aha! Right, sorry
Comment From: toobaz
While I'm at it... is the following also desired?
In [3]: df = pd.DataFrame([[1,2,3],[4,5,6]])
In [4]: list(df.groupby(1).mean())
Out[4]: [0, 2]
In [5]: list(df.groupby(1).apply(np.mean))
Out[5]: [0, 1, 2]
that is, native groupby operations differ from user-provided functions in that they omit the groupby column.
Comment From: jorisvandenbossche
@toobaz I think that is how apply works (it also applies the function to the key column). You can use .agg if you want the "aggregation behaviour" of not applying on the key column:
In [18]: df.groupby(1).agg(lambda x: np.mean(x))
Out[18]:
   0  2
1
2  1  3
5  4  6
(I'm only using the lambda to show that this works with a user-defined function; agg(np.mean) would take the same fast path as groupby().mean().)
But we should have a better specification of those aspects of groupby.
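A small sketch comparing the two paths described above, using the same 2x3 frame: the native groupby().mean() and agg with a user-defined lambda both leave the key column 1 out of the columns and use its values as the index. Whether the lambda results are cast back to integers may depend on the pandas version, so this comparison deliberately ignores dtypes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

native = df.groupby(1).mean()
# A lambda forces the user-defined-function path instead of the built-in fast path
via_agg = df.groupby(1).agg(lambda x: np.mean(x))

print(native)
print(via_agg)

# Both exclude the key column and use its values as the index
pd.testing.assert_frame_equal(native, via_agg, check_dtype=False)
```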
Comment From: toobaz
@jorisvandenbossche: right, makes a lot of sense! And I now see the following note I had missed: "apply can act as a reducer, transformer, or filter function, depending on exactly what is passed to it. So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in the output as well as set the indices."
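To make that note concrete, a minimal sketch on the same small frame. Selecting the non-key columns before apply keeps column 1 out of the function's input (and sidesteps the deprecation newer pandas releases attach to apply operating on grouping columns); the shape of the result then depends on what the function returns. A per-group Series acts as a reducer, indexed by the group keys; a like-indexed frame acts as a transformer and, with group_keys=False, keeps the original row labels. The column selection and group_keys=False are choices made for this illustration, not part of the discussion above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# Select the non-key columns so the function never sees column 1
grouped = df.groupby(1, group_keys=False)[[0, 2]]

# Reducer: each call returns one Series per group, so the group keys become the index
reduced = grouped.apply(lambda g: g.sum())

# Transformer: each call returns a frame indexed like its group,
# so (with group_keys=False) the original row labels are kept
transformed = grouped.apply(lambda g: g - g.mean())

print(reduced)
print(transformed)
```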