Currently the use of Series.apply / Series.agg and DataFrame.apply / DataFrame.agg is confusing. In particular, sometimes the user calls apply and gets the results of agg or vice-versa:
- `apply` with list- or dict-like arguments calls `agg`.
- `DataFrame.agg` with a UDF calls `DataFrame.apply`.
- `Series.agg` with a UDF calls `Series.apply`, and if this fails, attempts to pass the Series to the UDF.
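The cross-calling above can be observed directly. The following is a small sketch. It assumes a pandas release in which these code paths behave as described. The `"sum"` string and the `len` lambda are arbitrary illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
s = pd.Series([1, 2, 3])

# apply with a list-like argument goes down the same path as agg, so both
# calls return a DataFrame with one row per function name and one column per column.
via_apply = df.apply(["sum"])
via_agg = df.agg(["sum"])
print(via_apply.equals(via_agg))

# Series.agg with a UDF first tries Series.apply, which calls the UDF on each
# element. len() of a single int raises TypeError, so agg falls back to
# passing the whole Series to the UDF.
n = s.agg(lambda x: len(x))
print(n)
```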
If we are to change the current behavior, it will need to go through deprecation. This will be a bit tricky with the way the code paths switch between agg and apply, but I believe it can be done (see https://github.com/pandas-dev/pandas/pull/49672#issuecomment-1312775348).
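Here is one way a deprecation could bridge the two behaviors, sketched for `Series.agg` only. It keeps today's apply-then-fallback logic but warns whenever the elementwise `apply` attempt succeeds, because that is exactly the case whose result would change. The helper name `series_agg_with_deprecation` and the warning text are hypothetical and are not taken from #49672.

```python
import warnings

import pandas as pd


def series_agg_with_deprecation(s, func):
    """Hypothetical helper: current Series.agg semantics for a single UDF,
    plus a FutureWarning when the result would change under the proposal."""
    try:
        # Current behavior: try calling func on each element first.
        result = s.apply(func)
    except (ValueError, AttributeError, TypeError):
        # Fallback: func receives the whole Series. The proposal does the
        # same thing, so no warning is needed here.
        return func(s)
    warnings.warn(
        "Series.agg called func on each element; in a future version func "
        "will receive the whole Series. Use Series.apply for elementwise calls.",
        FutureWarning,
        stacklevel=2,
    )
    return result


s = pd.Series([1, 2, 3])
print(series_agg_with_deprecation(s, lambda x: len(x)))  # fallback path, no warning
print(series_agg_with_deprecation(s, lambda x: x + 1))   # elementwise path, warns
```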
In order to clarify the difference between agg and apply for users, I propose the following for a single argument:
- (unchanged) `DataFrame.apply` will apply the function to each Series; the result shape will be inferred from the output.
- (unchanged) `DataFrame.applymap` will apply the function to each cell.
- (unchanged) `Series.apply` will apply the function to each row.
- (changed) `DataFrame.agg` will act on each Series that makes up the DataFrame; the result will always be a Series. Currently the result shape is inferred from the output.
- (changed) `Series.agg` will act on the Series, and the result will be whatever the function returns. Currently `apply` is tried first, and only when that fails will `agg` act on the Series.
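The following minimal sketch makes the proposed single-argument semantics concrete using only the public API. `proposed_frame_agg` and `proposed_series_agg` are hypothetical helpers, not pandas functions. The sketch uses an object array so that list or Series results are kept whole instead of being expanded.

```python
import numpy as np
import pandas as pd


def proposed_frame_agg(df, func):
    """Hypothetical sketch of the proposed DataFrame.agg for a single UDF:
    call func once per column and always return a Series."""
    values = np.empty(len(df.columns), dtype=object)
    for i, col in enumerate(df.columns):
        # Assigning to a single position of an object array stores the
        # result as-is, so lists or Series are not expanded.
        values[i] = func(df[col])
    return pd.Series(values, index=df.columns)


def proposed_series_agg(s, func):
    """Hypothetical sketch of the proposed Series.agg: func sees the whole Series."""
    return func(s)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(proposed_frame_agg(df, lambda x: list(x)))
print(proposed_series_agg(df["a"], lambda x: x.max() - x.min()))
```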
And for multiple arguments:
- (changed) When given a list-like or dict-like, `agg` will call `agg` for each argument and `apply` will call `apply`. Currently `apply` will call `agg` in this case.
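The following sketches the proposed list-like dispatch for a Series. It uses the hypothetical helpers `proposed_agg_list` and `proposed_apply_list`. Under the proposal, each UDF passed to `agg` receives the whole Series. Each UDF passed to `apply` keeps the current `Series.apply` behavior of being called per element. Naming the results by `__name__` is an assumption of this sketch.

```python
import pandas as pd


def proposed_agg_list(s, funcs):
    """Hypothetical: agg with a list calls (proposed) agg once per function,
    so each function sees the whole Series."""
    return pd.Series({f.__name__: f(s) for f in funcs})


def proposed_apply_list(s, funcs):
    """Hypothetical: apply with a list calls Series.apply once per function,
    so each function sees one element at a time."""
    return pd.concat({f.__name__: s.apply(f) for f in funcs}, axis=1)


def spread(x):
    # Only meaningful on a whole Series.
    return x.max() - x.min()


def plus_one(x):
    # Works on a single element.
    return x + 1


s = pd.Series([1, 2, 3])
print(proposed_agg_list(s, [spread]))      # Series indexed by "spread"
print(proposed_apply_list(s, [plus_one]))  # DataFrame with a "plus_one" column
```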
I've put up #49672 to show the implementation and the impact on our tests. Some examples:
```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# For reducers, apply and agg act the same on a DataFrame
print(df.apply(str))
# a    0    1\n1    2\n2    3\nName: a, dtype: int64
# b    0    4\n1    5\n2    6\nName: b, dtype: int64
# dtype: object
print(df.agg(str))
# a    0    1\n1    2\n2    3\nName: a, dtype: int64
# b    0    4\n1    5\n2    6\nName: b, dtype: int64
# dtype: object

# apply sees a Series output as not being a reducer and combines the results with `concat(..., axis=1)`.
# agg treats everything as a reducer. The result is a Series whose entries are themselves Series.
print(df.apply(lambda x: pd.concat([x, x])))
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6
# 0  1  4
# 1  2  5
# 2  3  6
print(df.agg(lambda x: pd.concat([x, x])))
# a    0    1
# 1    2
# 2    3
# 0    1
# 1    2
# 2    3
# Name...
# b    0    4
# 1    5
# 2    6
# 0    4
# 1    5
# 2    6
# Name...
# dtype: object

# apply sees a list output as not being a reducer and makes each list a column of the result (a no-op in this case).
# agg treats everything as a reducer.
print(df.apply(lambda x: list(x)))
#    a  b
# 0  1  4
# 1  2  5
# 2  3  6
print(df.agg(lambda x: list(x)))
# a    [1, 2, 3]
# b    [4, 5, 6]
# dtype: object
```
Comment From: topper-123
I very much agree. Having `df.agg` always return a Series if `func` is a single argument would be a win in terms of users understanding their code, and it makes the internal organization of the pandas code base clearer. That's a double win :-).
Comment From: jorisvandenbossche
One tangential note: with DataFrame.apply working on each column/Series, and DataFrame.applymap working on each element of each column/Series, one could also say that the Series equivalent of applymap is Series.map, not Series.apply.
(In any case, I have always found the difference between Series.apply and Series.map confusing as well.)
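The following small sketch checks the analogy. The `times_ten` function is illustrative. The `hasattr` check is needed because `DataFrame.applymap` was renamed to `DataFrame.map` in pandas 2.1, and the old name is deprecated from that release.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})


def times_ten(x):
    return x * 10


# DataFrame.map (pandas >= 2.1) replaces the older DataFrame.applymap.
cellwise = df.map if hasattr(df, "map") else df.applymap
by_cell = cellwise(times_ten)

# The same elementwise result, built column by column with Series.map ...
by_series_map = df.apply(lambda col: col.map(times_ten))
print(by_cell.equals(by_series_map))

# ... and Series.apply with this UDF is also elementwise, which is the
# overlap between Series.apply and Series.map noted in the comment.
print(df["a"].apply(times_ten).equals(df["a"].map(times_ten)))
```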