Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
print(np.mean(np.zeros((30, 30)), axis=0, keepdims=True).shape,
      pd.DataFrame(np.zeros((30, 30))).mean(axis=0, keepdims=True).shape,
      np.mean(pd.DataFrame(np.zeros((30, 30))), axis=0, keepdims=True).shape)
Problem description
The keepdims argument is ignored when taking the mean of pandas dataframes, resulting in an array of lower dimensionality.
Oddly, there's no documentation for pd.DataFrame.mean suggesting keepdims should even be allowed as an argument, but it accepts it with no effect. This also modifies the behavior of np.mean.
Expected Output
(1, 30) (1, 30) (1, 30)
Output of pd.show_versions()
(1, 30) (30,) (30,)
Comment From: chris-b1
In 0.19.2+ this raises an error, as keepdims is not a supported parameter to mean. The reason this 'worked' (didn't raise an error) in older versions is that we accepted a **kwargs parameter, but didn't validate it.
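A minimal sketch of the difference described here, using two hypothetical functions, mean_unvalidated and mean_validated (neither is pandas code). Accepting **kwargs without checking them lets an unsupported keyword pass silently, while validating them raises an error:

```python
def mean_unvalidated(values, **kwargs):
    # Hypothetical: extra keywords such as keepdims are accepted and ignored.
    return sum(values) / len(values)


def mean_validated(values, **kwargs):
    # Hypothetical: reject any keyword the function does not understand.
    if kwargs:
        unexpected = ", ".join(sorted(kwargs))
        raise TypeError(f"unexpected keyword argument(s): {unexpected}")
    return sum(values) / len(values)


data = [1.0, 2.0, 3.0]

# keepdims is swallowed by **kwargs and has no effect on the result.
silent_result = mean_unvalidated(data, keepdims=True)

# With validation, the same call fails loudly instead.
try:
    mean_validated(data, keepdims=True)
    raised = False
except TypeError:
    raised = True
```

The silent variant is what makes an unsupported keyword look like it "worked".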
Comment From: TomAugspurger
Is there any desire for a keepdims argument? I haven't seen it requested before.
@seth-a did you have a need for it, or were you surprised by the inconsistency / non-validation?
Comment From: seth-a
@chris-b1 Thanks for pointing that out.
@TomAugspurger There's no need for it in DataFrame.mean, I just included that for comparison with the numpy code. The bigger issue for me is that calling np.mean with a pandas DataFrame causes different behavior than with a numpy array.
I don't know what exactly pandas is doing, but it shouldn't be modifying numpy's behavior, and that is still an issue in the latest version.
I would expect
df = pd.DataFrame(np.zeros((30, 30)))
np.mean(df, axis=0, keepdims=True).shape
to give the same result as
np.mean(df.values, axis=0, keepdims=True).shape
But the second line (the call on df) will throw an error. If pandas can't give complete transparency when calling np.mean it shouldn't try to at all.
Comment From: TomAugspurger
If pandas can't give complete transparency when calling np.mean it shouldn't try to at all.
That's how np.mean works. It checks the object for a mean method (and probably some other array-like characteristics that pandas meets) and then calls that object's mean. That's why np.mean(df) returns a Series, not a numpy array.
Comment From: seth-a
Ah, that makes sense. Although, looking at the code, it doesn't check any other array-like characteristics: it only checks that there's a mean attribute, not even whether it's a method:
if type(a) is not mu.ndarray:
    try:
        mean = a.mean
    except AttributeError:
        pass
    else:
        return mean(axis=axis, dtype=dtype, out=out, **kwargs)
Well, definitely not a pandas bug. Thanks for looking at that.
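To illustrate the delegation discussed above, here is a simplified pure-Python stand-in for that branch. The names HasMean and mean_dispatch are hypothetical, and the list fallback is an assumption made only to keep the sketch self-contained; this is not numpy's actual code:

```python
class HasMean:
    """Hypothetical object exposing a mean method, much as a DataFrame does."""

    def __init__(self):
        self.received = None

    def mean(self, axis=None, dtype=None, out=None, **kwargs):
        # Record what the dispatcher forwarded, then return a marker value.
        self.received = {"axis": axis, **kwargs}
        return "HasMean.mean was called"


def mean_dispatch(a, axis=None, dtype=None, out=None, **kwargs):
    # Simplified stand-in for the quoted branch: anything that is not the
    # "native" type (a list here) and has a mean attribute gets delegated to.
    if type(a) is not list:
        try:
            mean = a.mean
        except AttributeError:
            pass
        else:
            return mean(axis=axis, dtype=dtype, out=out, **kwargs)
    # Fallback: plain arithmetic mean of a flat list.
    return sum(a) / len(a)


obj = HasMean()
delegated = mean_dispatch(obj, axis=0, keepdims=True)  # handled by obj.mean
fallback = mean_dispatch([1.0, 2.0, 3.0])              # handled by the fallback
```

Because the keyword is forwarded unchanged, whether keepdims is honored, ignored, or rejected is decided entirely by the object's own mean method.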
Comment From: TomAugspurger
Ok, let's close for now. We can revisit a keepdims keyword in the future if there's interest.