Pandas keepdims fails when taking mean

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

print(np.mean(np.zeros((30, 30)), axis=0, keepdims=True).shape,
pd.DataFrame(np.zeros((30, 30))).mean( axis=0, keepdims=True).shape,
np.mean(pd.DataFrame(np.zeros((30, 30))), axis=0, keepdims=True).shape)

Problem description

The keepdims argument is ignored when taking the mean of pandas dataframes, resulting in an array of lower dimensionality. Oddly, there's no documentation for pd.DataFrame.mean suggesting keepdims should even be allowed as an argument, but it accepts it with no effect. This also modifies the behavior of np.mean.

Expected Output

(1, 30) (1, 30) (1, 30)

Output of `pd.show_versions()`

Actual output: (1, 30) (30,) (30,)

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-75-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: 1.3.7
pip: None
setuptools: None
Cython: None
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
IPython: 2.4.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

Comment From: chris-b1

In 0.19.2+ this raises an error, as keepdims is not a supported parameter to mean. The reason this 'worked' (didn't raise an error) in older versions is that we accepted a **kwargs parameter, but didn't validate it.
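To illustrate the difference between accepting `**kwargs` silently and validating them, here is a minimal pure-Python sketch. Both functions are hypothetical stand-ins written for this example; they are not pandas' actual implementation, and the exact exception pandas raises may differ.

```python
def mean_unvalidated(values, **kwargs):
    # Hypothetical: extra keywords are accepted and silently ignored,
    # so a caller passing keepdims=True sees no error and no effect.
    return sum(values) / len(values)


def mean_validated(values, **kwargs):
    # Hypothetical: any unrecognised keyword is rejected up front.
    if kwargs:
        unexpected = ", ".join(sorted(kwargs))
        raise TypeError("mean() got unexpected keyword argument(s): " + unexpected)
    return sum(values) / len(values)


silent_result = mean_unvalidated([1, 2, 3], keepdims=True)

try:
    mean_validated([1, 2, 3], keepdims=True)
    rejected = False
except TypeError:
    rejected = True

print(silent_result, rejected)
```

The first call returns a plain float as if `keepdims` had never been passed, which is the "worked" behavior described above; the second fails loudly.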

Comment From: TomAugspurger

Is there any desire for a keepdims argument? I haven't seen it requested before.

@seth-a did you have a need for it, or were you surprised by the inconsistency / non-validation?

Comment From: seth-a

@chris-b1 Thanks for pointing that out. @TomAugspurger There's no need for it in DataFrame.mean, I just included that for comparison with the numpy code. The bigger issue for me is that calling np.mean with a pandas DataFrame causes different behavior than with a numpy array.

I don't know what exactly pandas is doing, but it shouldn't be modifying how numpy behaves, and that is still an issue in the latest version.

I would expect

df = pd.DataFrame(np.zeros((30, 30)))
np.mean(df.values, axis=0, keepdims=True).shape

To give the same results as

np.mean(df, axis=0, keepdims=True).shape

But the second line will throw an error. If pandas can't give complete transparency when calling np.mean it shouldn't try to at all.
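For anyone who needs NumPy's own reduction semantics in the meantime, one option is to pass the underlying array explicitly. This is a minimal sketch, assuming pandas and NumPy are both installed; it only uses `df.values` as in the snippet above and avoids passing `keepdims` together with a DataFrame, since that behavior varies by version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((30, 30)))

# Reducing the raw ndarray uses NumPy's implementation, so keepdims is honoured.
row = np.mean(df.values, axis=0, keepdims=True)
print(row.shape)  # (1, 30)

# Reducing the DataFrame itself hands off to DataFrame.mean and yields a Series.
series = np.mean(df, axis=0)
print(type(series).__name__, series.shape)  # Series (30,)
```

The trade-off is that the ndarray result drops the column labels, so they have to be reattached by hand if they matter.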

Comment From: TomAugspurger

> If pandas can't give complete transparency when calling np.mean it shouldn't try to at all.

That's how np.mean works. It checks the object for a mean method (and probably some other array-like characteristics that pandas meets) and then calls that object's mean. That's why np.mean(df) returns a Series, not a numpy array.

Comment From: seth-a

Ah, that makes sense. Nope, looking at the code it only checks that there's a mean property, not even if it's a method:

    if type(a) is not mu.ndarray:
        try:
            mean = a.mean
        except AttributeError:
            pass
        else:
            return mean(axis=axis, dtype=dtype, out=out, **kwargs)

Well, definitely not a pandas bug. Thanks for looking at that.
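The delegation in that snippet can be observed without pandas at all. The sketch below uses two hypothetical classes invented for this example: one whose `mean` method simply reports the arguments it was handed, and one whose `mean` is a plain attribute rather than a method. It assumes a NumPy version that still contains the branch quoted above.

```python
import numpy as np


class Recorder:
    """Hypothetical non-ndarray object that exposes a `mean` method."""

    def mean(self, axis=None, dtype=None, out=None, **kwargs):
        # Return the arguments np.mean forwarded instead of computing anything.
        received = {"axis": axis, "dtype": dtype, "out": out}
        received.update(kwargs)
        return received


class HasMeanAttribute:
    """Hypothetical object whose `mean` is not callable."""

    mean = 3.0


# keepdims is forwarded to the object's own mean, not handled by NumPy.
with_keepdims = np.mean(Recorder(), axis=0, keepdims=True)
# When keepdims is not given, it is not forwarded at all.
without_keepdims = np.mean(Recorder(), axis=0)
print(with_keepdims)
print(without_keepdims)

# np.mean only looks the attribute up; calling a non-callable one fails.
try:
    np.mean(HasMeanAttribute())
    call_failed = False
except TypeError:
    call_failed = True
print(call_failed)
```

Whatever the object's `mean` does with `keepdims` (honour it, ignore it, or raise) is therefore what the caller of `np.mean` sees.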

Comment From: TomAugspurger

Ok, let's close for now. We can revisit a keepdims keyword in the future if there's interest.
