Pandas groupby.transform() fails when performed on column of numpy arrays

Code Sample, a copy-pastable example if possible


import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({"A":[1, 1, 2, 2],
                            "B":[np.array([1, 2, 3, 4]),
                                np.array([5, 6, 7, 8]),
                                np.array([1, 2, 3, 4]),
                                np.array([5, 6, 7, 8])]})

def mean_of_arrays(x):
    """
    Returns the element-wise mean of the two arrays in the group.
    """
    result = (x.iloc[0] + x.iloc[1])/2
    print(result)
    return result

df.groupby(["A"])["B"].transform(mean_of_arrays)
# Prints [3, 4, 5, 6], [3, 4, 5, 6], as expected...
# But returns [3, 4, 3, 4]

Problem description

When a column contains numpy arrays and a "transform" operation is applied to it, the transform does not return the arrays computed for each group. Instead, as the example above shows, only the first n elements of each group's result are returned, where n is the number of rows in the group.

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.4
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: 2.46.1
pandas_datareader: None

Comment From: TomAugspurger

What's your expected output here? .transform returns an output with the same shape as the input.

It looks like you want .apply maybe?

In [14]: df.groupby(["A"])['B'].apply(mean_of_arrays)
[ 3.  4.  5.  6.]
[ 3.  4.  5.  6.]
Out[14]:
A
1    [3.0, 4.0, 5.0, 6.0]
2    [3.0, 4.0, 5.0, 6.0]
Name: B, dtype: object
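
For anyone who does want one array per original row (the shape .transform normally gives), the following is a minimal sketch, not a definitive recipe. It first shows .transform on a scalar column returning one value per input row, then computes the per-group array means by iterating over the groups and maps them back onto the rows. The names `scalars`, `C`, `group_means` and `B_mean` are made up for illustration and do not appear in the original report.

```python
import numpy as np
import pandas as pd

# With scalar values, transform returns one value per input row.
# (`scalars` and column "C" are hypothetical, for illustration only.)
scalars = pd.DataFrame({"A": [1, 1, 2, 2], "C": [1.0, 3.0, 5.0, 7.0]})
same_shape = scalars.groupby("A")["C"].transform("mean")

# The frame from the report: each cell of "B" holds a numpy array.
df = pd.DataFrame({"A": [1, 1, 2, 2],
                   "B": [np.array([1, 2, 3, 4]),
                         np.array([5, 6, 7, 8]),
                         np.array([1, 2, 3, 4]),
                         np.array([5, 6, 7, 8])]})

# Compute one element-wise mean array per group by iterating over the groups.
group_means = {
    key: np.stack(group.to_list()).mean(axis=0)
    for key, group in df.groupby("A")["B"]
}

# Look up each row's group mean, giving one array per original row.
# ("B_mean" is a hypothetical column name.)
df["B_mean"] = [group_means[key] for key in df["A"]]
```

Iterating over the groups sidesteps the question of how .transform or .apply should treat array-valued results: the per-group arrays are built with plain numpy, and pandas is only used for the grouping and for storing the result.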

Comment From: QuentinAndre

I see my mistake now, thank you for pointing it out. Indeed, .apply makes more sense than .transform in this context.

I am closing the issue.
