Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({"A": [1, 1, 2, 2],
                             "B": [np.array([1, 2, 3, 4]),
                                   np.array([5, 6, 7, 8]),
                                   np.array([1, 2, 3, 4]),
                                   np.array([5, 6, 7, 8])]})

def mean_of_arrays(x):
    """
    Return the element-wise mean of the two arrays in the group
    (assumes each group has exactly two rows).
    """
    print((x.iloc[0] + x.iloc[1]) / 2)
    return (x.iloc[0] + x.iloc[1]) / 2

df.groupby(["A"])["B"].transform(mean_of_arrays)
# Prints [3. 4. 5. 6.] for each group, as expected...
# ...but returns [3, 4, 3, 4]
Problem description
When a column contains NumPy arrays and a transform operation is applied to each group, the per-group arrays are not returned by the transform. Instead, the i-th row of each group receives the i-th element of the group's averaged array, so each two-row group gets 3 and 4 rather than the array [3, 4, 5, 6].
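As background for the behaviour described above: GroupBy.transform is meant to return a result aligned with the input rows (same length and index), while .apply returns one result per group. The sketch below illustrates that contract in a recent pandas using a hypothetical numeric column C, which is not part of the original report.

```python
import pandas as pd

# Hypothetical data: a group key "A" and a plain numeric column "C".
df = pd.DataFrame({"A": [1, 1, 2, 2], "C": [1.0, 3.0, 10.0, 20.0]})

# transform: one output value per input row, indexed like df (0..3).
# Each row receives its group's mean, repeated within the group.
t = df.groupby("A")["C"].transform("mean")

# apply: one output value per group, indexed by the group key (1, 2).
a = df.groupby("A")["C"].apply(lambda s: s.mean())

print(t.tolist())  # [2.0, 2.0, 15.0, 15.0]
print(a.tolist())  # [2.0, 15.0]
```

Because transform promises a row-aligned result, a function that returns a whole array per group does not fit that contract cleanly, which is why the arrays in the report do not come back intact.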
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.4
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: 2.46.1
pandas_datareader: None
Comment From: TomAugspurger
What's your expected output here? .transform returns an output with the same shape as the input. It looks like you want .apply, maybe?
In [14]: df.groupby(["A"])['B'].apply(mean_of_arrays)
[ 3. 4. 5. 6.]
[ 3. 4. 5. 6.]
Out[14]:
A
1 [3.0, 4.0, 5.0, 6.0]
2 [3.0, 4.0, 5.0, 6.0]
Name: B, dtype: object
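If you still need a row-aligned result (one averaged array per row of df, as the original transform call seemed to aim for), one option is to take the per-group output of .apply and map it back onto the rows by the group key. This is a minimal sketch assuming a recent pandas. B_mean is a hypothetical column name, and np.stack(...).mean(axis=0) is a swapped-in generalization of mean_of_arrays that works for groups of any size rather than exactly two rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2],
                   "B": [np.array([1, 2, 3, 4]),
                         np.array([5, 6, 7, 8]),
                         np.array([1, 2, 3, 4]),
                         np.array([5, 6, 7, 8])]})

# One averaged array per group: stack the group's arrays into a 2-D array
# and take the element-wise mean down the rows. Result is indexed by "A".
per_group = df.groupby("A")["B"].apply(
    lambda s: np.stack(s.to_list()).mean(axis=0)
)

# Broadcast each group's array back onto its rows by looking up the key.
# "B_mean" is a hypothetical column name used for illustration.
df["B_mean"] = df["A"].map(per_group)
```

Splitting the work this way keeps each step within its contract: .apply produces one array per group, and .map handles the per-row broadcast.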
Comment From: QuentinAndre
I see my mistake now, thank you for pointing it out. Indeed .apply would make more sense than .transform in this context.
I am closing the issue.