Code Sample
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
gb = df.groupby(['animal'])

def GrowUp(x):
    # Weight each row by size class (S: 1.5, M: 1.25, L: 1.0), then average over the group.
    avg_weight = sum(x[x['size'] == 'S'].weight * 1.5)
    avg_weight += sum(x[x['size'] == 'M'].weight * 1.25)
    avg_weight += sum(x[x['size'] == 'L'].weight)
    avg_weight /= len(x)
    print(x)  # side effect: shows which group apply() is currently processing
    return pd.Series(['L', avg_weight, True], index=['size', 'weight', 'adult'])

expected_df = gb.apply(GrowUp)
print(expected_df)
Problem description
In the printed output, the 'cat' group appears twice:
   adult animal size  weight
0  False    cat    S       8
2  False    cat    M      11
5   True    cat    L      12
6   True    cat    L      12
   adult animal size  weight
0  False    cat    S       8
2  False    cat    M      11
5   True    cat    L      12
6   True    cat    L      12
   adult animal size  weight
1  False    dog    S      10
4  False    dog    M      20
   adult animal size  weight
3  False   fish    M       1
       size   weight  adult
animal
cat       L  12.4375   True
dog       L  20.0000   True
fish      L   1.2500   True
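One way to confirm the duplicate call without reading printed frames is to count how often the applied function receives each group. This is a minimal sketch on the same data; count_calls and calls are illustrative names, and it relies on pandas setting group.name to the group key inside apply(). On pandas 0.22 the 'cat' count may come out as 2; newer releases may report 1.

```python
import collections

import pandas as pd

df = pd.DataFrame({
    'animal': 'cat dog cat fish dog cat cat'.split(),
    'size': list('SSMMMLL'),
    'weight': [8, 10, 11, 1, 20, 12, 12],
    'adult': [False] * 5 + [True] * 2,
})

# Number of times the applied function has been handed each group.
calls = collections.Counter()

def count_calls(group):
    # Inside groupby().apply(), pandas sets group.name to the group key.
    calls[group.name] += 1
    return group['weight'].sum()

# Selecting only the non-key columns keeps newer pandas from warning
# that apply() operates on the grouping column.
totals = df.groupby('animal')[['size', 'weight']].apply(count_calls)

print(dict(calls))  # calls received by each group
print(totals)       # per-group weight totals
```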
apply() should call GrowUp only once for the 'cat' group, just as it does for 'dog' and 'fish'.
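If a function really must run exactly once per group (for example because it prints or writes somewhere), iterating the GroupBy object and assembling the result by hand avoids apply() altogether. A sketch under that assumption: grow_up is a hypothetical side-effect-free copy of GrowUp, and the string key 'animal' is used instead of ['animal'] so that each group key is a plain string.

```python
import pandas as pd

df = pd.DataFrame({
    'animal': 'cat dog cat fish dog cat cat'.split(),
    'size': list('SSMMMLL'),
    'weight': [8, 10, 11, 1, 20, 12, 12],
    'adult': [False] * 5 + [True] * 2,
})

def grow_up(x):
    # Same arithmetic as GrowUp in the report, without the print() side effect.
    avg_weight = (x.loc[x['size'] == 'S', 'weight'] * 1.5).sum()
    avg_weight += (x.loc[x['size'] == 'M', 'weight'] * 1.25).sum()
    avg_weight += x.loc[x['size'] == 'L', 'weight'].sum()
    avg_weight /= len(x)
    return pd.Series(['L', avg_weight, True], index=['size', 'weight', 'adult'])

# Iterating a GroupBy yields one (key, sub-DataFrame) pair per group,
# so grow_up runs exactly once for each animal.
names, rows = [], []
for name, group in df.groupby('animal'):
    names.append(name)
    rows.append(grow_up(group))

result = pd.DataFrame(rows, index=pd.Index(names, name='animal'))
print(result)
```

The trade-off is that the loop gives up any fast path apply() might take, which matters little for a handful of groups.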
Expected Output
   adult animal size  weight
0  False    cat    S       8
2  False    cat    M      11
5   True    cat    L      12
6   True    cat    L      12
   adult animal size  weight
1  False    dog    S      10
4  False    dog    M      20
   adult animal size  weight
3  False   fish    M       1
       size   weight  adult
animal
cat       L  12.4375   True
dog       L  20.0000   True
fish      L   1.2500   True
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: x86
processor: x86 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: Chinese (Simplified)_People's Republic of China.936

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Comment From: jreback
See the big red warning box here: http://pandas.pydata.org/pandas-docs/stable/groupby.html#flexible-apply
This is an implementation detail.
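The linked section carried a warning that, in the implementation at the time, apply() could call the function twice on the first group while deciding between a fast and a slow code path, so side effects such as print() show up twice for that group. The practical takeaway is to keep the applied function free of side effects. For this particular calculation the per-group function can be dropped entirely; the sketch below is a vectorized restatement of GrowUp's arithmetic, assuming every size is 'S', 'M' or 'L' (the factor mapping is an illustrative assumption, not something from the thread).

```python
import pandas as pd

df = pd.DataFrame({
    'animal': 'cat dog cat fish dog cat cat'.split(),
    'size': list('SSMMMLL'),
    'weight': [8, 10, 11, 1, 20, 12, 12],
    'adult': [False] * 5 + [True] * 2,
})

# Assumed mapping of each size class to GrowUp's weighting factor;
# any size outside S/M/L would become NaN and be skipped by mean().
factor = df['size'].map({'S': 1.5, 'M': 1.25, 'L': 1.0})
adjusted = df['weight'] * factor

# GrowUp divides the summed adjusted weights by the group length,
# which equals the per-group mean of the adjusted weights.
weight = adjusted.groupby(df['animal']).mean()

result = pd.DataFrame({'size': 'L', 'weight': weight, 'adult': True})
# Fix the column order explicitly (older pandas sorted dict keys).
result = result[['size', 'weight', 'adult']]
print(result)
```

Because no user function runs per group, there is nothing that can be called twice.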
Comment From: Qiaoyoufeng
Hi Jeff, I get it. I am just learning pandas and have not read that part yet. Sorry about the ticket, and thanks!
(In reply to Jeff Reback's comment of 2018-01-20 23:41:00.)