import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 4], 'B': [7, 7, 7, 8], 'C': [5, 6, 7, 8], 'D': [5, 6, 7, 8]})
print(df)
print("")
print(df.groupby(["A", "B"]).rank())   # A/B missing from the result
print("")
print(df.groupby(["A", "B"]).sum())    # A/B shown as the index
print("")
print(df.groupby(["A", "B"]).count())  # A/B shown as the index
print("")
Expected Output
I would expect all three methods to show similar output. sum and count work as expected, but the output of rank does not show A/B.
output of pd.show_versions()
   A  B  C  D
0  1  7  5  5
1  1  7  6  6
2  3  7  7  7
3  4  8  8  8

     C    D
0  1.0  1.0
1  2.0  2.0
2  1.0  1.0
3  1.0  1.0

      C   D
A B
1 7  11  11
3 7   7   7
4 8   8   8

     C  D
A B
1 7  2  2
3 7  1  1
4 8  1  1
Comment From: simonm3
[pip.vcs:DEBUG]:Registered VCS backend: git (C:\Users\s\Anaconda3\lib\site-packages\pip\vcs__init__.py\59, time=16:56)
[pip.vcs:DEBUG]:Registered VCS backend: hg (C:\Users\s\Anaconda3\lib\site-packages\pip\vcs__init__.py\59, time=16:56)
[pip.pep425tags:DEBUG]:Config variable 'Py_DEBUG' is unset, Python ABI tag may be incorrect (C:\Users\s\Anaconda3\lib\site-packages\pip\pep425tags.py\80, time=16:56)
[pip.pep425tags:DEBUG]:Config variable 'WITH_PYMALLOC' is unset, Python ABI tag may be incorrect (C:\Users\s\Anaconda3\lib\site-packages\pip\pep425tags.py\80, time=16:56)
[pip.pep425tags:DEBUG]:Config variable 'Py_DEBUG' is unset, Python ABI tag may be incorrect (C:\Users\s\Anaconda3\lib\site-packages\pip\pep425tags.py\80, time=16:56)
[pip.pep425tags:DEBUG]:Config variable 'WITH_PYMALLOC' is unset, Python ABI tag may be incorrect (C:\Users\s\Anaconda3\lib\site-packages\pip\pep425tags.py\80, time=16:56)
[pip.vcs:DEBUG]:Registered VCS backend: svn (C:\Users\s\Anaconda3\lib\site-packages\pip\vcs__init__.py\59, time=16:56)
[pip.vcs:DEBUG]:Registered VCS backend: bzr (C:\Users\s\Anaconda3\lib\site-packages\pip\vcs__init__.py\59, time=16:56)
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.1.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None
time: 4.43 s
Comment From: jorisvandenbossche
sum/count on the one hand and rank on the other hand are different kinds of methods. sum and count are 'aggregation' methods (they return one value for each group, and by default they use the group keys as the index in the result), while rank is a 'transform' method (it returns an object of the same shape, with the same index as the original dataframe).
For this reason, it is expected that the results have different indexes: the aggregated results are indexed by A/B, while the rank result keeps the original row index.
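The difference can be checked directly on the frame from the report. The following is a minimal sketch, not an official recommendation: it compares the index of an aggregation with the index of a transform, and then shows one possible way to get A/B next to the ranks by joining on the shared row index. The names summed, ranked and ranked_with_keys, and the "_rank" suffix, are illustrative choices that do not appear in the original report.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 3, 4], "B": [7, 7, 7, 8],
                   "C": [5, 6, 7, 8], "D": [5, 6, 7, 8]})

grouped = df.groupby(["A", "B"])

# Aggregation: one row per group, indexed by the group keys A/B.
summed = grouped.sum()
print(summed.index.names)

# Transform: one row per original row, indexed like df itself.
ranked = grouped.rank()
print(ranked.index.equals(df.index))

# Because ranked shares df's index, the key columns can be attached
# again by index alignment (the "_rank" suffix is an arbitrary choice).
ranked_with_keys = df[["A", "B"]].join(ranked.add_suffix("_rank"))
print(ranked_with_keys)
```

Joining on the index keeps one row per original row, which is usually what is wanted when ranks are added next to the data they were computed from.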
Comment From: jorisvandenbossche
@simonm3 I am closing this issue for now, but that certainly does not mean you cannot further comment or clarify your question!