Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
In [11]: import pandas as pd
In [12]: pd.__version__
Out[12]: '1.4.4'
In [13]: df = pd.DataFrame([[1,2], [3,4]], columns=["a", "b"])
In [14]: df.groupby(['a', 'b']).nth(0)
Out[14]:
Empty DataFrame
Columns: []
Index: [(1, 2), (3, 4)]
In [15]: df.groupby(['a', 'b']).nth(0).index
Out[15]:
MultiIndex([(1, 2),
            (3, 4)],
           names=['a', 'b'])
In [16]: df.groupby(['a', 'b']).nth(1)
Out[16]:
Empty DataFrame
Columns: []
Index: []
In [17]: df.groupby(['a', 'b']).nth(1).index
Out[17]: MultiIndex([], names=['a', 'b'])
In [18]: df.groupby(['a', 'b']).nth(2)
Out[18]:
Empty DataFrame
Columns: []
Index: []
In [19]: df.groupby(['a', 'b']).nth(2).index
Out[19]: MultiIndex([], names=['a', 'b'])
In [20]: df.groupby(['a', 'b']).nth(-1)
Out[20]:
Empty DataFrame
Columns: []
Index: [(1, 2), (3, 4)]
In [21]: df.groupby(['a', 'b']).nth(-1).index
Out[21]:
MultiIndex([(1, 2),
            (3, 4)],
           names=['a', 'b'])
In [22]: df.groupby(['a', 'b']).nth(-2)
Out[22]:
Empty DataFrame
Columns: []
Index: []
In [23]: df.groupby(['a', 'b']).nth(-2).index
Out[23]: MultiIndex([], names=['a', 'b'])
In [24]: df.groupby(['a', 'b']).sum()
Out[24]:
Empty DataFrame
Columns: []
Index: [(1, 2), (3, 4)]
In [25]: df.groupby(['a', 'b']).sum().index
Out[25]:
MultiIndex([(1, 2),
            (3, 4)],
           names=['a', 'b'])
In [26]: df.groupby(['a', 'b']).max()
Out[26]:
Empty DataFrame
Columns: []
Index: [(1, 2), (3, 4)]
In [27]: df.groupby(['a', 'b']).max().index
Out[27]:
MultiIndex([(1, 2),
            (3, 4)],
           names=['a', 'b'])
Issue Description
When there are no aggregation columns (every column is a grouping key), GroupBy.nth returns one of two kinds of index:
1. For n=0 or n=-1:
MultiIndex([(1, 2),
            (3, 4)],
           names=['a', 'b'])
This index is as expected, since it matches what other GroupBy functions such as sum and max return.
2. For any other n (n=1, 2 and -2 in the example above):
MultiIndex([], names=['a', 'b'])
This should be the same as the index in case 1.
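The comparison described above can be sketched as a short script. It rebuilds the DataFrame from the reproducible example and records, for each n, whether the index of nth(n) equals the index returned by sum(). The names `grouped`, `reference_index` and `report` are illustrative only, and the printed values depend on the installed pandas version, so no expected output is given here.

```python
import pandas as pd

# Same frame as in the reproducible example: both columns are grouping keys.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
grouped = df.groupby(["a", "b"])

# Index produced by an ordinary aggregation, used here as the reference.
reference_index = grouped.sum().index

# For each n, record whether nth(n) yields the same index as the reference.
report = {
    n: bool(grouped.nth(n).index.equals(reference_index))
    for n in (0, 1, 2, -1, -2)
}

for n, same in report.items():
    print(f"nth({n}): index matches sum().index -> {same}")
```

On pandas 1.4.4, going by the outputs reported above, only n=0 and n=-1 would be expected to match.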
Expected Behavior
In [16]: df.groupby(['a', 'b']).nth(1)
Out[16]:
Empty DataFrame
Columns: []
Index: [(1, 2), (3, 4)]
In [17]: df.groupby(['a', 'b']).nth(1).index
Out[17]:
MultiIndex([(1, 2),
            (3, 4)],
           names=['a', 'b'])
Installed Versions
INSTALLED VERSIONS
------------------
commit : ca60aab7340d9989d9428e11a51467658190bb6b
python : 3.9.12.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.4.4
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 61.2.0
pip : 21.2.4
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : 3.0.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.2.2
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy : None
sqlalchemy : 1.4.39
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None
Comment From: paradox-lab
['a', 'b'] is actually an aggregation column.
df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [1, 2, 4]], columns=["a", "b", "c"])
df
Out[7]:
   a  b  c
0  1  2  3
1  3  4  5
2  1  2  4
df.groupby(['a', 'b']).nth(0)
Out[9]:
     c
a b
1 2  3
3 4  5
df.groupby(['a', 'b']).mean(0)
Out[10]:
       c
a b
1 2  3.5
3 4  5.0
df.groupby(['a', 'b']).sum(0)
Out[11]:
     c
a b
1 2  7
3 4  5
df.groupby(['a', 'b']).max(0)
Out[12]:
     c
a b
1 2  4
3 4  5
df.groupby(['a', 'b']).nth(1)
Out[13]:
     c
a b
1 2  4
Comment From: rhshadrach
Now that nth acts as a filter, it returns the same index shape as the input, as it should. Closing.
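To illustrate what "acts as a filter" means, here is a minimal sketch that emulates filter-style selection with GroupBy.cumcount. The helper `nth_as_filter` is hypothetical and is not how pandas implements nth. It only shows the idea that the selected rows keep their original index labels and all of their columns, instead of being re-indexed by the group keys.

```python
import pandas as pd


def nth_as_filter(frame, keys, n):
    """Hypothetical helper: keep the n-th row of each group, preserving the original index."""
    grouped = frame.groupby(keys)
    if n >= 0:
        # cumcount numbers the rows 0, 1, 2, ... within each group.
        position = grouped.cumcount()
        return frame[position == n]
    # Counting from the end: the last row of each group is numbered 0.
    position = grouped.cumcount(ascending=False)
    return frame[position == -n - 1]


# Frame taken from the earlier comment in this thread.
df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [1, 2, 4]], columns=["a", "b", "c"])

first_rows = nth_as_filter(df, ["a", "b"], 0)    # first row of every group
second_rows = nth_as_filter(df, ["a", "b"], 1)   # second row, where one exists
last_rows = nth_as_filter(df, ["a", "b"], -1)    # last row of every group
```

In this sketch `second_rows` contains only the row labelled 2, because (1, 2) is the only group with a second row. Groups without an n-th row contribute nothing to the result.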