The feature I'd like to suggest is pretty simple. It has workarounds, but it seems very odd to me that this is not native behavior, especially because the workarounds seem so slow.
I imagine it as a simple parameter for the groupby function.
The feature is the ability to group by a level (of a MultiIndex DataFrame) while still retaining that level's labels. I think the example below makes this clear.
Note that this came from a question I asked on Stack Overflow.
Code Sample
import pandas as pd

df = pd.DataFrame([['A', 1, 3],
                   ['A', 2, 4],
                   ['A', 3, 6],
                   ['B', 1, 9],
                   ['B', 2, 10],
                   ['B', 4, 6]],
                  columns=pd.Index(['Name', 'Date', 'Value'], name='ColumnName')
                  ).set_index(['Name', 'Date'])
df.groupby(level='Name').last()
Intended Output
ColumnName  Value
Name Date
A    3          6
B    4          6
Output of pandas 0.18.0
ColumnName  Value
Name
A               6
B               6
Comment From: jorisvandenbossche
I also just answered on SO, but the solution is rather simple: groupby().last() computes the last value of each column within each group. So you can just reset the index level to turn it into a column, and then do the groupby:
In [44]: df.reset_index(level='Date').groupby(level=0).last()
Out[44]:
ColumnName  Date  Value
Name
A              3      6
B              4      6
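A minimal sketch, not taken from the thread: appending set_index('Date', append=True) to the workaround above pushes Date back into the index, which restores the (Name, Date) MultiIndex from the intended output in the original request. One caveat: last() takes the last non-null value per column, so on data with missing values, Date and Value could end up coming from different rows.

```python
import pandas as pd

# Rebuild the example frame from the issue.
df = pd.DataFrame([['A', 1, 3],
                   ['A', 2, 4],
                   ['A', 3, 6],
                   ['B', 1, 9],
                   ['B', 2, 10],
                   ['B', 4, 6]],
                  columns=pd.Index(['Name', 'Date', 'Value'], name='ColumnName')
                  ).set_index(['Name', 'Date'])

# Move 'Date' into a regular column so last() treats it like any other column,
# then append it back onto the index to restore the (Name, Date) MultiIndex.
result = (df.reset_index(level='Date')
            .groupby(level='Name')
            .last()
            .set_index('Date', append=True))
print(result)
```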
Comment From: pedrobraz1990
Thanks @jorisvandenbossche. Still, don't you think a native (faster) way would be justified?
Comment From: jreback
@pedrobraz1990 This is as fast as it gets.
These are all cheap operations. If you have a performance issue, then you are doing things in a non-performant or non-idiomatic way, e.g. iterating.
Comment From: pedrobraz1990
All right then. Should I close this issue?
Comment From: jreback
Already did.
Comment From: jorisvandenbossche
BTW, there is actually an even easier solution:
In [202]: df.groupby(level='Name').tail(1)
Out[202]:
ColumnName  Value
Name Date
A    3          6
B    4          6
.tail() acts like a 'filter' method, so it keeps the index intact.
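To make the filter-versus-aggregation difference concrete, here is a small sketch on a hypothetical variant of the example in which the last Value of group 'B' is missing (the NaN is an assumption added for illustration). tail(1) returns the last row of each group untouched, with its full index entry, while last() picks the last non-null value per column and drops the Date level. The mask built with Index.duplicated(keep='last') is an alternative filter, not something suggested in the thread, that selects the same rows as tail(1).

```python
import numpy as np
import pandas as pd

# Same shape as the issue's example, with a hypothetical missing value
# in the last row of group 'B'.
df = pd.DataFrame([['A', 1, 3.0],
                   ['A', 2, 4.0],
                   ['A', 3, 6.0],
                   ['B', 1, 9.0],
                   ['B', 2, 10.0],
                   ['B', 4, np.nan]],
                  columns=['Name', 'Date', 'Value']).set_index(['Name', 'Date'])

# Filter: the last row of each group, returned as-is with its (Name, Date) label.
tail_rows = df.groupby(level='Name').tail(1)

# Aggregation: last non-null value per column, so 'B' gets 10.0 (from Date 2)
# and the result is indexed by Name only.
last_vals = df.groupby(level='Name').last()

# Alternative filter: drop every row whose Name reappears later in the frame.
mask_rows = df[~df.index.get_level_values('Name').duplicated(keep='last')]

print(tail_rows)
print(last_vals)
```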
Comment From: pedrobraz1990
But then I won't get the whole DataFrame, right?
@jorisvandenbossche
Comment From: jorisvandenbossche
@pedrobraz1990 What do you mean exactly? Of course you don't get the whole DataFrame, since you are selecting only the last row of each group.
Comment From: pedrobraz1990
Oh, sorry. I had the impression that tail only returned the last 5 rows. But yes, I meant the last row of each group.
Comment From: jorisvandenbossche
Ah yes, tail() on a DataFrame will indeed return the last 5 rows by default, but when called on a groupby object it does that for each group (so tail(1) returns only the last row of each group).
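A quick sketch of the difference described above, reusing the example frame: DataFrame.tail() defaults to n=5 over the whole frame, while GroupBy.tail(n) applies the limit per group and keeps the selected rows in their original order.

```python
import pandas as pd

df = pd.DataFrame([['A', 1, 3],
                   ['A', 2, 4],
                   ['A', 3, 6],
                   ['B', 1, 9],
                   ['B', 2, 10],
                   ['B', 4, 6]],
                  columns=['Name', 'Date', 'Value']).set_index(['Name', 'Date'])

# Whole-frame tail: the default n=5 ignores groups, so 5 of the 6 rows come back.
overall = df.tail()

# Per-group tail: the last 2 rows of 'A' and the last 2 rows of 'B'.
per_group = df.groupby(level='Name').tail(2)

print(overall)
print(per_group)
```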
Comment From: pedrobraz1990
Awesome! Thanks a lot @jorisvandenbossche