I already asked about this on Stack Overflow but got no answers.
Is there a reason why groupby returns a different result when the elements in val2 are swapped? The index changes from a MultiIndex to a DatetimeIndex, and the group time is lost.
import pandas as pd
from datetime import datetime

def legend(text, obj):
    # Print a separator, a title, and the first rows of obj.
    print("\n" + "=" * 50 + "\n" + text)
    print(obj.head(1000))

df = pd.DataFrame({'dtime': [datetime(2017, 1, 1, 1, 5),
                             datetime(2017, 1, 1, 1, 20)],
                   'val1': [11, None],
                   'val2': [None, 31]  # [31, None] <=== this doesn't work!
                   })
legend("10 Original df", df)

df = (df.melt("dtime")
        .dropna()
        .set_index("dtime"))
legend("20 After melt", df)

group = df.groupby(pd.Grouper(freq="1H"))
legend("30 Group by dtime", group)

group = group["value"].apply(lambda x: x.sort_values(ascending=False))
legend("40 Find max 3 for each hour", group)

df = pd.DataFrame(group)
print("\n\nIndex info:")
df.info()  # info() prints its report itself and returns None
Comment From: jreback
After melting, the frames are different: df contains 2 distinct times, while df2 (the reversed [31, None] case) contains only 1.
In [17]: df
Out[17]:
                    variable  value
dtime
2017-01-01 01:05:00     val1   11.0
2017-01-01 01:20:00     val2   31.0

In [18]: df2
Out[18]:
                    variable  value
dtime
2017-01-01 01:05:00     val1   11.0
2017-01-01 01:05:00     val2   31.0
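
The difference between the two frames can be checked without reading the printed output. This is a minimal sketch, assuming the frames are built as in the original post; the helper name `melted` is made up for illustration and is not part of pandas.

```python
import pandas as pd
from datetime import datetime


def melted(val2):
    """Build the frame from the original post and melt it (hypothetical helper)."""
    frame = pd.DataFrame({
        "dtime": [datetime(2017, 1, 1, 1, 5), datetime(2017, 1, 1, 1, 20)],
        "val1": [11, None],
        "val2": val2,
    })
    return frame.melt("dtime").dropna().set_index("dtime")


df = melted([None, 31])   # the case that yields a MultiIndex in the report
df2 = melted([31, None])  # the reversed case

# df keeps two distinct timestamps; df2 has the same timestamp twice.
print(df.index.nunique(), df.index.is_unique)    # 2 True
print(df2.index.nunique(), df2.index.is_unique)  # 1 False
```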
Comment From: Nemecsek
@jreback, of course the time of val2 changes (I wrote this in my comment on Stack Overflow but not here).
As far as I know there should be no difference in grouping, as both times fall inside the same hour. In my case I am interested only in the grouping time (I later remove the column with the event time), so the difference in val2 shouldn't influence the result.
My question is: why do I get a MultiIndex in the first case and a DatetimeIndex in the second? Is this normal? Why does the grouping time disappear in the second case? Shouldn't it always be present?
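
If only the grouping hour matters, one possible workaround is to compute it explicitly instead of relying on what `groupby(...).apply` puts in the index. This is a sketch only, not an explanation of the behaviour above; the function name `values_by_hour` and the `hour` column are assumptions introduced for illustration.

```python
import pandas as pd
from datetime import datetime


def values_by_hour(val2):
    """Return 'value' indexed by (hour, dtime), largest first within each hour.

    Hypothetical helper: the hour is derived with dt.floor, so it is present
    whether or not the event times are duplicated.
    """
    frame = pd.DataFrame({
        "dtime": [datetime(2017, 1, 1, 1, 5), datetime(2017, 1, 1, 1, 20)],
        "val1": [11, None],
        "val2": val2,
    })
    melted = (frame.melt("dtime")
                   .dropna()
                   .assign(hour=lambda d: d["dtime"].dt.floor("h")))
    ordered = melted.sort_values(["hour", "value"], ascending=[True, False])
    return ordered.set_index(["hour", "dtime"])["value"]


first = values_by_hour([None, 31])
second = values_by_hour([31, None])

# Both results carry the grouping hour as the first index level.
print(first.index.names, second.index.names)
```

Because the hour is an ordinary column before it becomes an index level, the event time can be dropped afterwards with `reset_index("dtime", drop=True)` if it is not needed.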