Pandas Unexpected output of table pivoting when there is little data; a TimeGrouper object is in the index.

# coding: utf-8
import pandas as pd
import numpy as np

frame = pd.read_csv("table.csv", engine="python", parse_dates=['since'])
print frame
d = pd.pivot_table(frame, index=pd.TimeGrouper(key='since', freq='1d'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
print d
print "^that is not what I expected"
frame = pd.read_csv("table2.csv", engine="python", parse_dates=['since'])  # add some values to a day
print frame
d = pd.pivot_table(frame, index=pd.TimeGrouper(key='since', freq='1d'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
print d

The following data is the contents of table.csv

"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"

This is table2.csv:

"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-01-31 08:00:00+00:00","2200.0000"
"81","2015-01-31 09:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"

The output of print after pivoting table.csv

                                                                id  value
<pandas.tseries.resample.TimeGrouper object at 0x7fc595f96c10>  81   2200
id                                                              81   2200

I would expect something like this:

            value
id             81
since            
2015-01-31   2200
2015-02-01   2200

I can trace the problem to here:

https://github.com/pydata/pandas/blob/62529cca28e9c8652ddf7cca3aa6d41d4e30bc0e/pandas/tools/pivot.py#L114 the index created by groupby already has the object there.

I can't figure anything else. What is the problem, any fixes?

Thanks.

Comment From: jreback

Here's your example easily copy-pastable. FYI, you don't need to specify an engine=... to read_csv.

Prob an edge case, a pull-request would be welcome.

In [24]: data2 = """                                               
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-01-31 08:00:00+00:00","2200.0000"
"81","2015-01-31 09:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""

In [25]: df2 = pd.read_csv(StringIO(data2),  parse_dates=['since'])

In [26]: df2
Out[26]: 
   id               since  value
0  81 2015-01-31 07:00:00   2200
1  81 2015-01-31 08:00:00   2200
2  81 2015-01-31 09:00:00   2200
3  81 2015-02-01 07:00:00   2200

In [27]: pd.pivot_table(df2, index=pd.Grouper(key='since',freq='1D'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
Out[27]: 
           value
id            81
since           
2015-01-31  6600
2015-02-01  2200

In [28]: data = """                                                                 
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""

In [29]: df = pd.read_csv(StringIO(data),  parse_dates=['since'])

In [30]: df
Out[30]: 
   id               since  value
0  81 2015-01-31 07:00:00   2200
1  81 2015-02-01 07:00:00   2200

In [31]: pd.pivot_table(df, index=pd.Grouper(key='since',freq='1D'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
Out[31]: 
                                                             id  value
<pandas.tseries.resample.TimeGrouper object at 0x107714450>  81   2200
id                                                           81   2200

Comment From: alep

https://github.com/pydata/pandas/blob/v0.15.2/pandas/core/groupby.py#L2079 the problem seems to be there. If I set match_axis_length = False, it works fine, So I guess some condition is missing, either here: https://github.com/pydata/pandas/blob/v0.15.2/pandas/core/groupby.py#L2092 or when setting match_axis_length

Comment From: alep

This is duplicate of https://github.com/pydata/pandas/issues/8542

Comment From: TomAugspurger

Closing as a duplicate.