# coding: utf-8
import pandas as pd
import numpy as np
frame = pd.read_csv("table.csv", engine="python", parse_dates=['since'])
print frame
d = pd.pivot_table(frame, index=pd.TimeGrouper(key='since', freq='1d'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
print d
print "^that is not what I expected"
frame = pd.read_csv("table2.csv", engine="python", parse_dates=['since']) # add some values to a day
print frame
d = pd.pivot_table(frame, index=pd.TimeGrouper(key='since', freq='1d'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
print d
The following data is the contents of table.csv
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
This is table2.csv
:
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-01-31 08:00:00+00:00","2200.0000"
"81","2015-01-31 09:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
The output of print after pivoting table.csv
id value
<pandas.tseries.resample.TimeGrouper object at 0x7fc595f96c10> 81 2200
id 81 2200
I would expect something like this:
value
id 81
since
2015-01-31 2200
2015-02-01 2200
I can trace the problem to here:
https://github.com/pydata/pandas/blob/62529cca28e9c8652ddf7cca3aa6d41d4e30bc0e/pandas/tools/pivot.py#L114 the index created by groupby already has the object there.
I can't figure anything else. What is the problem, any fixes?
Thanks.
Comment From: jreback
Here's your example easily copy-pastable. FYI, you don't need to specify an engine=...
to read_csv.
Prob an edge case, a pull-request would be welcome.
In [24]: data2 = """
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-01-31 08:00:00+00:00","2200.0000"
"81","2015-01-31 09:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""
In [25]: df2 = pd.read_csv(StringIO(data2), parse_dates=['since'])
In [26]: df2
Out[26]:
id since value
0 81 2015-01-31 07:00:00 2200
1 81 2015-01-31 08:00:00 2200
2 81 2015-01-31 09:00:00 2200
3 81 2015-02-01 07:00:00 2200
In [27]: pd.pivot_table(df2, index=pd.Grouper(key='since',freq='1D'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
Out[27]:
value
id 81
since
2015-01-31 6600
2015-02-01 2200
In [28]: data = """
"id","since","value"
"81","2015-01-31 07:00:00+00:00","2200.0000"
"81","2015-02-01 07:00:00+00:00","2200.0000"
"""
In [29]: df = pd.read_csv(StringIO(data), parse_dates=['since'])
In [30]: df
Out[30]:
id since value
0 81 2015-01-31 07:00:00 2200
1 81 2015-02-01 07:00:00 2200
In [31]: pd.pivot_table(df, index=pd.Grouper(key='since',freq='1D'), values=["value"], columns=['id'], aggfunc=np.sum, fill_value=0)
Out[31]:
id value
<pandas.tseries.resample.TimeGrouper object at 0x107714450> 81 2200
id 81 2200
Comment From: alep
https://github.com/pydata/pandas/blob/v0.15.2/pandas/core/groupby.py#L2079 the problem seems to be there. If I set match_axis_length = False
, it works fine, So I guess some condition is missing, either here: https://github.com/pydata/pandas/blob/v0.15.2/pandas/core/groupby.py#L2092 or when setting match_axis_length
Comment From: alep
This is duplicate of https://github.com/pydata/pandas/issues/8542
Comment From: TomAugspurger
Closing as a duplicate.