Code Sample
I use resample together with groupby to run the following code:
import pandas as pd
data = [{'buyer_id': 1, 'pay_time': '2016-01-01', 'tid': '11'}, {'buyer_id': 1, 'pay_time': '2016-01-03', 'tid': '12'}, {'buyer_id': 1, 'pay_time': '2016-01-05', 'tid': '13'}, {'buyer_id': 2, 'pay_time': '2016-01-01', 'tid': '21'}, {'buyer_id': 2, 'pay_time': '2016-01-02', 'tid': '22'}, {'buyer_id': 2, 'pay_time': '2016-01-05', 'tid': '23'}]
df = pd.DataFrame(data)
df['pay_time'] = pd.to_datetime(df['pay_time'])
df.set_index('pay_time').groupby('buyer_id').resample('1D').count()
The output looks like this:
                     buyer_id  tid
buyer_id pay_time
1        2016-01-01         1    1
         2016-01-03         1    1
         2016-01-05         1    1
2        2016-01-01         1    1
         2016-01-02         1    1
         2016-01-05         1    1
But I want the output to fill in the missing dates with 0, like this:
                     buyer_id  tid
buyer_id pay_time
1        2016-01-01         1    1
         2016-01-02         0    0
         2016-01-03         1    1
         2016-01-04         0    0
         2016-01-05         1    1
2        2016-01-01         1    1
         2016-01-02         1    1
         2016-01-03         0    0
         2016-01-04         0    0
         2016-01-05         1    1
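One way to get this padded output without relying on groupby(...).resample(...) at all (my own assumption, not something proposed in this thread) is to count rows per (buyer_id, day) with a plain groupby and then reindex against every buyer/day combination, filling the gaps with 0. A minimal sketch; the names counts, days, full_index and result are only illustrative:

```python
import pandas as pd

data = [{'buyer_id': 1, 'pay_time': '2016-01-01', 'tid': '11'}, {'buyer_id': 1, 'pay_time': '2016-01-03', 'tid': '12'},
        {'buyer_id': 1, 'pay_time': '2016-01-05', 'tid': '13'}, {'buyer_id': 2, 'pay_time': '2016-01-01', 'tid': '21'},
        {'buyer_id': 2, 'pay_time': '2016-01-02', 'tid': '22'}, {'buyer_id': 2, 'pay_time': '2016-01-05', 'tid': '23'}]
df = pd.DataFrame(data)
df['pay_time'] = pd.to_datetime(df['pay_time'])

# Count rows per (buyer_id, day); days without data simply do not appear yet.
counts = df.groupby(['buyer_id', 'pay_time']).count()

# Every calendar day between the overall earliest and latest pay_time.
days = pd.date_range(df['pay_time'].min(), df['pay_time'].max(), freq='D')

# Every (buyer_id, day) combination, in buyer order then date order.
full_index = pd.MultiIndex.from_product([df['buyer_id'].unique(), days],
                                        names=['buyer_id', 'pay_time'])

# Reindex so the missing (buyer_id, day) pairs show up with a count of 0.
result = counts.reindex(full_index, fill_value=0)
print(result)
```

Two differences from the resample output: buyer_id becomes an index level, so only the tid count column remains, and the date span is the overall first-to-last date rather than each buyer's own range (here they coincide).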
Without groupby, resample does fill the missing dates with 0:
df.set_index('pay_time').resample('1D').count()
            buyer_id  tid
pay_time
2016-01-01         2    2
2016-01-02         1    1
2016-01-03         1    1
2016-01-04         0    0
2016-01-05         2    2
I don't understand why resample alone behaves differently from resample combined with groupby.
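One way to see what the grouped call would be expected to return: resampling each buyer's rows separately and stacking the results with pd.concat fills each buyer's gaps, just as the ungrouped resample does. The sketch below only illustrates that equivalence, assuming each buyer's own first-to-last date range is the intended span; it is not how pandas implements groupby().resample() internally, and the names indexed, pieces and result are hypothetical:

```python
import pandas as pd

data = [{'buyer_id': 1, 'pay_time': '2016-01-01', 'tid': '11'}, {'buyer_id': 1, 'pay_time': '2016-01-03', 'tid': '12'},
        {'buyer_id': 1, 'pay_time': '2016-01-05', 'tid': '13'}, {'buyer_id': 2, 'pay_time': '2016-01-01', 'tid': '21'},
        {'buyer_id': 2, 'pay_time': '2016-01-02', 'tid': '22'}, {'buyer_id': 2, 'pay_time': '2016-01-05', 'tid': '23'}]
df = pd.DataFrame(data)
df['pay_time'] = pd.to_datetime(df['pay_time'])
indexed = df.set_index('pay_time')

# Resample each buyer's rows on their own; empty daily bins get a count of 0.
pieces = {buyer: group.resample('1D').count()
          for buyer, group in indexed.groupby('buyer_id')}

# Stack the per-buyer results under an outer buyer_id index level.
result = pd.concat(pieces, names=['buyer_id', 'pay_time'])
print(result)
```

On this data the stacked result has the same shape and values as the desired output: both buyer_id and tid columns, with 0 for the missing days.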
Output of pd.show_versions():
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8

pandas: 0.18.0
nose: None
pip: 8.1.1
setuptools: 19.4
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.3.5
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
Comment From: jorisvandenbossche
I think this is fixed in 0.18.1:
In [11]: df = pd.DataFrame(data)
In [12]: df['pay_time'] = pd.to_datetime(df['pay_time'])
In [13]: df
Out[13]:
   buyer_id   pay_time tid
0         1 2016-01-01  11
1         1 2016-01-03  12
2         1 2016-01-05  13
3         2 2016-01-01  21
4         2 2016-01-02  22
5         2 2016-01-05  23
In [14]: df.set_index('pay_time').groupby('buyer_id').resample('1D').count()
Out[14]:
                     buyer_id  tid
buyer_id pay_time
1        2016-01-01         1    1
         2016-01-02         0    0
         2016-01-03         1    1
         2016-01-04         0    0
         2016-01-05         1    1
2        2016-01-01         1    1
         2016-01-02         1    1
         2016-01-03         0    0
         2016-01-04         0    0
         2016-01-05         1    1
In [15]: pd.__version__
Out[15]: '0.18.1'
Are you able to check that?
Comment From: jorisvandenbossche
Probably fixed by #12743
Comment From: starplanet
You are right, my pandas version was 0.18.0. I upgraded pandas and the problem is solved.
Thanks.
(Original message from Joris Van den Bossche, Thu, May 12, 2016 06:14 PM, subject: Re: [pydata/pandas] why resample with group by not pad null value (#13151))
Comment From: jorisvandenbossche
Thanks for checking!