Pandas: why does resample with groupby not pad missing values?

Code Sample, a copy-pastable example if possible

I use resample together with groupby in the following code:

import pandas as pd
data = [{'buyer_id': 1, 'pay_time': '2016-01-01', 'tid': '11'}, {'buyer_id': 1, 'pay_time': '2016-01-03', 'tid': '12'}, {'buyer_id': 1, 'pay_time': '2016-01-05', 'tid': '13'}, {'buyer_id': 2, 'pay_time': '2016-01-01', 'tid': '21'}, {'buyer_id': 2, 'pay_time': '2016-01-02', 'tid': '22'}, {'buyer_id': 2, 'pay_time': '2016-01-05', 'tid': '23'}]

df = pd.DataFrame(data)
df['pay_time'] = pd.to_datetime(df['pay_time'])
df.set_index('pay_time').groupby('buyer_id').resample('1D').count()

The program output looks like this:

                     buyer_id  tid
buyer_id pay_time
1        2016-01-01         1    1
         2016-01-03         1    1
         2016-01-05         1    1
2        2016-01-01         1    1
         2016-01-02         1    1
         2016-01-05         1    1

But I want the output to pad the missing dates with 0, like this:

                     buyer_id  tid
buyer_id pay_time
1        2016-01-01         1    1
         2016-01-02         0    0
         2016-01-03         1    1
         2016-01-04         0    0
         2016-01-05         1    1
2        2016-01-01         1    1
         2016-01-02         1    1
         2016-01-03         0    0
         2016-01-04         0    0
         2016-01-05         1    1

If I don't use the groupby method, resample pads the missing dates with 0, like this:

df.set_index('pay_time').resample('1D').count()

            buyer_id  tid
pay_time
2016-01-01         2    2
2016-01-02         1    1
2016-01-03         1    1
2016-01-04         0    0
2016-01-05         2    2

I don't know why resample alone behaves differently from resample combined with groupby.

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8

pandas: 0.18.0
nose: None
pip: 8.1.1
setuptools: 19.4
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.3.5
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None

Comment From: jorisvandenbossche

I think this is fixed in 0.18.1:

In [11]: df = pd.DataFrame(data)

In [12]: df['pay_time'] = pd.to_datetime(df['pay_time'])

In [13]: df
Out[13]:
   buyer_id   pay_time tid
0         1 2016-01-01  11
1         1 2016-01-03  12
2         1 2016-01-05  13
3         2 2016-01-01  21
4         2 2016-01-02  22
5         2 2016-01-05  23

In [14]: df.set_index('pay_time').groupby('buyer_id').resample('1D').count()
Out[14]:
                     buyer_id  tid
buyer_id pay_time
1        2016-01-01         1    1
         2016-01-02         0    0
         2016-01-03         1    1
         2016-01-04         0    0
         2016-01-05         1    1
2        2016-01-01         1    1
         2016-01-02         1    1
         2016-01-03         0    0
         2016-01-04         0    0
         2016-01-05         1    1

In [15]: pd.__version__
Out[15]: '0.18.1'

Are you able to check that?
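
For anyone wanting to run that check outside an interactive session, a minimal self-contained sketch follows. It rebuilds the frame from the original report and counts how many days were padded with a zero. Selecting the single `tid` column before resampling and the names `counts` and `padded_days` are choices made for this sketch, not part of the thread; on a version that includes the fix, ten rows with four zero counts would be expected.

```python
import pandas as pd

# Data from the original report.
data = [
    {'buyer_id': 1, 'pay_time': '2016-01-01', 'tid': '11'},
    {'buyer_id': 1, 'pay_time': '2016-01-03', 'tid': '12'},
    {'buyer_id': 1, 'pay_time': '2016-01-05', 'tid': '13'},
    {'buyer_id': 2, 'pay_time': '2016-01-01', 'tid': '21'},
    {'buyer_id': 2, 'pay_time': '2016-01-02', 'tid': '22'},
    {'buyer_id': 2, 'pay_time': '2016-01-05', 'tid': '23'},
]

df = pd.DataFrame(data)
df['pay_time'] = pd.to_datetime(df['pay_time'])

# Assumption for this sketch: count only the 'tid' column, so the result
# is a Series indexed by (buyer_id, pay_time).
counts = (
    df.set_index('pay_time')
      .groupby('buyer_id')['tid']
      .resample('1D')
      .count()
)

# Days that had no rows for a buyer show up with a count of 0 when
# padding works; if they are missing entirely, this stays at 0.
padded_days = int((counts == 0).sum())

print(pd.__version__)
print(counts)
print('rows:', len(counts), 'padded days:', padded_days)
```

If `padded_days` is 0 and only six rows come back, the installed version most likely still shows the behavior described in the report.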

Comment From: jorisvandenbossche

Probably fixed by #12743

Comment From: starplanet

You are right, my pandas version was 0.18.0. I upgraded pandas and the problem is solved.

Thanks.
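
If upgrading is not an option, one possible workaround is to resample each buyer separately, since the report shows that plain `resample` does pad missing days, and then stitch the pieces back together. This is only a sketch under that assumption: the loop, the `pieces` dict, and the `pd.concat` call are not from the thread, and it counts just the `tid` column.

```python
import pandas as pd

# Data from the original report.
data = [
    {'buyer_id': 1, 'pay_time': '2016-01-01', 'tid': '11'},
    {'buyer_id': 1, 'pay_time': '2016-01-03', 'tid': '12'},
    {'buyer_id': 1, 'pay_time': '2016-01-05', 'tid': '13'},
    {'buyer_id': 2, 'pay_time': '2016-01-01', 'tid': '21'},
    {'buyer_id': 2, 'pay_time': '2016-01-02', 'tid': '22'},
    {'buyer_id': 2, 'pay_time': '2016-01-05', 'tid': '23'},
]

df = pd.DataFrame(data)
df['pay_time'] = pd.to_datetime(df['pay_time'])

# Resample every buyer on its own; plain resample fills the days between
# that buyer's first and last payment with a count of 0.
pieces = {}
for buyer_id, group in df.groupby('buyer_id'):
    pieces[buyer_id] = group.set_index('pay_time')['tid'].resample('1D').count()

# Combine the per-buyer Series into one Series with a
# (buyer_id, pay_time) MultiIndex.
result = pd.concat(pieces, names=['buyer_id', 'pay_time'])
print(result)
```

Each buyer is padded only between its own first and last date, which matches the output requested in the report.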



Comment From: jorisvandenbossche

Thanks for checking!
