Resampling weekly doesn't behave the same way as resampling daily when using label='right'.


import numpy as np
import pandas as pd

dates = pd.date_range('2001-01-01 10:00', '2001-01-15 16:00', freq='12h')
d = pd.DataFrame(dict(A=np.arange(len(dates))), index=dates)

so d looks like this:

                      A
2001-01-01 10:00:00   0
2001-01-01 22:00:00   1
2001-01-02 10:00:00   2
2001-01-02 22:00:00   3
2001-01-03 10:00:00   4
2001-01-03 22:00:00   5
2001-01-04 10:00:00   6
2001-01-04 22:00:00   7
2001-01-05 10:00:00   8
2001-01-05 22:00:00   9
2001-01-06 10:00:00  10
2001-01-06 22:00:00  11
<truncated>

Then we resample daily:

d.resample('D', label='right').last()

Produces the following output.

             A
2001-01-02   1
2001-01-03   3
2001-01-04   5
2001-01-05   7
2001-01-06   9
<truncated>

Note that the value for 2001-01-05 is 7, the last value on 2001-01-04 in the original dataframe. If we resample weekly:

d.resample('W-FRI', label='right').last()

Output is

             A
2001-01-05   9
2001-01-12  23
2001-01-19  29

This time, the value labelled 2001-01-05 is the last value on 2001-01-05, not the last value on 2001-01-04. This is inconsistent with the behaviour of the daily resample, where the label is always strictly after the data.

The result I expect is

             A
2001-01-06   9
2001-01-13  23
2001-01-20  29

as the end of a W-FRI bucket should be midnight on the following Saturday. This is consistent with the daily resampling behaviour where the end of the 2001-01-05 bucket is midnight on 2001-01-06.
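
For reference, the labels I expect can be reproduced with the existing options. This is only a sketch of a workaround, not a claim about what W-FRI should do: it assumes a recent pandas release (not the 0.13.0 listed below) and assumes that anchoring the week on Saturday with closed='left' is an acceptable stand-in, so that each bin covers [Saturday 00:00, next Saturday 00:00) and is labelled with its right edge.

```python
import numpy as np
import pandas as pd

# Same frame as in the report above.
dates = pd.date_range('2001-01-01 10:00', '2001-01-15 16:00', freq='12h')
d = pd.DataFrame(dict(A=np.arange(len(dates))), index=dates)

# Assumption: bins are [Saturday, next Saturday), labelled by the right edge,
# so every label falls strictly after the data it summarises.
weekly = d.resample('W-SAT', closed='left', label='right').last()
print(weekly)
```

With this anchoring the first label is 2001-01-06 and it carries the last value from Friday 2001-01-05.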

Output of installed versions below.

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux
Release: 2.6.18-308.el5
Processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.13.0
Cython: Not installed
Numpy: 1.7.1
Scipy: 0.9.0
statsmodels: Not installed
    patsy: Not installed
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2011k
bottleneck: Not installed
PyTables: 2.3.1-1
    numexpr: 2.0.1
matplotlib: 1.1.1
openpyxl: Not installed
xlrd: 0.8.0
xlwt: 0.7.4
xlsxwriter: Not installed
sqlalchemy: 0.8.2
lxml: 2.3.6
bs4: Not installed
html5lib: 0.90
bigquery: Not installed
apiclient: Not installed

Comment From: jreback

looks same / related to #4197, #4076 ?

Comment From: dbew

It might be related - I don't know the underlying code - but it's not the same.

#4197 is about creating bins where only partial data is available. Their example downsamples to 5-minute data and the last bin has only 3 minutes of data. The question is whether that last bin should have been created. (There is an example where label='right' should be used, but this matches the example above for daily resampling.)

#4076 is about extra bins when resampling, e.g. you downsample from daily to weekly data and in some circumstances you get an extra bin at the end which does not correspond to any of the data.

In this issue, the labelling does not correspond to the data included in the weekly bin: the first bin is labelled 2001-01-05 (i.e. midnight between 2001-01-04 and 2001-01-05) but contains data from after this point. So we have forward information in our resampled data.
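
One way to make this visible is to resample the timestamps themselves and compare the latest timestamp in each bin against the bin's label. This is a rough sketch, assuming a recent pandas release; the helper name latest_timestamp_per_bin is made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2001-01-01 10:00', '2001-01-15 16:00', freq='12h')
d = pd.DataFrame(dict(A=np.arange(len(dates))), index=dates)

def latest_timestamp_per_bin(frame, rule):
    """Hypothetical helper: latest source timestamp in each right-labelled bin."""
    stamps = frame.index.to_series()
    latest = stamps.resample(rule, label='right').max()
    # True where the bin holds data timestamped after its own label.
    after_label = latest.to_numpy() > latest.index.to_numpy()
    return pd.DataFrame({'latest': latest, 'after_label': after_label})

daily_check = latest_timestamp_per_bin(d, 'D')
weekly_check = latest_timestamp_per_bin(d, 'W-FRI')
print(daily_check.head())
print(weekly_check)
```

For the daily rule no bin holds data later than its label, while the first W-FRI bin, labelled 2001-01-05, holds the 2001-01-05 22:00 row.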

Comment From: MarcoGorelli

This is correct

As the documentation says:

closed : {‘right’, ‘left’}, default None

Which side of bin interval is closed. The default is ‘left’ for all frequency offsets except for ‘M’, ‘A’, ‘Q’, ‘BM’, ‘BA’, ‘BQ’, and ‘W’ which all have a default of ‘right’.

So, when you resample daily, for 2001-01-05, the last value is 7, as the interval is [2001-01-04, 2001-01-05).

When you resample weekly, the last value is 9, as the interval is (2000-12-29, 2001-01-05].
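
To illustrate, here is a small sketch (assuming a recent pandas release) that passes closed explicitly and compares against the defaults. It also shows that closed='left' on the weekly rule gives the same "label strictly after the data" behaviour as the daily case.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2001-01-01 10:00', '2001-01-15 16:00', freq='12h')
d = pd.DataFrame(dict(A=np.arange(len(dates))), index=dates)

# Daily: the default is closed='left', so passing it explicitly changes nothing.
daily_default = d.resample('D', label='right').last()
daily_left = d.resample('D', closed='left', label='right').last()
print(daily_default.equals(daily_left))

# Weekly: the default is closed='right', so passing it explicitly changes nothing.
weekly_default = d.resample('W-FRI', label='right').last()
weekly_right = d.resample('W-FRI', closed='right', label='right').last()
print(weekly_default.equals(weekly_right))

# Overriding the weekly default: the 2001-01-05 label now only sees data before it.
weekly_left = d.resample('W-FRI', closed='left', label='right').last()
print(weekly_left)
```

With closed='left' the bin labelled 2001-01-05 ends at the last value on 2001-01-04, matching the daily output.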

Closing then

Pandas Inconsistent behaviour in resample between daily and weekly
