Code Sample
import pandas as pd

result = pd.Series(1., pd.date_range('20170101','20181231',freq='D')).resample('B').last().to_period()
print(result.index[0])
# Period('2016-12-30', 'B')
pd.Period('20170101', freq='B')
# Period('2017-01-02', 'B')
Problem description
When I have daily data that spans weekends (e.g., 1/1/2017, which is a Sunday) and try to get the last available value for each BusinessDay, the weekend data falls back to the Friday period. This is inconsistent with a Sunday timestamp belonging to the Monday BusinessDay period (e.g., pd.Period('20170101', 'B') goes to 1/2/2017).
Expected Output
I would expect that result.index[0] above would return Period('2017-01-02', 'B').
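As a possible workaround for the expected output above, here is a minimal sketch that groups daily values by the business day each timestamp rolls forward to, instead of relying on resample('B'). It is an illustration under assumptions, not a recipe from the pandas maintainers: the names daily, bday, keys, and result are made up for this example, and the date range is shortened to about one week.

```python
import pandas as pd

# Daily data starting on Sunday 2017-01-01 and ending on Sunday 2017-01-08.
daily = pd.Series(1.0, index=pd.date_range("2017-01-01", "2017-01-08", freq="D"))

# Map every timestamp to the business day it rolls forward to.
# Weekdays map to themselves; Saturday and Sunday map to the next Monday.
bday = pd.offsets.BDay()
keys = daily.index.map(bday.rollforward)

# Take the last available value within each business-day bucket.
result = daily.groupby(keys).last()

print(result.index[0])  # the Sunday lands in the Monday 2017-01-02 bucket
```

With this grouping, the first bucket is Monday 2017-01-02 rather than Friday 2016-12-30, which matches what pd.Period('20170101', freq='B') returns.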
Output of pd.show_versions()
Comment From: erbian
possibly related to https://github.com/pandas-dev/pandas/issues/10575
Comment From: jreback
duplicate of this issue: https://github.com/pandas-dev/pandas/issues/11123
Comment From: jreback
This is basically a convention, though I guess it could be regarded as a bug as well. See the discussion on #11123 (and the xref issue you pointed to). If you have some thoughts, please share.
Comment From: erbian
@jreback I understand that including Fri-Sat-Sun in the Friday period is a convention (though, as others have pointed out, probably not ideal for time-series analysis given look-forward issues). I think it is then inconsistent that creating a period from a timestamp returns a period that does not include that timestamp. For example:
pd.Period(pd.datetime(2017,1,1), 'B').start_time
# Timestamp('2017-01-02 00:00:00')
Does the issue I am raising make sense?
Comment From: jreback
In [3]: pd.datetime(2017,1,1) + pd.offsets.BusinessDay()
Out[3]: Timestamp('2017-01-02 00:00:00')
This is just convention (and documented).
In the other issue I suggested a new frequency, since B is essentially look-ahead; maybe a look-behind one.
This looks reasonable, actually. I think using .to_period() itself is the issue.
In [27]: r = pd.Series(1., pd.date_range('20170101','20170131',freq='D')).resample('B').asfreq()
In [28]: pd.concat([r, r.index.to_series().dt.weekday_name], axis=1)
Out[28]:
              0          1
2016-12-30  NaN     Friday
2017-01-02  1.0     Monday
2017-01-03  1.0    Tuesday
2017-01-04  1.0  Wednesday
2017-01-05  1.0   Thursday
2017-01-06  1.0     Friday
2017-01-09  1.0     Monday
2017-01-10  1.0    Tuesday
2017-01-11  1.0  Wednesday
2017-01-12  1.0   Thursday
2017-01-13  1.0     Friday
2017-01-16  1.0     Monday
2017-01-17  1.0    Tuesday
2017-01-18  1.0  Wednesday
2017-01-19  1.0   Thursday
2017-01-20  1.0     Friday
2017-01-23  1.0     Monday
2017-01-24  1.0    Tuesday
2017-01-25  1.0  Wednesday
2017-01-26  1.0   Thursday
2017-01-27  1.0     Friday
2017-01-30  1.0     Monday
2017-01-31  1.0    Tuesday
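The look-ahead versus look-behind distinction mentioned in this comment can be illustrated with the existing rollforward and rollback methods on the BusinessDay offset. This is only a sketch of the two conventions for a single weekend timestamp, not the proposed new frequency; the variable names are chosen for this example.

```python
import pandas as pd

bday = pd.offsets.BDay()
sunday = pd.Timestamp("2017-01-01")   # a Sunday
tuesday = pd.Timestamp("2017-01-03")  # an ordinary business day

# Look-ahead: a weekend timestamp is assigned to the next business day.
look_ahead = bday.rollforward(sunday)

# Look-behind: a weekend timestamp is assigned to the previous business day.
look_behind = bday.rollback(sunday)

print(look_ahead.date())   # 2017-01-02 (Monday)
print(look_behind.date())  # 2016-12-30 (Friday)

# Business days are unaffected by either convention.
print(bday.rollforward(tuesday) == bday.rollback(tuesday) == tuesday)  # True
```

The look-ahead result matches pd.Period('20170101', freq='B') from the original report, while the look-behind result matches the first label produced by resample('B').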
Comment From: jreback
cc @chris-b1