A small, complete example of the issue
pd.date_range returns a different number of dates for different start dates, even though the end date is computed with the same DateOffset in both cases. Note that both start dates are month-end dates and that 2008 is a leap year (a potential source of the bug).
In [1]:
month = pd.to_datetime('2007-02-28')
pd.date_range(month, month + pd.DateOffset(months=12), freq='M')
Out[1]:
DatetimeIndex(['2007-02-28', '2007-03-31', '2007-04-30', '2007-05-31',
'2007-06-30', '2007-07-31', '2007-08-31', '2007-09-30',
'2007-10-31', '2007-11-30', '2007-12-31', '2008-01-31'],
dtype='datetime64[ns]', freq='M')
In [2]:
month = pd.to_datetime('2007-01-31')
pd.date_range(month, month + pd.DateOffset(months=12), freq='M')
Out[2]:
DatetimeIndex(['2007-01-31', '2007-02-28', '2007-03-31', '2007-04-30',
'2007-05-31', '2007-06-30', '2007-07-31', '2007-08-31',
'2007-09-30', '2007-10-31', '2007-11-30', '2007-12-31',
'2008-01-31'], dtype='datetime64[ns]', freq='M')
However, using the soon-to-be-deprecated datetools gives consistent behavior:
In [2]:
month = pd.to_datetime('2007-02-28')
pd.date_range(month, month + pd.datetools.MonthEnd(12), freq='M')
Out[2]:
DatetimeIndex(['2007-02-28', '2007-03-31', '2007-04-30', '2007-05-31',
'2007-06-30', '2007-07-31', '2007-08-31', '2007-09-30',
'2007-10-31', '2007-11-30', '2007-12-31', '2008-01-31',
'2008-02-29'], dtype='datetime64[ns]', freq='M')
In [3]:
month = pd.to_datetime('2007-01-31')
pd.date_range(month, month + pd.datetools.MonthEnd(12), freq='M')
Out[3]:
DatetimeIndex(['2007-01-31', '2007-02-28', '2007-03-31', '2007-04-30',
'2007-05-31', '2007-06-30', '2007-07-31', '2007-08-31',
'2007-09-30', '2007-10-31', '2007-11-30', '2007-12-31',
'2008-01-31'], dtype='datetime64[ns]', freq='M')
Expected Output
In [1]:
month = pd.to_datetime('2007-02-28')
pd.date_range(month, month + pd.DateOffset(months=12), freq='M')
Out[1]:
DatetimeIndex(['2007-02-28', '2007-03-31', '2007-04-30', '2007-05-31',
'2007-06-30', '2007-07-31', '2007-08-31', '2007-09-30',
'2007-10-31', '2007-11-30', '2007-12-31', '2008-01-31',
'2008-02-29'], dtype='datetime64[ns]', freq='M')
Output of pd.show_versions()
Comment From: chris-b1
Thank you for the report. First, note that only the datetools namespace is being deprecated; you can still use the month-end offset via pd.offsets.MonthEnd(12).
I don't think this is a bug, just a leap-year issue. A DateOffset simply rolls a date forward by the specified number of months, trying to land on the same day of the month if possible and otherwise snapping to the end of the month (these are also the semantics of relativedelta from the dateutil module).
In [11]: pd.to_datetime('2007-02-28') + pd.DateOffset(months=12)
Out[11]: Timestamp('2008-02-28 00:00:00') # not a month end
In [13]: from dateutil import relativedelta
In [14]: from datetime import datetime
In [15]: datetime(2007,2,28) + relativedelta.relativedelta(months=12)
Out[15]: datetime.datetime(2008, 2, 28, 0, 0)
If you want to always snap to a month end (including handling leap years), the MonthEnd offset is what you want.
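To make the difference concrete, here is a minimal sketch comparing the two ways of computing the end date for both start dates from the report. The helper names range_with_dateoffset and range_with_monthend are hypothetical and only for illustration. The frequency is passed as a pd.offsets.MonthEnd() object rather than the 'M' string, on the assumption that this is the spelling least likely to differ between pandas versions.

```python
import pandas as pd

# Month-end frequency as an offset object (used instead of the 'M' alias).
MONTH_END = pd.offsets.MonthEnd()


def range_with_dateoffset(start, months=12):
    """End date keeps the same day of month (e.g. 2007-02-28 -> 2008-02-28)."""
    start = pd.Timestamp(start)
    end = start + pd.DateOffset(months=months)
    return pd.date_range(start, end, freq=MONTH_END)


def range_with_monthend(start, months=12):
    """End date snaps to a month end (e.g. 2007-02-28 -> 2008-02-29)."""
    start = pd.Timestamp(start)
    end = start + pd.offsets.MonthEnd(months)
    return pd.date_range(start, end, freq=MONTH_END)


for start in ("2007-02-28", "2007-01-31"):
    by_dateoffset = range_with_dateoffset(start)
    by_monthend = range_with_monthend(start)
    print(start, len(by_dateoffset), len(by_monthend), by_monthend[-1].date())
```

With DateOffset(months=12) the end date for 2007-02-28 is 2008-02-28, which falls before the February 2008 month end, so that month is not included in the range. With MonthEnd(12) the end date is itself a month end, so both start dates yield the same number of entries.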