Given a timezone-aware Timestamp:
import pandas as pd
import pytz
foo = pd.Timestamp('2016-10-30 00:00:00', tz=pytz.timezone('Europe/Helsinki'))
(Please note that 2016-10-30 is a 25-hour day, due to a DST change: on this day the UTC offset changes from +0300 to +0200.)
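A quick way to confirm the 25-hour claim is to subtract the two local midnights, because subtracting tz-aware Timestamps gives the elapsed (absolute) time. This sketch passes the zone name as a string instead of a pytz object and assumes pandas resolves it itself.

```python
import pandas as pd

TZ = "Europe/Helsinki"

# Local midnight on the DST-change day and on the following day.
start = pd.Timestamp("2016-10-30 00:00:00", tz=TZ)
end = pd.Timestamp("2016-10-31 00:00:00", tz=TZ)

# Subtracting two tz-aware Timestamps yields the elapsed time between them.
day_length = end - start

print(day_length)                          # 1 days 01:00:00
print(start.utcoffset(), end.utcoffset())  # 3:00:00 2:00:00
```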
I'm trying to get the next day. If I understand the DateOffset behaviour correctly, all of these lines should be equivalent:
foo + pd.tseries.frequencies.to_offset('D')
foo + pd.tseries.offsets.Day()
foo + pd.DateOffset()
foo + pd.DateOffset(1)
foo + pd.DateOffset(days=1)
But the first four return the wrong date, presumably because they add 24 hours rather than one calendar day:
Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')
And for some reason, the last one returns the correct date:
Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')
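The two results above can be reproduced by contrasting an absolute 24-hour step with a calendar-day step. This sketch deliberately uses pd.Timedelta(hours=24) for the absolute case instead of pd.offsets.Day(), so the comparison does not depend on how a particular pandas version implements Day.

```python
import pandas as pd

foo = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

# Absolute arithmetic: exactly 24 elapsed hours, which lands before the next
# local midnight because this local day is 25 hours long.
plus_24h = foo + pd.Timedelta(hours=24)

# Calendar arithmetic: the same wall-clock time on the next calendar day.
plus_1_day = foo + pd.DateOffset(days=1)

print(plus_24h)    # 2016-10-30 23:00:00+02:00
print(plus_1_day)  # 2016-10-31 00:00:00+02:00
```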
Output of pd.show_versions()
Comment From: jreback
xref to https://github.com/pandas-dev/pandas/issues/8774 and #7825
The first four are all equivalent to Day, so what you are really asking is why these two differ:
In [44]: foo + pd.DateOffset(days=1)
Out[44]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')
In [45]: foo + pd.offsets.Day()
Out[45]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')
Day is a subclass of Tick, which, like Minute and Second, is applied to the UTC time directly (IOW the timestamp is converted to UTC, the offset is added, and the result is converted back to local time). So D is effectively 24 hours, not a calendar day.
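The Tick mechanism described above (convert to UTC, add, convert back) can be spelled out by hand. This is a sketch of the described semantics using public Timestamp methods, not the library's internal code path.

```python
import pandas as pd

foo = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

# Step 1: convert to UTC, which has no DST, so 24 hours is always 24 hours.
utc = foo.tz_convert("UTC")              # 2016-10-29 21:00:00+00:00
# Step 2: add a fixed 24-hour duration in UTC.
shifted = utc + pd.Timedelta(hours=24)
# Step 3: convert back to local time; the offset is now +02:00.
back_to_local = shifted.tz_convert("Europe/Helsinki")

print(back_to_local)  # 2016-10-30 23:00:00+02:00
```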
Other offsets, e.g. Month and the like, instead seek to preserve the DST semantics exactly.
DateOffset(days=1) works like Month and respects DST.
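To illustrate the "respects DST" behaviour with a month-style offset, this sketch adds pd.DateOffset(months=1) to a mid-October timestamp. The start date is an arbitrary choice for illustration: the result keeps local midnight even though the elapsed time includes the extra hour from the DST change.

```python
import pandas as pd

TZ = "Europe/Helsinki"

start = pd.Timestamp("2016-10-15 00:00:00", tz=TZ)  # summer time, +03:00

# A calendar offset: the same wall-clock time, one month later.
next_month = start + pd.DateOffset(months=1)         # winter time, +02:00

print(next_month)          # 2016-11-15 00:00:00+02:00
print(next_month - start)  # 31 days 01:00:00
```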
So this is a bit confusing. I don't really recall why it was structured like this. You are welcome to delve into the linked issues and code and see if you can write up a better explanation (which maybe we want to add to the docs).
Comment From: tapia
The problem is that I need a way to write a string offset ('D', '10m', whatever) that is timezone aware. If I'm going to add a day, I need to add an actual day, not 24 hours. How can I achieve this using to_offset?
I can't use DateOffset to do this, because my code receives the offset string as a parameter, and writing a parser that "overrides" the Pandas parser looks like a very bad idea.
Comment From: gfyoung
I'm not sure you can, but at the very least you could check for "D" in the string parameter and use pd.DateOffset(days=n) as a workaround. Does that work for you?
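The suggested workaround can be made a little more robust by checking the parsed offset rather than the raw string, so multiples such as '2D' are handled too. The helper name calendar_aware_offset is hypothetical, and the sketch assumes the behaviour discussed in this thread, where Day adds fixed 24-hour blocks.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def calendar_aware_offset(freq: str):
    """Hypothetical helper: parse a frequency string with to_offset, but turn
    whole-day offsets into DateOffset(days=n) so they follow local calendar
    days. All other offsets are returned unchanged."""
    offset = to_offset(freq)
    if isinstance(offset, pd.offsets.Day):
        # Assumed: Day adds fixed 24-hour blocks, whereas
        # DateOffset(days=n) keeps the local wall-clock time.
        return pd.DateOffset(days=offset.n)
    return offset


foo = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")
print(foo + calendar_aware_offset("D"))      # 2016-10-31 00:00:00+02:00
print(foo + calendar_aware_offset("10min"))  # 2016-10-30 00:10:00+03:00
```

Sub-daily strings pass straight through, since for those a fixed duration is exactly what is wanted.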
Comment From: tapia
Yes, of course, that's what I ended up doing. But regardless of how I work around the problem, I still think this is a bug in pandas.
Thank you guys for your help :-)
Comment From: jreback
If you want to take a detailed look at the code/tests for this, that would be great. I don't remember exactly why it is the way it is.
Comment From: jreback
Consolidated into #20633.