Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: pd.Timestamp("2016-01-01", tz="Europe/Berlin") - pd.Timestamp("now", tz="UTC")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-691a8df26ecd> in <module>()
----> 1 pd.Timestamp("2016-01-01", tz="Europe/Berlin") - pd.Timestamp("now", tz="UTC")
pandas\tslib.pyx in pandas.tslib._Timestamp.__sub__ (pandas\tslib.c:23697)()
TypeError: Timestamp subtraction must have the same timezones or no timezones
Problem description
If both timestamps have a timezone specified, the result of this operation is perfectly well-defined. It's quite surprising that I have to riddle my code with lhs.tz_convert("UTC") - rhs.tz_convert("UTC") lines to get the difference of timestamps.
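For reference, a minimal sketch of the workaround described above, using fixed timestamps instead of "now" so the result is reproducible. The names lhs, rhs and delta are only illustrative.

```python
import pandas as pd

# Two tz-aware timestamps in different zones (fixed values for reproducibility).
lhs = pd.Timestamp("2016-01-01", tz="Europe/Berlin")
rhs = pd.Timestamp("2016-01-01", tz="UTC")

# Explicit workaround: bring both operands to UTC before subtracting.
delta = lhs.tz_convert("UTC") - rhs.tz_convert("UTC")

# Midnight in Berlin (UTC+1 in winter) is one hour before midnight UTC.
print(delta)
```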
Expected Output
Timedelta('-393 days +06:29:07.057926')
Output of pd.show_versions()
Comment From: jreback
we could certainly make this work
generally though if someone is trying to subtract different time zones it's an error (they meant to convert)
but maybe that's not typical
Comment From: Vutsuak16
@jreback should it be implemented for the addition operation too? It will raise the same error when adding two timestamps with different timezones. Also, which side of the operator should be given preference here? To subtract, we ultimately need both in the same timezone, so should tz=Europe/Berlin be converted to UTC, or should UTC be converted to Europe/Berlin?
Comment From: jreback
addition of timestamps is not a meaningful operation
you will have to handle both sub and rsub; both operands are converted to UTC before the subtraction
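A small sketch of why the choice of target zone does not matter: the difference is the same whether both operands are converted to UTC or to Europe/Berlin, and swapping the operands only flips the sign. aware_diff is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

def aware_diff(lhs, rhs):
    """Hypothetical helper: difference of two tz-aware timestamps via UTC."""
    if lhs.tzinfo is None or rhs.tzinfo is None:
        raise TypeError("both timestamps must be tz-aware")
    return lhs.tz_convert("UTC") - rhs.tz_convert("UTC")

a = pd.Timestamp("2016-01-01 12:00", tz="Europe/Berlin")
b = pd.Timestamp("2016-01-01 12:00", tz="UTC")

via_utc = aware_diff(a, b)                       # both operands in UTC
via_berlin = a - b.tz_convert("Europe/Berlin")   # both operands in Berlin time
swapped = aware_diff(b, a)                       # operand order reversed

print(via_utc, via_berlin, swapped)
```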
Comment From: jreback
I am actually now a bit -0 on this as this is a touch too magical. This should be an explicit operation. In reality it doesn't come up that much, so making this a user-explicit operation should not be much of a burden.
But will leave the issue open for discussion / implementation.
Comment From: jreback
closing this, we are moving towards things being very explicit.
Comment From: filmor
I really don't see any magic here. Every non-naive Timestamp specifies a unique point on a common time-axis, completely independent of the timezone being used, so their difference is unambiguously defined as the distance on that time-axis.
Contrary to what you said before, subtracting timestamps with different timezones /is/ a valid action that can happen in particular in library code, e.g. if I provide a function time_to_some_event(ts), where event is given by some backend system's value (i.e. machine to machine communication, so usually UTC) while ts comes from user code, so it will probably have a non-UTC timezone attached.
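The library scenario could look roughly like the sketch below. The EVENT constant, its value, and the body of time_to_some_event are assumptions made up for illustration; only the function name comes from the comment above.

```python
import pandas as pd

# Assumed backend value: machine-to-machine data, so stored in UTC.
EVENT = pd.Timestamp("2016-06-01 12:00", tz="UTC")

def time_to_some_event(ts, event=EVENT):
    """Time remaining from a user-supplied, tz-aware ts until the event."""
    if ts.tzinfo is None:
        raise ValueError("ts must be tz-aware")
    # Convert explicitly so the subtraction works regardless of the user's zone.
    return event.tz_convert("UTC") - ts.tz_convert("UTC")

# User code typically passes a local, non-UTC timestamp (Berlin is UTC+2 in June).
user_ts = pd.Timestamp("2016-06-01 12:00", tz="Europe/Berlin")
remaining = time_to_some_event(user_ts)
print(remaining)
```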
If you want to be explicit, what you should not allow is subtraction of naive timestamps, as this gives you essentially random values if the naive timestamps don't happen to be UTC.
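To illustrate the point about naive timestamps, here is a sketch in which two naive values are assumed to be wall-clock readings taken in different zones. The subtraction succeeds silently, but the result does not match the real elapsed time.

```python
import pandas as pd

# Assumption: the first value was read off a clock in Berlin, the second in UTC.
berlin_wall = pd.Timestamp("2016-01-01 12:00")
utc_wall = pd.Timestamp("2016-01-01 12:00")

# Naive subtraction raises nothing and reports no difference at all.
naive_delta = berlin_wall - utc_wall

# Attaching the zones the values actually came from gives the real distance.
aware_delta = (
    berlin_wall.tz_localize("Europe/Berlin").tz_convert("UTC")
    - utc_wall.tz_localize("UTC")
)

print(naive_delta, aware_delta)
```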
Comment From: jreback
we raise for all comparisons between differing tz's (whether tz is UTC or naive or another zone).
this is complicated by the fact that we generally turn strings into naive timestamps, xref #18435
subtracting in different timezones would be valid, but it is just plain confusing. forcing folks to put things in the same timezone before subtracting is not very burdensome and is much more explicit.
Comment From: jorisvandenbossche
"we raise for all comparisons between differing tz's (whether tz is UTC or naive or another zone)."
That is not really true, for comparisons we allow this:
In [13]: pd.Timestamp('2016-01-01', tz='Europe/Brussels') > pd.Timestamp('2017-01-01', tz='UTC')
Out[13]: False
In [14]: pd.DatetimeIndex([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Timestamp('2017-01-01', tz='UTC')
Out[14]: array([False], dtype=bool)
In [15]: pd.Series([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Timestamp('2017-01-01', tz='UTC')
Out[15]:
0 False
dtype: bool
In [16]: pd.Series([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Series([pd.Timestamp('2017-01-01', tz='UTC')])
Out[16]:
0 False
dtype: bool
Comment From: jreback
see my comment in the linked issue; these should all raise (on the list)
Comment From: jorisvandenbossche
"these should all raise"
I would say that is up for debate. We currently allow it, and I don't think there is anything ambiguous about what the result should be. Why break backwards compatibility to start erroring on this?
Comment From: filmor
Also, the actual issue in the linked comment is about tz-naive vs tz-aware timestamps. Of course neither comparison nor difference makes sense if tz-naive timestamps are involved (even naive vs naive is dubious, cf. pd.Timestamp("DST-Day 02:30") < pd.Timestamp("DST-Day 02:31")).
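The DST case can be made concrete with a sketch that assumes the two naive values are Europe/Berlin wall-clock times on 2016-10-30, the day the clocks went back and 02:00 to 02:59 occurred twice. The date, zone and variable names are chosen for illustration.

```python
import pandas as pd

# Naive wall-clock times inside the repeated hour.
early = pd.Timestamp("2016-10-30 02:30")
late = pd.Timestamp("2016-10-30 02:31")

naive_order = early < late  # the naive comparison says 02:30 comes first

# One possible reality: 02:30 is from the second pass (standard time, UTC+1)
# and 02:31 is from the first pass (summer time, UTC+2).
early_cet = early.tz_localize("Europe/Berlin", ambiguous=False)
late_cest = late.tz_localize("Europe/Berlin", ambiguous=True)

aware_order = early_cet < late_cest  # on the real time axis the order flips
gap = late_cest - early_cet

print(naive_order, aware_order, gap)
```

Under that reading, the timestamp labelled 02:31 actually happened 59 minutes before the one labelled 02:30.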
Comment From: foolcage
They are just times and should always carry tz info natively, so they can be compared.
Comment From: guyer
I know this is long closed, but if all of your Timestamps are in the same timezone, why are you squandering cycles on time zones at all? They're only relevant if you have different events in different time zones.