import pytz
from pandas import Timestamp, Series
uniform_tz = Series({Timestamp("2019-05-01", tz=pytz.UTC): 1})
multi_tz = Series(
    {
        Timestamp("2019-05-01 01:00:00+0100", tz=pytz.timezone("Europe/London")): 2,
        Timestamp("2019-05-02", tz=pytz.UTC): 3,
    }
)
multi_tz.combine_first(uniform_tz)  # works fine
uniform_tz.combine_first(multi_tz)  # raises ValueError
Problem description
left.combine_first(right) unexpectedly raises a ValueError if:
- left has a timezone-aware datetime index, with the same timezone throughout the index.
- right has a timezone-aware datetime index, with a mix of different timezones.
The traceback is as follows:
Traceback (most recent call last):
File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1861, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data)
File "pandas/_libs/tslibs/conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas_test.py", line 57, in <module>
uniform_tz.combine_first(multi_tz) # raises ValueError
File "~/.venv/lib/python3.7/site-packages/pandas/core/series.py", line 2605, in combine_first
new_index = self.index.union(other.index)
File "~/.venv/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 484, in union
other = DatetimeIndex(other)
File "~/.venv/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 303, in __new__
int_as_wall_time=True)
File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 376, in _from_sequence
ambiguous=ambiguous, int_as_wall_time=int_as_wall_time)
File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1757, in sequence_to_dt64ns
data, dayfirst=dayfirst, yearfirst=yearfirst)
File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1866, in objects_to_datetime64ns
raise e
File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1857, in objects_to_datetime64ns
require_iso8601=require_iso8601
File "pandas/_libs/tslib.pyx", line 460, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 537, in pandas._libs.tslib.array_to_datetime
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
If the arguments are swapped, i.e. right.combine_first(left), no error is raised, and the output is as expected (timestamps that are equal but in different timezones are treated as the same entry, with the left argument's timezone propagating into the output).
The error also doesn't occur at all with the same setup on DataFrames.
It doesn't seem to me that there's any semantic reason that the operation should fail, so this appears to be a bug.
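To illustrate the point above about timestamps that are equal but expressed in different timezones, here is a minimal sketch. It assumes a pandas installation with timezone data available, and it passes timezone names as strings rather than pytz objects; the variable names are illustrative only.

```python
import pandas as pd

# 01:00 in London on 1 May 2019 falls in British Summer Time (UTC+1),
# so it denotes the same instant as midnight UTC.
london = pd.Timestamp("2019-05-01 01:00:00", tz="Europe/London")
utc = pd.Timestamp("2019-05-01 00:00:00", tz="UTC")

# Tz-aware timestamps compare by instant, not by wall-clock time.
same_instant = london == utc

# Converting to a common timezone makes the equivalence explicit.
london_as_utc = london.tz_convert("UTC")
print(same_instant, london_as_utc)
```

Since the two keys refer to the same instant, combining the two Series should plausibly yield one entry for that instant rather than two.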
Comment From: gfyoung
cc @jreback @mroeschke
Comment From: mroeschke
While this works if both arguments are DataFrames, the resulting index gets converted to UTC:
In [3]: pd.DataFrame(uniform_tz).combine_first(pd.DataFrame(multi_tz))
Out[3]:
0
2019-05-01 00:00:00+00:00 1.0
2019-05-02 00:00:00+00:00 3.0
The main issue is in DatetimeIndex.union, and this patch makes the Series case pass if anyone would like to put up a PR:
diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py
index 6b9806f4d..b91b936ad 100644
--- a/pandas/core/indexes/datetimes.py
+++ b/pandas/core/indexes/datetimes.py
@@ -489,6 +489,9 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index, DatetimeDelegateMixin):
                 other = DatetimeIndex(other)
             except TypeError:
                 pass
+            except ValueError:
+                from pandas import to_datetime
+                other = to_datetime(other, utc=True)
         this, other = self._maybe_utc_convert(other)
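Until a fix lands, the same idea as the patch above (falling back to UTC) can be applied on the user side. The following is only a sketch: combine_first_utc is a hypothetical helper, not part of pandas, and it assumes that losing the original timezones in favour of UTC is acceptable. Timezone names are passed as strings here instead of pytz objects.

```python
import pandas as pd


def combine_first_utc(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: normalise both indexes to UTC, then combine."""
    left_utc = left.copy()
    right_utc = right.copy()
    # to_datetime(..., utc=True) accepts an index holding mixed-timezone
    # Timestamps and returns a single UTC DatetimeIndex.
    left_utc.index = pd.to_datetime(left.index, utc=True)
    right_utc.index = pd.to_datetime(right.index, utc=True)
    return left_utc.combine_first(right_utc)


uniform_tz = pd.Series({pd.Timestamp("2019-05-01", tz="UTC"): 1})
multi_tz = pd.Series(
    {
        pd.Timestamp("2019-05-01 01:00:00", tz="Europe/London"): 2,
        pd.Timestamp("2019-05-02", tz="UTC"): 3,
    }
)

result = combine_first_utc(uniform_tz, multi_tz)
print(result)
```

Because both indexes share one timezone before combine_first runs, the mixed-timezone conversion inside DatetimeIndex.union is never attempted. The value from the left Series is kept where both Series have an entry for the same instant.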
Comment From: jbrockmendel
This looks right on main: the result is cast to object dtype instead of UTC. Could use a test.