import pytz
from pandas import Timestamp, Series

uniform_tz = Series({Timestamp("2019-05-01", tz=pytz.UTC): 1})

multi_tz = Series(
    {
        Timestamp("2019-05-01 01:00:00+0100", tz=pytz.timezone("Europe/London")): 2,
        Timestamp("2019-05-02", tz=pytz.UTC): 3,
    }
)

multi_tz.combine_first(uniform_tz)  # works fine
uniform_tz.combine_first(multi_tz)  # raises ValueError

Problem description

left.combine_first(right) unexpectedly raises a ValueError if:

  • left has a timezone-aware datetime index, with the same timezone throughout the index.
  • right has a timezone-aware datetime index, with a mix of different timezones.
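
The two conditions above correspond to two different index types, which can be checked directly. The following is a minimal sketch that rebuilds the two Series from the report using string timezone names instead of pytz objects (an assumption made to keep the snippet short); the exact reprs printed may differ between pandas versions.

```python
import pandas as pd

uniform_tz = pd.Series({pd.Timestamp("2019-05-01", tz="UTC"): 1})
multi_tz = pd.Series(
    {
        pd.Timestamp("2019-05-01 01:00:00+0100", tz="Europe/London"): 2,
        pd.Timestamp("2019-05-02", tz="UTC"): 3,
    }
)

# A single timezone throughout gives a tz-aware DatetimeIndex.
print(type(uniform_tz.index).__name__, uniform_tz.index.dtype)

# Mixed timezones cannot share one tz-aware datetime dtype, so the keys
# are kept as individual Timestamp objects in an object-dtype Index.
print(type(multi_tz.index).__name__, multi_tz.index.dtype)
```

The union inside combine_first therefore has to reconcile a DatetimeIndex on the left with an object-dtype Index on the right.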

The traceback is as follows:

Traceback (most recent call last):
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1861, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pandas_test.py", line 57, in <module>
    uniform_tz.combine_first(multi_tz)  # raises ValueError
  File "~/.venv/lib/python3.7/site-packages/pandas/core/series.py", line 2605, in combine_first
    new_index = self.index.union(other.index)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 484, in union
    other = DatetimeIndex(other)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 303, in __new__
    int_as_wall_time=True)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 376, in _from_sequence
    ambiguous=ambiguous, int_as_wall_time=int_as_wall_time)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1757, in sequence_to_dt64ns
    data, dayfirst=dayfirst, yearfirst=yearfirst)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1866, in objects_to_datetime64ns
    raise e
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1857, in objects_to_datetime64ns
    require_iso8601=require_iso8601
  File "pandas/_libs/tslib.pyx", line 460, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 537, in pandas._libs.tslib.array_to_datetime
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

If the arguments are swapped, i.e. right.combine_first(left), no error is raised and the output is as expected: timestamps that denote the same instant in different timezones are treated as equal, and the left argument's timezone propagates into the output.
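
As a small illustration of why the two 2019-05-01 keys in the example are considered the same entry, the sketch below compares them directly. The variable names are only for illustration.

```python
import pandas as pd

# 01:00 at +01:00 in London and 00:00 UTC are the same instant.
london = pd.Timestamp("2019-05-01 01:00:00+0100", tz="Europe/London")
utc = pd.Timestamp("2019-05-01", tz="UTC")

# Equality of tz-aware Timestamps compares the instant, not the wall time.
print(london == utc)

# Converting to a common timezone makes the equivalence explicit.
print(london.tz_convert("UTC"))
```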

The error also doesn't occur at all with the same setup on DataFrames.

It doesn't seem to me that there's any semantic reason that the operation should fail, so this appears to be a bug.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Comment From: gfyoung

cc @jreback @mroeschke

Comment From: mroeschke

While this works if both arguments are DataFrames, the resulting index gets converted to UTC:

In [3]: pd.DataFrame(uniform_tz).combine_first(pd.DataFrame(multi_tz))
Out[3]:
                             0
2019-05-01 00:00:00+00:00  1.0
2019-05-02 00:00:00+00:00  3.0

The main issue is in DatetimeIndex.union, and this patch makes the Series case pass if anyone would like to put up a PR:

diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py
index 6b9806f4d..b91b936ad 100644
--- a/pandas/core/indexes/datetimes.py
+++ b/pandas/core/indexes/datetimes.py
@@ -489,6 +489,9 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index, DatetimeDelegateMixin):
                 other = DatetimeIndex(other)
             except TypeError:
                 pass
+            except ValueError:
+                from pandas import to_datetime
+                other = to_datetime(other, utc=True)

         this, other = self._maybe_utc_convert(other)
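
The fallback in the patch can be tried outside of pandas internals. The sketch below is not the library's code: as_utc_datetime_index is a hypothetical helper, and unlike the patch it also catches TypeError, on the assumption that the exception type raised for mixed timezones may vary between pandas versions.

```python
import pandas as pd


def as_utc_datetime_index(values):
    """Hypothetical helper: build a DatetimeIndex, converting to UTC on failure."""
    try:
        # Succeeds when all values share one timezone (or are all naive).
        return pd.DatetimeIndex(values)
    except (TypeError, ValueError):
        # Mixed timezones cannot share one tz-aware dtype, so normalise
        # every value to UTC, as the patch does with to_datetime.
        return pd.to_datetime(values, utc=True)


mixed = [
    pd.Timestamp("2019-05-01 01:00:00+0100", tz="Europe/London"),
    pd.Timestamp("2019-05-02", tz="UTC"),
]
result = as_utc_datetime_index(mixed)
print(result)
```

Converting to UTC loses the original per-element timezones, which matches the DataFrame output shown earlier in this comment.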

Comment From: jbrockmendel

This looks right on main: the resulting index is cast to object dtype instead of being converted to UTC. Could use a test.
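
A regression test along these lines might look like the sketch below. The test name is hypothetical, and the assertions are deliberately loose: they check only that the call no longer raises and that the left-hand value wins for the shared instant, without pinning the dtype of the resulting index. A test in the pandas suite would more likely compare against a fully specified expected Series.

```python
import pandas as pd


def test_combine_first_uniform_tz_with_mixed_tz():
    # Hypothetical test name; data taken from the original report.
    uniform_tz = pd.Series({pd.Timestamp("2019-05-01", tz="UTC"): 1})
    multi_tz = pd.Series(
        {
            pd.Timestamp("2019-05-01 01:00:00+0100", tz="Europe/London"): 2,
            pd.Timestamp("2019-05-02", tz="UTC"): 3,
        }
    )

    # Raised ValueError on pandas 0.24.2.
    result = uniform_tz.combine_first(multi_tz)

    # The two 2019-05-01 keys are the same instant, so the left value (1)
    # is kept and only 2019-05-02 is filled in from the right.
    assert len(result) == 2
    assert result.tolist() == [1, 3]
```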