Pandas Concat with incomplete, timezone-aware datetime64[ns, tz] arrays fails with TypeError

This doesn't work:

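import numpy as np
import pandas as pd

a = pd.DataFrame(data=np.random.rand(3))
time = pd.date_range('2000-01-01', tz='UTC', periods=3, freq='10ms', name='time')
a['time'] = time  # timezone-aware datetime64[ns, UTC] column
b = pd.DataFrame(data=np.random.rand(3))  # b has no 'time' column
c = pd.concat([a, b])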

It raises a TypeError:

/usr/lib/python3.6/site-packages/pandas/core/indexes/api.py:87: RuntimeWarning: '<' not supported between instances of 'str' and 'int', sort order is undefined for incomparable objects
  result = result.union(other)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-f0e01a86e895> in <module>()
----> 1 c = pd.concat([a, b])

/usr/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    211                        verify_integrity=verify_integrity,
    212                        copy=copy)
--> 213     return op.get_result()
    214 
    215 

/usr/lib/python3.6/site-packages/pandas/core/reshape/concat.py in get_result(self)
    406             new_data = concatenate_block_managers(
    407                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 408                 copy=self.copy)
    409             if not self.copy:
    410                 new_data._consolidate_inplace()

/usr/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   5196         else:
   5197             b = make_block(
-> 5198                 concatenate_join_units(join_units, concat_axis, copy=copy),
   5199                 placement=placement)
   5200         blocks.append(b)

/usr/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
   5325     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   5326                                          upcasted_na=upcasted_na)
-> 5327                  for ju in join_units]
   5328 
   5329     if len(to_concat) == 1:

/usr/lib/python3.6/site-packages/pandas/core/internals.py in <listcomp>(.0)
   5325     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   5326                                          upcasted_na=upcasted_na)
-> 5327                  for ju in join_units]
   5328 
   5329     if len(to_concat) == 1:

/usr/lib/python3.6/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
   5596                     pass
   5597                 else:
-> 5598                     missing_arr = np.empty(self.shape, dtype=empty_dtype)
   5599                     missing_arr.fill(fill_value)
   5600                     return missing_arr

TypeError: data type not understood

Problem description

Concatenating DataFrames fails when one of them has a datetime64[ns, UTC] column that the others lack. This currently breaks my workflow, and I'd like to stick with timezone-aware times.
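One possible workaround that keeps timezone-aware times (a sketch, not something proposed in this thread, and not verified against pandas 0.21.0): give the frame that lacks the column its own all-NaT column with the same tz-aware dtype before concatenating, so concat only ever joins two columns of identical dtype instead of having to build the missing tz-aware values itself.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(data=np.random.rand(3))
a['time'] = pd.date_range('2000-01-01', tz='UTC', periods=3, freq='10ms', name='time')
b = pd.DataFrame(data=np.random.rand(3))

# Give b an all-NaT 'time' column with exactly the same tz-aware dtype as a's,
# so both inputs already agree on the column and its dtype.
b['time'] = pd.DatetimeIndex([pd.NaT] * len(b), dtype=a['time'].dtype)

c = pd.concat([a, b])
print(c.dtypes)
```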

The same concat works with plain, timezone-naive datetime64[ns]:

a['time'] = pd.date_range('2000-01-01', tz=None, periods=3, freq='10ms', name='time')
pd.concat([a, b])

and produces the expected output:

/usr/lib/python3.6/site-packages/pandas/core/indexes/api.py:87: RuntimeWarning: '<' not supported between instances of 'str' and 'int', sort order is undefined for incomparable objects
  result = result.union(other)
Out[35]: 
          0                    time
0  0.071325 2000-01-01 00:00:00.000
1  0.485844 2000-01-01 00:00:00.010
2  0.247131 2000-01-01 00:00:00.020
0  0.595540                     NaT
1  0.609389                     NaT
2  0.850834                     NaT
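Because the naive case works, another workaround sketch is to drop the timezone before concatenating and re-attach it afterwards. This assumes every timestamp in the column shares one timezone. `Series.dt.tz_convert(None)` converts to UTC and removes the timezone, so the concatenated column is localized as UTC and then converted back to the original timezone.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(data=np.random.rand(3))
a['time'] = pd.date_range('2000-01-01', tz='UTC', periods=3, freq='10ms', name='time')
b = pd.DataFrame(data=np.random.rand(3))

tz = a['time'].dt.tz  # remember the original timezone

# Convert to UTC and strip the timezone, giving a naive datetime64[ns] column.
naive = a.assign(time=a['time'].dt.tz_convert(None))

# The naive case concatenates fine and fills b's rows with NaT.
c = pd.concat([naive, b])

# Re-attach the timezone: the naive values are UTC wall times, so localize as UTC,
# then convert back to the original zone. NaT entries stay NaT.
c['time'] = c['time'].dt.tz_localize('UTC').dt.tz_convert(tz)
```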

Expected Output

Similar to the result with plain numpy datetime64, pd.concat() should simply fill in the missing time values with NaT.
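The expected behaviour can be written as a small regression check (a sketch; the function name is hypothetical). It should pass on a pandas version in which this bug is fixed, and on 0.21.0 it would stop at the `pd.concat` call with the TypeError above.

```python
import numpy as np
import pandas as pd


def check_concat_fills_tz_aware_with_nat():
    """Hypothetical regression check for the behaviour requested in this issue."""
    a = pd.DataFrame(data=np.random.rand(3))
    a['time'] = pd.date_range('2000-01-01', tz='UTC', periods=3, freq='10ms', name='time')
    b = pd.DataFrame(data=np.random.rand(3))

    c = pd.concat([a, b])

    # The timezone-aware dtype should survive the concat ...
    assert str(c['time'].dtype) == 'datetime64[ns, UTC]'
    # ... a's timestamps should be kept unchanged ...
    assert c['time'].iloc[:3].tolist() == a['time'].tolist()
    # ... and the rows coming from b should be filled with NaT.
    assert c['time'].iloc[3:].isna().all()
    return c
```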

Thanks and regards

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.12-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.utf8
LOCALE: de_DE.UTF-8

pandas: 0.21.0
pytest: 3.3.0
pip: 9.0.1
setuptools: 38.2.3
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.0.9
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

Comment From: jreback

duplicate of #12396

Comment From: ulijh

Oh, sorry for this being a duplicate. I must have used the wrong keywords when searching...
