This doesn't work:
import numpy as np
import pandas as pd

a = pd.DataFrame(data=np.random.rand(3))
time = pd.date_range('2000-01-01', tz='UTC', periods=3, freq='10ms', name='time')
a['time'] = time
b = pd.DataFrame(data=np.random.rand(3))
c = pd.concat([a, b])
It fails with a TypeError:
/usr/lib/python3.6/site-packages/pandas/core/indexes/api.py:87: RuntimeWarning: '<' not supported between instances of 'str' and 'int', sort order is undefined for incomparable objects
result = result.union(other)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-f0e01a86e895> in <module>()
----> 1 c = pd.concat([a, b])
/usr/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
211 verify_integrity=verify_integrity,
212 copy=copy)
--> 213 return op.get_result()
214
215
/usr/lib/python3.6/site-packages/pandas/core/reshape/concat.py in get_result(self)
406 new_data = concatenate_block_managers(
407 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 408 copy=self.copy)
409 if not self.copy:
410 new_data._consolidate_inplace()
/usr/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
5196 else:
5197 b = make_block(
-> 5198 concatenate_join_units(join_units, concat_axis, copy=copy),
5199 placement=placement)
5200 blocks.append(b)
/usr/lib/python3.6/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
5325 to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
5326 upcasted_na=upcasted_na)
-> 5327 for ju in join_units]
5328
5329 if len(to_concat) == 1:
/usr/lib/python3.6/site-packages/pandas/core/internals.py in <listcomp>(.0)
5325 to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
5326 upcasted_na=upcasted_na)
-> 5327 for ju in join_units]
5328
5329 if len(to_concat) == 1:
/usr/lib/python3.6/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
5596 pass
5597 else:
-> 5598 missing_arr = np.empty(self.shape, dtype=empty_dtype)
5599 missing_arr.fill(fill_value)
5600 return missing_arr
TypeError: data type not understood
> /usr/lib/python3.6/site-packages/pandas/core/internals.py(5598)get_reindexed_values()
5596 pass
5597 else:
-> 5598 missing_arr = np.empty(self.shape, dtype=empty_dtype)
5599 missing_arr.fill(fill_value)
5600 return missing_arr
Problem description
When concatenating DataFrames, one of which contains a column with dtype datetime64[ns, UTC], the process fails. This currently breaks my workflow, and I'd like to stick with timezone-aware times...
It works with plain datetime64[ns]:
a['time'] = pd.date_range('2000-01-01', tz=None, periods=3, freq='10ms', name='time')
pd.concat([a, b])
and produces the expected output:
/usr/lib/python3.6/site-packages/pandas/core/indexes/api.py:87: RuntimeWarning: '<' not supported between instances of 'str' and 'int', sort order is undefined for incomparable objects
result = result.union(other)
Out[35]:
0 time
0 0.071325 2000-01-01 00:00:00.000
1 0.485844 2000-01-01 00:00:00.010
2 0.247131 2000-01-01 00:00:00.020
0 0.595540 NaT
1 0.609389 NaT
2 0.850834 NaT
Expected Output
Similar to the result with plain numpy datetime64, pd.concat() should simply fill in the missing time values with NaT.
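In the meantime, a possible workaround (just a sketch, not part of the original report, and assuming it is acceptable to drop the timezone around the concat) is to convert the tz-aware column to naive UTC, concatenate, and then localize back:
# Workaround sketch: concat on tz-naive times, then re-attach the UTC timezone.
a_naive = a.copy()
a_naive['time'] = a_naive['time'].dt.tz_convert(None)  # tz-aware UTC -> naive UTC
c = pd.concat([a_naive, b])                            # missing rows become NaT
c['time'] = c['time'].dt.tz_localize('UTC')            # back to datetime64[ns, UTC]
NaT entries survive the round trip, so the result matches the expected output above, just with the timezone restored.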
Thanks and regards
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.12-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.utf8
LOCALE: de_DE.UTF-8
pandas: 0.21.0
pytest: 3.3.0
pip: 9.0.1
setuptools: 38.2.3
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.0.9
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0
Comment From: jreback
duplicate of #12396
Comment From: ulijh
Oh, sorry for this being a duplicate. I must have used the wrong keywords when searching...