import pandas

df = pandas.DataFrame({'a': [pandas.Timestamp('20200309T120000.000000-0400'), pandas.Timestamp('20200309T130000.000000-0400')]})
df.astype({'a': pandas.api.types.DatetimeTZDtype(unit='ns', tz='UTC')})
                          a
0 2020-03-09 16:00:00+00:00
1 2020-03-09 17:00:00+00:00
df = pandas.DataFrame({'a': [pandas.Timestamp('20200309T120000.000000-0400'), pandas.Timestamp('20200309T130000.000000-0500')]})
df.astype({'a': pandas.api.types.DatetimeTZDtype(unit='ns', tz='UTC')})

ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

Similarly, something like df['a'].dt.tz_convert(tz='utc') raises the same error.

Problem description

Calling pandas.DataFrame.astype on a column of timezone-aware Timestamps with mixed UTC offsets fails to convert the column to UTC.

Expected Output

I would expect this to return the same result as df.applymap(lambda x: x.tz_convert(tz='utc')):

0 2020-03-09 16:00:00+00:00
1 2020-03-09 18:00:00+00:00
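As a possible workaround, the error message itself points at `utc=True`, which is a parameter of `pandas.to_datetime`. The sketch below assumes the column holds tz-aware Timestamps with differing offsets, as in the second example; the helper name `to_utc` is hypothetical and only there for illustration.

```python
import pandas as pd


def to_utc(series):
    """Convert a Series of tz-aware Timestamps (possibly mixed offsets) to UTC.

    Hypothetical helper: it only wraps pd.to_datetime(..., utc=True),
    which normalizes every element to UTC before building the result.
    """
    return pd.to_datetime(series, utc=True)


# Same data as the failing example: offsets -04:00 and -05:00.
df = pd.DataFrame({
    'a': [
        pd.Timestamp('20200309T120000.000000-0400'),
        pd.Timestamp('20200309T130000.000000-0500'),
    ]
})

# Replace the column with its UTC-normalized version.
converted = df.assign(a=to_utc(df['a']))
print(converted)
```

This sidesteps `astype` rather than fixing it, so it may not be appropriate where the dtype conversion has to go through `astype` (for example, a generic dtype mapping applied to many columns).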

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-28-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Comment From: jbrockmendel

closed by #49454