From https://github.com/pandas-dev/pandas/pull/42494#discussion_r771594058

Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

# Nominal:
from datetime import datetime

import pandas as pd

res = pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)])
print(res)
# yields: DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'],
#         dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None)

# Issue if the timezone-naive elements are strings
res2 = pd.to_datetime(["2020-01-01 01:00 -01:00", "2020-01-01 03:00"])
print(res2)
# yields: Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')

Issue Description

When to_datetime(utc=False) receives a mix of timezone-aware and timezone-naive inputs, it converts them to a timezone-aware DatetimeIndex (see example, part 1).

However, this seems to happen only if the timezone-naive elements are datetime.datetime objects, not if they are strings (see example, part 2).
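To make the distinction between the two parts of the example concrete, here is a minimal sketch that uses only the standard library. It does not reproduce pandas' internal logic. The helper names `parse_one` and `is_aware` and the two format strings are assumptions chosen for illustration.

```python
from datetime import datetime

# Formats assumed for this illustration only; pandas infers formats itself.
AWARE_FMT = "%Y-%m-%d %H:%M %z"
NAIVE_FMT = "%Y-%m-%d %H:%M"


def parse_one(value):
    """Return a datetime for a string or datetime input (hypothetical helper)."""
    if isinstance(value, datetime):
        return value
    try:
        return datetime.strptime(value, AWARE_FMT)
    except ValueError:
        return datetime.strptime(value, NAIVE_FMT)


def is_aware(dt):
    """A datetime is aware when its tzinfo yields a UTC offset."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


part1 = ["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)]
part2 = ["2020-01-01 01:00 -01:00", "2020-01-01 03:00"]

# Both inputs mix one aware and one naive element, whatever the element type.
flags1 = [is_aware(parse_one(v)) for v in part1]
flags2 = [is_aware(parse_one(v)) for v in part2]
print(flags1, flags2)
```

Seen this way, both inputs have the same aware/naive pattern, which is why one would expect to_datetime to treat them the same.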

Expected Behavior

Both outputs should be consistent. Either both return an Index with object dtype containing datetime.datetime objects, or both return a timezone-aware DatetimeIndex.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 53107796887a5efb2a0d65e439d3308f3cc7c75f
python           : 3.8.11.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19043
machine          : AMD64
processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : fr_FR.cp1252

pandas           : 1.4.0.dev0+1425.g5310779688
numpy            : 1.20.3
pytz             : 2021.1
dateutil         : 2.8.2
pip              : 21.0.1
setuptools       : 58.0.4
Cython           : 0.29.24
pytest           : None
hypothesis       : None
sphinx           : 4.2.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.27.0
pandas_datareader: None
bs4              : 4.10.0
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.2
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Comment From: smarie

Note that the docstring examples from #42494 might need to be updated if this is fixed.

Comment From: yiwiz-sai

seems I also got this bug:

pip3 freeze |grep -E "pytz|pandas|numpy"
numpy==1.21.6
pandas==1.4.2
pytz==2022.1

test code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
t1 = '2016-03-11 16:58:00-05:00'
t2 = '2016-03-14 07:08:00-04:00'
t3 = '2022-05-27 16:59:00-04:00'

df1 = pd.Series([t1,t2]).to_frame()
df1.columns = ['a']
df1.index = pd.to_datetime(df1['a'])
print(df1.index)

df2 = pd.Series([t2,t3]).to_frame()
df2.columns = ['a']
df2.index = pd.to_datetime(df2['a'])
print(df2.index)

=============
Index([2016-03-11 16:58:00-05:00, 2016-03-14 07:08:00-04:00], dtype='object', name='a')
DatetimeIndex(['2016-03-14 07:08:00-04:00', '2022-05-27 16:59:00-04:00'], dtype='datetime64[ns, pytz.FixedOffset(-240)]', name='a', freq=None)

Comment From: smarie

@yiwiz-sai : this is a different problem, currently documented as a limitation. https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#to-datetime-tz-examples

I still believe that the whole feature should be refactored to drastically reduce all these limitations; however, this may have a lot of legacy compatibility effects to handle.
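For the mixed-offset case, the documentation linked above suggests passing utc=True to get a single timezone-aware DatetimeIndex. A minimal sketch with the two strings from the report, assuming pandas is installed; the variable name `idx` is only illustrative, and the dtype resolution shown by print may differ between pandas versions.

```python
import pandas as pd

t1 = "2016-03-11 16:58:00-05:00"
t2 = "2016-03-14 07:08:00-04:00"

# utc=True converts every parsed value to UTC, so the differing
# offsets (-05:00 and -04:00) no longer prevent a DatetimeIndex.
idx = pd.to_datetime([t1, t2], utc=True)
print(idx)
```

The original local offsets are lost in the conversion; if they matter, they would have to be kept separately.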

Comment From: yiwiz-sai

Thank you for the information!

I totally agree that it should be refactored; there are lots of open issues about it.

Comment From: MarcoGorelli

Since PDEP-4 (consistent datetime parsing, implemented in pandas 2.0),

res2 = pd.to_datetime(["2020-01-01 01:00 -01:00", "2020-01-01 03:00"])

will error with

ValueError: time data '2020-01-01 03:00' does not match format '%Y-%m-%d %H:%M %z' (match)
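The standard library's datetime.strptime rejects the same string for the same reason, which may help explain the message. A minimal stdlib sketch, assuming the format '%Y-%m-%d %H:%M %z' quoted in the error above; it illustrates the mismatch only and is not pandas' own parser.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M %z"  # format taken from the error message above

# The first element carries an offset and matches the format.
first = datetime.strptime("2020-01-01 01:00 -01:00", fmt)

# The second element has no offset, so the same format cannot match it.
try:
    datetime.strptime("2020-01-01 03:00", fmt)
    matched = True
except ValueError as exc:
    matched = False
    message = str(exc)
    print(message)
```
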

Closing then, but thanks for the report!