From https://github.com/pandas-dev/pandas/pull/42494#discussion_r771594058
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
from datetime import datetime
# Nominal: a tz-aware string mixed with a tz-naive datetime.datetime
res = pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)])
print(res)
# yields: DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'],
# dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None)
# Issue if the timezone-naive elements are strings
res2 = pd.to_datetime(["2020-01-01 01:00 -01:00", "2020-01-01 03:00"])
print(res2)
# yields: Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')
Issue Description
When to_datetime(utc=False) receives a mix of timezone-aware and timezone-naive inputs, it converts them to a timezone-aware DatetimeIndex (see example, part 1).
However, this seems to happen only if the timezone-naive elements are datetime.datetime objects, not if they are strings (see example, part 2).
Expected Behavior
Both outputs should be consistent: either both return an Index with object dtype containing datetime.datetime objects, or both return a timezone-aware DatetimeIndex.
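For reference, the second option can be sketched with the standard library alone. This minimal sketch assumes the convention the part 1 output suggests: a naive value is read as UTC (03:00 UTC becomes 02:00-01:00) and then converted to the offset of the first tz-aware element. `parse_mixed` is a hypothetical helper, not a pandas API, and it only handles the two string layouts used in the example.

```python
from datetime import datetime, timezone

def parse_mixed(values):
    """Hypothetical helper (not pandas): parse strings that may or may not carry
    an offset, read naive ones as UTC, then express everything in the offset of
    the first tz-aware element, mimicking the part 1 output."""
    parsed = []
    for s in values:
        try:
            parsed.append(datetime.strptime(s, "%Y-%m-%d %H:%M %z"))  # with offset
        except ValueError:
            parsed.append(datetime.strptime(s, "%Y-%m-%d %H:%M"))  # naive
    # Target offset: the first tz-aware element, falling back to UTC
    target = next((d.tzinfo for d in parsed if d.tzinfo is not None), timezone.utc)
    out = []
    for d in parsed:
        if d.tzinfo is None:
            d = d.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        out.append(d.astimezone(target))
    return out

res = parse_mixed(["2020-01-01 01:00 -01:00", "2020-01-01 03:00"])
print([d.isoformat() for d in res])
# ['2020-01-01T01:00:00-01:00', '2020-01-01T02:00:00-01:00']
```

The result matches the part 1 DatetimeIndex element for element, which is what "consistent" would mean under the second option.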
Installed Versions
Comment From: smarie
Note that the docstring examples from #42494 might need to be updated if this is fixed.
Comment From: yiwiz-sai
It seems I also hit this bug:
pip3 freeze |grep -E "pytz|pandas|numpy"
numpy==1.21.6
pandas==1.4.2
pytz==2022.1
test code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
t1 = '2016-03-11 16:58:00-05:00'
t2 = '2016-03-14 07:08:00-04:00'
t3 = '2022-05-27 16:59:00-04:00'
df1 = pd.Series([t1,t2]).to_frame()
df1.columns = ['a']
df1.index = pd.to_datetime(df1['a'])
print(df1.index)
df2 = pd.Series([t2,t3]).to_frame()
df2.columns = ['a']
df2.index = pd.to_datetime(df2['a'])
print(df2.index)
=============
Index([2016-03-11 16:58:00-05:00, 2016-03-14 07:08:00-04:00], dtype='object', name='a')
DatetimeIndex(['2016-03-14 07:08:00-04:00', '2022-05-27 16:59:00-04:00'], dtype='datetime64[ns, pytz.FixedOffset(-240)]', name='a', freq=None)
Comment From: smarie
@yiwiz-sai: this is a different problem, currently documented as a limitation: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#to-datetime-tz-examples
I still believe that the whole feature should be refactored to drastically reduce all these limitations; however, this may have many legacy-compatibility effects to handle.
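The limitation referred to here comes from the inputs carrying different UTC offsets: -05:00 for t1 and -04:00 for t2 (consistent with a DST transition between March 11 and March 14, 2016), so no single fixed offset fits both, whereas t2 and t3 share -04:00 and convert cleanly. A minimal standard-library sketch of that check and of the UTC normalization that utc=True performs; `distinct_offsets` and `to_utc` are hypothetical helpers, not pandas APIs.

```python
from datetime import datetime, timezone

t1 = "2016-03-11 16:58:00-05:00"
t2 = "2016-03-14 07:08:00-04:00"
t3 = "2022-05-27 16:59:00-04:00"

def distinct_offsets(values):
    """Hypothetical helper: set of UTC offsets present in the strings."""
    return {datetime.fromisoformat(s).utcoffset() for s in values}

def to_utc(values):
    """Hypothetical helper: convert every tz-aware timestamp to UTC,
    mirroring what to_datetime(..., utc=True) does."""
    return [datetime.fromisoformat(s).astimezone(timezone.utc) for s in values]

print(len(distinct_offsets([t1, t2])))  # 2 -> no single fixed-offset dtype
print(len(distinct_offsets([t2, t3])))  # 1 -> one FixedOffset(-240) dtype fits
print([d.isoformat() for d in to_utc([t1, t2])])
# ['2016-03-11T21:58:00+00:00', '2016-03-14T11:08:00+00:00']
```

In pandas itself, the documented workaround is to pass utc=True (e.g. pd.to_datetime(df1['a'], utc=True)), which returns a single UTC-localized DatetimeIndex.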
Comment From: yiwiz-sai
Thank you for the information!
I totally agree that it should be refactored; there are lots of open issues about it.
Comment From: MarcoGorelli
Since PDEP4,
res2 = pd.to_datetime(["2020-01-01 01:00 -01:00", "2020-01-01 03:00"])
will error with
ValueError: time data '2020-01-01 03:00' does not match format '%Y-%m-%d %H:%M %z' (match)
Closing then, but thanks for the report!
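Under PDEP-4, the format is inferred from the first non-null element and then applied to every element, so a later element that lacks the offset no longer matches. A minimal standard-library sketch of that strict rule; `parse_with_format` is hypothetical, and the format string is the one quoted in the error message above.

```python
from datetime import datetime

def parse_with_format(values, fmt):
    """Hypothetical helper: parse every value with one format,
    raising ValueError on the first element that does not match."""
    return [datetime.strptime(s, fmt) for s in values]

fmt = "%Y-%m-%d %H:%M %z"  # format inferred from the first element
first_only = parse_with_format(["2020-01-01 01:00 -01:00"], fmt)
print(first_only)

error = None
try:
    parse_with_format(["2020-01-01 01:00 -01:00", "2020-01-01 03:00"], fmt)
except ValueError as exc:
    error = exc  # the naive string has no offset, so it fails the %z part
print("ValueError:", error)
```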