Code Sample
>>> pd.to_datetime('2017-01-01T15:00:00-05:00')
... Timestamp('2017-01-01 20:00:00') # This is a timezone-naive Timestamp, but was converted to UTC
Problem description
Given a timezone-aware string (e.g., in ISO8601 format), pd.to_datetime converts the string to UTC time, but doesn't set the tzinfo to UTC.
This makes it difficult to differentiate between strings that are truly timezone-naive (e.g., '2017-01-01') and strings that are timezone-aware (e.g., '2017-01-01T15:00:00-05:00').
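A minimal sketch of one way to tell the two kinds of string apart. It assumes the Timestamp constructor keeps the offset it finds in the string (the behavior reported elsewhere in this thread); the variable names are only illustrative.

```python
from datetime import timedelta

import pandas as pd

# Assumption: pd.Timestamp retains the offset found in the string,
# unlike the default pd.to_datetime call shown in the code sample.
naive = pd.Timestamp('2017-01-01')
aware = pd.Timestamp('2017-01-01T15:00:00-05:00')

# A naive Timestamp has no tz attached; an aware one reports its offset.
print(naive.tz is None)
print(aware.tz is not None)
print(aware.utcoffset() == timedelta(hours=-5))
```

Checking `.tz` after parsing is only a workaround; it does not change what to_datetime itself returns.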
Expected Output
Option 1: Convert the string to UTC (current behavior) and also set the timezone to UTC:
>>> pd.to_datetime('2017-01-01T15:00:00-05:00')
... Timestamp('2017-01-01 20:00:00+0000', tz='UTC')
Option 2: Don't convert to UTC and retain timezone. This is the behavior of dateutil.parser.parse:
>>> pd.to_datetime(parser.parse('2017-01-01T15:00:00-0500'))
... Timestamp('2017-01-01 15:00:00-0500', tz='tzoffset(None, -18000)')
Output of pd.show_versions()
Comment From: jreback
There is an open issue to add a tz= kwarg to to_datetime (xref https://github.com/pandas-dev/pandas/issues/13712), to make this compatible with Timestamp.
You can simply specify utc=True to get the result you want.
# default
In [1]: pd.to_datetime('2017-01-01T15:00:00-05:00')
Out[1]: Timestamp('2017-01-01 20:00:00')
# you can specify if you want utc output
In [3]: pd.to_datetime('2017-01-01T15:00:00-05:00', utc=True)
Out[3]: Timestamp('2017-01-01 20:00:00+0000', tz='UTC')
# Timestamp infers the tz by default
In [2]: Timestamp('2017-01-01T15:00:00-05:00')
Out[2]: Timestamp('2017-01-01 15:00:00-0500', tz='pytz.FixedOffset(-300)')
Normally you parse strings directly to naive (or UTC), as actually constructing the tz is not generally useful (IOW it must be a fixed time offset, rather than a common zone like US/Eastern). Then you can convert / localize into the viewing zone.
So your option 1 is [3]. Option 2, I suppose, could be done, but is generally non-performant, as you can easily end up with mixed zones.
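The parse-to-UTC-then-convert workflow described above can be sketched roughly as follows. The second input string and the variable names are made up for illustration; US/Eastern is the viewing zone mentioned in the comment.

```python
import pandas as pd

# Parse an offset-bearing string straight to UTC (the [3] call above).
ts_utc = pd.to_datetime('2017-01-01T15:00:00-05:00', utc=True)

# Convert into a named viewing zone; the instant itself is unchanged.
ts_eastern = ts_utc.tz_convert('US/Eastern')
print(ts_utc)      # 2017-01-01 20:00:00+00:00
print(ts_eastern)  # 2017-01-01 15:00:00-05:00

# Strings with different offsets (hypothetical inputs) are normalized
# to a single UTC-aware index instead of a mix of fixed-offset zones.
mixed = ['2017-01-01T15:00:00-05:00', '2017-01-01T15:00:00+01:00']
idx = pd.to_datetime(mixed, utc=True)
print(idx.tz)
```

Normalizing to UTC first keeps the result in one homogeneous datetime dtype, which is the performance concern raised about option 2.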
Comment From: jreback
Closing, as this is essentially a duplicate of the request in #13721. Having a tz= kw removes ambiguity about what the user wants (this would also remove the utc= kw).
Comment From: ryanjdillon
I have found that the following gives me naive timestamps:
pandas.to_datetime(df['times'], utc=True)
Whereas the following gives me tz aware timestamps:
pandas.to_datetime(df['times'].values, utc=True)
What might be going on here?
Comment From: worthy7
Doesn't seem to care at all about UTC, it's always tz-aware.
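A small sketch for checking the two calls compared above on a given pandas installation. The DataFrame contents are hypothetical. I would expect recent versions to report UTC for both results, but that is an assumption; a None from the first print would reproduce the naive result described earlier.

```python
import pandas as pd

# Hypothetical data standing in for df['times'] from the comment above.
df = pd.DataFrame({'times': ['2017-01-01T15:00:00-05:00',
                             '2017-01-02T09:30:00+01:00']})

from_series = pd.to_datetime(df['times'], utc=True)         # returns a Series
from_values = pd.to_datetime(df['times'].values, utc=True)  # returns a DatetimeIndex

# Inspect the attached timezone of each result; None would mean naive.
print(from_series.dt.tz)
print(from_values.tz)
```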