Code Sample

>>> pd.to_datetime('2017-01-01T15:00:00-05:00')
... Timestamp('2017-01-01 20:00:00')  # This is a timezone-naive Timestamp, but was converted to UTC

Problem description

Given a timezone-aware string (e.g., in ISO8601 format), pd.to_datetime converts the string to UTC time, but doesn't set the tzinfo to UTC.

This makes it difficult to differentiate between string that are truly timezone-naive strings (e.g., '2017-01-01') and timezone-aware strings (e.g., '2017-01-01T15:00:00-05:00').

Expected Output

Option 1: Convert the string to UTC (current behavior) and also set the timezone to UTC:

>>> pd.to_datetime('2017-01-01T15:00:00-05:00')
... Timestamp('2017-01-01 20:00:00+0000', tz='UTC')

Option 2: Don't convert to UTC and retain timezone. This is the behavior of dateutil.parser.parse:

>>> pd.to_datetime(parser.parse('2017-01-01T15:00:00-0500'))
... Timestamp('2017-01-01 15:00:00-0500', tz='tzoffset(None, -18000)')

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.3.final.0 python-bits: 64 OS: Linux OS-release: 4.1.35-pv-ts1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 pandas: 0.20.3 pytest: 3.1.2 pip: 9.0.1 setuptools: 27.2.0 Cython: 0.25.2 numpy: 1.12.1 scipy: 0.19.1 xarray: None IPython: 6.1.0 sphinx: 1.6.2 patsy: 0.4.1 dateutil: 2.6.0 pytz: 2017.2 blosc: None bottleneck: 1.2.1 tables: 3.3.0 numexpr: 2.6.2 feather: None matplotlib: 2.0.2 openpyxl: 2.4.7 xlrd: 1.0.0 xlwt: None xlsxwriter: 0.9.6 lxml: None bs4: 4.6.0 html5lib: 0.999 sqlalchemy: 1.1.11 pymysql: None psycopg2: None jinja2: 2.9.6 s3fs: None pandas_gbq: None pandas_datareader: None

Comment From: jreback

There is an open issue to add a tz= kwargs to to_datetime, see xref https://github.com/pandas-dev/pandas/issues/13712, to make this compatible with Timestamp.

You can simply specify utc=True to get the result you want.

# default
In [1]: pd.to_datetime('2017-01-01T15:00:00-05:00')
Out[1]: Timestamp('2017-01-01 20:00:00')

# you can specify if you want utc output
In [3]: pd.to_datetime('2017-01-01T15:00:00-05:00', utc=True)
Out[3]: Timestamp('2017-01-01 20:00:00+0000', tz='UTC')

# Timestamp infers the tz by default
In [2]: Timestamp('2017-01-01T15:00:00-05:00')
Out[2]: Timestamp('2017-01-01 15:00:00-0500', tz='pytz.FixedOffset(-300)')

Normally you parse strings directly to naive (or UTC) as actually constructing the tz is not generally useful (IOW it must be a fixed time offset, rather than a common zone like US/Eastern). Then you can convert / localize into the viewing zone.

So your option 1 is [3]. option 2 I suppose could be done, but is generally non-performant as you can easily end up with mixed zones.

Comment From: jreback

closing as this is essentially a duplicate of the request in #13721 having a tz= kw removes ambiguity of what the user wants (this would also remove the utc= kw).

Comment From: ryanjdillon

I have found that the following gives me naive timestamps:

pandas.to_datetime(df['times'], utc=True)

Whereas the following gives me tz aware timestamps:

pandas.to_datetime(df['times'].values, utc=True)

What might be going on here?

Comment From: worthy7

Pandas Passing a timezone-aware string to pd.to_datetime creates a timezone-naive Timestamp Doesn't seem to care at all about UTC, it's always tz-aware.