import pandas as pd
ts1 = pd.Timestamp('2015-07-01 10:00:00')
ts2 = pd.Timestamp('2015-07-01 11:00:00')
df1 = pd.DataFrame([[0.1],[0.2]], index=[ts1, ts2], columns=['a'])
print(df1.index.dtype)
df1 = df1.reset_index(drop=False)
print(df1['index'].dtype)
df2 = pd.DataFrame([[0.1],[0.2]], index=[ts1, ts2], columns=['a'])
df2.index = df2.index.tz_localize('Europe/London')
print(df2.index.dtype)
df2 = df2.reset_index(drop=False)
print(df2['index'].dtype)
gives
datetime64[ns]
datetime64[ns]
datetime64[ns]
object
whereas one would expect the last two lines to be identical.
Comment From: jorisvandenbossche
This is because, at the moment, a Series cannot hold datetimes with timezone information in a proper numpy dtype. The object dtype is used instead, and the values are stored as pd.Timestamp objects (which can hold timezones). This is being worked on (#10477), and the ability to store timezone information in a Series will be in pandas 0.17.
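To check whether an installed pandas already preserves the timezone, a minimal sketch along these lines may help. It assumes a version with timezone-aware Series support (0.17 or later per the comment above) and one recent enough to expose `pd.DatetimeTZDtype` at the top level. The variable names are illustrative only.

```python
import pandas as pd

ts1 = pd.Timestamp('2015-07-01 10:00:00')
ts2 = pd.Timestamp('2015-07-01 11:00:00')
df = pd.DataFrame([[0.1], [0.2]], index=[ts1, ts2], columns=['a'])
df.index = df.index.tz_localize('Europe/London')

# dtype of the tz-aware index before and after it becomes a column
index_dtype = df.index.dtype
column_dtype = df.reset_index(drop=False)['index'].dtype

print(index_dtype)
print(column_dtype)

# On versions with tz-aware Series support, both checks should be True;
# on older versions the column falls back to object dtype.
same_dtype = index_dtype == column_dtype
is_tz_aware = isinstance(column_dtype, pd.DatetimeTZDtype)
print(same_dtype, is_tz_aware)
```

If `is_tz_aware` is False, the column is holding `pd.Timestamp` objects in an object array, as described in this issue.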
Comment From: jorisvandenbossche
You can also see what I described by comparing the .values of both:
In [16]: df1['index'].values
Out[16]:
array(['2015-07-01T12:00:00.000000000+0200',
'2015-07-01T13:00:00.000000000+0200'], dtype='datetime64[ns]')
In [17]: df2['index'].values
Out[17]:
array([Timestamp('2015-07-01 10:00:00+0100', tz='Europe/London'),
Timestamp('2015-07-01 11:00:00+0100', tz='Europe/London')], dtype=object)
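For comparison, here is a rough sketch of the same two representations on a pandas version that does support timezone-aware Series (an assumption, not the version used above). There, `.values` is expected to return a plain numpy datetime64 array normalized to UTC, while an explicit cast to object yields the `Timestamp` objects shown in `Out[17]`. The names `ser`, `raw` and `boxed` are made up for this example.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2015-07-01 10:00:00', '2015-07-01 11:00:00'])
ser = pd.Series(idx.tz_localize('Europe/London'))

# Plain numpy datetime64 array: the timezone is dropped and values are in UTC
raw = ser.values

# Object array of pd.Timestamp instances, each carrying its timezone
boxed = ser.astype(object).to_numpy()

print(raw.dtype, raw[0])
print(boxed.dtype, boxed[0])
```

The numpy array cannot carry the timezone itself, which is why the object fallback was needed before a dedicated timezone-aware dtype existed.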
Comment From: ghost
Thanks. See related issue: https://github.com/pydata/pandas/issues/10763