Hello Pandas team,
This morning I ran into a nasty bug with pandas 0.18.1. I could not find out whether you are already aware of it. Here it is:
import pandas as pd
idx = [pd.to_datetime('2012-02-27 14:23:00'), pd.to_datetime('2012-08-27 14:33:00'), pd.to_datetime('2012-02-27 14:23:00')]
test = pd.DataFrame({'A': ['one', 'two', 'three']}, index=idx)
test['alt'] = test.index.tz_localize('UTC').tz_convert('US/Eastern').tz_localize(None)
Out[9]:
                         A                 alt
2012-02-27 14:23:00    one 2012-02-27 09:23:00
2012-08-27 14:33:00    two 2012-08-27 10:33:00
2012-02-27 14:23:00  three 2012-02-27 10:23:00
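where the 'alt' value in the third row is obviously wrong: it shows 10:23:00, but it should be 09:23:00, the same as the first row, since both rows share the same index timestamp. What have I done wrong here? Are indices not meant to work with duplicate values?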
Thanks!
Comment From: jreback
There are a couple of bug fixes w.r.t. DST in 0.19.0; the RC has been out for a week or so: https://github.com/pydata/pandas/releases/tag/v0.19.0rc1
In [23]: test
Out[23]:
                         A
2012-02-27 14:23:00    one
2012-08-27 14:33:00    two
2012-02-27 14:23:00  three

In [24]: test['alt'] = test.index.tz_localize('UTC').tz_convert('US/Eastern').tz_localize(None)

In [25]: test
Out[25]:
                         A                 alt
2012-02-27 14:23:00    one 2012-02-27 09:23:00
2012-08-27 14:33:00    two 2012-08-27 10:33:00
2012-02-27 14:23:00  three 2012-02-27 09:23:00
Comment From: randomgambit
Got it. My workaround is to use apply with tz_localize / tz_convert on a column that contains the timestamps (instead of working with the index). That seems to work. Does that make sense?
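
For reference, here is a minimal sketch of that workaround. The column name 'ts' is just a placeholder I picked for illustration, and I am assuming that converting each Timestamp on its own sidesteps the index-level conversion; I have not checked this against every pandas version.

```python
import pandas as pd

# Same frame as in the report, with a duplicated index value.
idx = pd.to_datetime(['2012-02-27 14:23:00',
                      '2012-08-27 14:33:00',
                      '2012-02-27 14:23:00'])
test = pd.DataFrame({'A': ['one', 'two', 'three']}, index=idx)

# Copy the index into a regular column ('ts' is a hypothetical name).
test['ts'] = test.index

# Convert each Timestamp individually: treat it as UTC, shift it to
# US/Eastern, then drop the timezone to get a naive local time.
test['alt'] = test['ts'].apply(
    lambda ts: ts.tz_localize('UTC').tz_convert('US/Eastern').tz_localize(None)
)

print(test[['A', 'alt']])
```

The two rows with the duplicated index value should now get the same 'alt' value. The per-element apply is slower than the vectorized index methods, so it is probably only worth using on versions affected by the bug.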