It seems that the Series.dt timezone properties do not interact well with series.apply(lambda x: x.strftime("..")), leading to some odd behaviour. Here is a script demonstrating the problem:
import pandas as pd
import logging
import pytz
from pytz import timezone
date_format_string = "%a %b %d %Y %H:%M"

def convert_datetime(x):
    return x.strftime(date_format_string)

def longer_convert(x, tz=timezone('Europe/London')):
    # Localize the naive timestamp as UTC, then convert to the target zone.
    return convert_datetime(pytz.utc.localize(x).astimezone(tz))

if __name__ == "__main__":
    rng = pd.date_range('6/6/2011', periods=6, freq='H')
    rng2 = pd.date_range('6/6/2011', periods=6, freq='D')
    series = pd.Series(data=rng, index=rng2)
    series.name = "Original"
    converted_series = series.dt.tz_localize('UTC').dt.tz_convert('Europe/London')
    converted_series.name = "Converted"
    formatted_series = converted_series.apply(convert_datetime)
    formatted_series.name = "Formatted"
    longer_converted = series.apply(longer_convert)
    longer_converted.name = "Using pytz"
    df = pd.concat([series, converted_series, formatted_series, longer_converted], axis=1)
    print(df)
Which yields the output:
                      Original                 Converted
2011-06-06 2011-06-06 00:00:00 2011-06-06 01:00:00+01:00
2011-06-07 2011-06-06 01:00:00 2011-06-06 02:00:00+01:00
2011-06-08 2011-06-06 02:00:00 2011-06-06 03:00:00+01:00
2011-06-09 2011-06-06 03:00:00 2011-06-06 04:00:00+01:00
2011-06-10 2011-06-06 04:00:00 2011-06-06 05:00:00+01:00
2011-06-11 2011-06-06 05:00:00 2011-06-06 06:00:00+01:00
                       Formatted            Using pytz
2011-06-06 Mon Jun 06 2011 00:00 Mon Jun 06 2011 01:00
2011-06-07 Mon Jun 06 2011 01:00 Mon Jun 06 2011 02:00
2011-06-08 Mon Jun 06 2011 02:00 Mon Jun 06 2011 03:00
2011-06-09 Mon Jun 06 2011 03:00 Mon Jun 06 2011 04:00
2011-06-10 Mon Jun 06 2011 04:00 Mon Jun 06 2011 05:00
2011-06-11 Mon Jun 06 2011 05:00 Mon Jun 06 2011 06:00
Here, applying the format string through apply caused the times to revert to UTC (compare the hours in Formatted vs Converted). I expected output like the "Using pytz" column, which shows the time in local time without the timezone offset.
Is this expected behaviour? Can I format to a string directly through Series.dt?
Comment From: jreback
pls pd.show_versions()
Comment From: jreback
this is a dupe of #11757 not yet fixed. pull-requests are welcome.
Comment From: phil20686
pd.show_versions()
Comment From: phil20686
pd.show_versions():
INSTALLED VERSIONS
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.4
pip: 8.1.1
setuptools: 20.2.2
Cython: None
numpy: 1.9.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 2.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: 2.3.1
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.7.3
boto: None
converted_series.dt.strftime("...") works correctly so I can use that. Didn't really think of it till just now.
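For anyone landing here, a minimal sketch of that workaround. It assumes a pandas version that provides Series.dt.strftime and has timezone data for Europe/London available. The variable names and the shortened three-row input are illustrative, not taken from the script above.

```python
import pandas as pd

date_format_string = "%a %b %d %Y %H:%M"

# Naive timestamps that we treat as UTC (the first three values from the report).
series = pd.Series(pd.to_datetime([
    "2011-06-06 00:00",
    "2011-06-06 01:00",
    "2011-06-06 02:00",
]))

# Attach UTC, then convert to local time, as in the original script.
converted = series.dt.tz_localize("UTC").dt.tz_convert("Europe/London")

# Format through the .dt accessor instead of apply(); this uses the
# local wall-clock time of each tz-aware value.
formatted = converted.dt.strftime(date_format_string)
print(formatted.tolist())
```

With the default (English) locale, the formatted hours should match the Converted column (01:00, 02:00, 03:00) and not the UTC hours.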