Consider this timeseries:
0 2016-11-06 01:19:54.000
1 2016-11-06 01:34:52.000
2 2016-11-06 01:35:36.000
3 2016-11-06 01:28:25.000
4 2016-11-06 01:59:32.000
5 2016-11-06 01:26:53.000
6 2016-11-06 01:52:44.000
7 2016-11-06 01:07:09.000
8 2016-11-06 01:35:03.000
9 2016-11-06 01:11:19.000
10 2016-11-06 01:31:28.000
11 2016-11-06 01:10:16.000
12 2016-11-06 01:04:55.000
13 2016-11-06 01:07:16.000
14 2016-11-06 01:09:35.000
15 2016-11-06 01:11:50.000
16 2016-11-06 01:15:03.000
17 2016-11-06 01:03:23.000
18 2016-11-06 01:20:27.000
19 2016-11-06 01:17:20.000
20 2016-11-06 01:14:44.000
21 2016-11-06 01:14:32.000
22 2016-11-06 01:20:23.000
23 2016-11-06 01:31:55.000
24 2016-11-06 01:32:45.000
25 2016-11-06 01:32:52.716
26 2016-11-06 01:00:55.000
27 2016-11-06 01:03:52.262
28 2016-11-06 01:19:59.000
29 2016-11-06 01:32:36.000
30 2016-11-06 01:48:56.000
31 2016-11-06 01:49:06.000
32 2016-11-06 01:29:11.170
33 2016-11-06 01:29:51.745
34 2016-11-06 01:30:09.560
35 2016-11-06 01:38:12.432
36 2016-11-06 01:45:23.631
37 2016-11-06 01:51:52.046
38 2016-11-06 01:52:36.318
39 2016-11-06 01:10:58.000
Name: ts, dtype: datetime64[ns]
When I try to do:
> temp.dt.tz_localize('US/Eastern', ambiguous='infer')
I get:
AmbiguousTimeError: There are 13 dst switches when there should only be 1.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.1
nose: 1.3.7
pip: 9.0.0
setuptools: 27.1.0.post20161104
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.2
pytz: 2016.7
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.2.0-b1
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.3
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.43.0
pandas_datareader: None
Comment From: jorisvandenbossche
Please provide a reproducible example (runnable code).
But, I think the error is clear: "AmbiguousTimeError: There are 13 dst switches when there should only be 1."
So it says that there are 13 DST switches, and this is because your time series is not sorted. tz_localize can only infer the DST switch if it is a continuous time series, otherwise it is not possible to determine which time belongs to which offset.
You can use the ambiguous keyword by passing an array or Series of boolean values, indicating for each entry in the series whether it is DST or not.
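A minimal sketch of both options, in case it helps. The timestamps in `ordered` and the `is_dst` flags are made up for illustration (they are not the reporter's data), and in practice the flags would have to come from some other source of knowledge about when each row was recorded:

```python
import numpy as np
import pandas as pd

# Ordered series that crosses the fall-back transition: the wall clock
# steps backwards exactly once, so ambiguous='infer' can locate the switch.
ordered = pd.Series(pd.to_datetime([
    "2016-11-06 00:30:00",
    "2016-11-06 01:15:00",  # first pass through 01:00-02:00 (EDT, UTC-04:00)
    "2016-11-06 01:45:00",
    "2016-11-06 01:15:00",  # second pass through 01:00-02:00 (EST, UTC-05:00)
    "2016-11-06 01:45:00",
    "2016-11-06 02:30:00",
]))
inferred = ordered.dt.tz_localize("US/Eastern", ambiguous="infer")
print(inferred)

# Unordered readings inside the repeated hour: there is no single backwards
# step to detect, so pass one boolean per row (True means DST was in effect).
unordered = pd.Series(pd.to_datetime([
    "2016-11-06 01:19:54",
    "2016-11-06 01:34:52",
    "2016-11-06 01:07:09",
]))
is_dst = np.array([True, True, False])  # assumption: known from elsewhere
flagged = unordered.dt.tz_localize("US/Eastern", ambiguous=is_dst)
print(flagged)
```

The boolean array must have one entry per row of the series being localized.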
Comment From: amelio-vazquez-reina
Thank you @jorisvandenbossche. I may be misunderstanding what tz_localize does and what it doesn't. I have timestamps without timezone information, but I happen to know they are in ET. I am trying to attach this info to it with tz_localize. Is this not what one is supposed to use it for?
I also tried pre-sorting the timeseries above and still run into the same problem.
Comment From: amelio-vazquez-reina
Hmm, I think the problem is here:
> ts.apply(lambda x: pd.Timestamp(x, tz='US/Eastern'))
AmbiguousTimeError: Cannot infer dst time from Timestamp('2016-11-06 01:19:54'), try using the 'ambiguous' argument
which is basically around the time when we switched from daylight saving time back to standard time in ET.
Comment From: jreback
@amelio-vazquez-reina using apply is quite inefficient and you still need to disambiguate the dst
Comment From: amelio-vazquez-reina
Makes perfect sense. Thanks @jreback. I can close this now since I think this isn't really a bug/feature request. Thanks all though!
Comment From: jorisvandenbossche
@amelio-vazquez-reina tz_localize sounds like the function you need (it starts from timezone-naive data, and you then attach a timezone to it). The problem is that the times you post above are ambiguous: they can be two different points in time. E.g. "2016-11-06 01:10:58" could be "2016-11-06 01:10:58-0500" or "2016-11-06 01:10:58-0400".
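To make the ambiguity concrete, here is a small sketch that localizes that one wall-clock reading both ways. The variable names are only illustrative:

```python
import pandas as pd

naive = pd.Timestamp("2016-11-06 01:10:58")

# The same wall-clock reading occurs twice on this date in US/Eastern.
as_dst = naive.tz_localize("US/Eastern", ambiguous=True)   # EDT, UTC-04:00
as_std = naive.tz_localize("US/Eastern", ambiguous=False)  # EST, UTC-05:00

print(as_dst.isoformat())
print(as_std.isoformat())

# The two candidates are distinct instants, one hour apart.
print(as_std - as_dst)
```

Without the ambiguous flag there is no way to tell from the naive value alone which of the two instants was meant.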
Comment From: jreback
we might want to enhance the docstring and docs with an example or note using the boolean flag to ambiguous
Comment From: jorisvandenbossche
There are some examples in the written docs: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#ambiguous-times-when-localizing
Comment From: jreback
that seems sufficient