Calling .where on a tz-aware DataFrame column may raise an exception because DatetimeTZBlock fails to tz_localize the result of the operation
Code Sample, a copy-pastable example if possible
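from datetime import datetime

import pandas
import pytz

TZ = pytz.timezone("Europe/Brussels")
# TZ = None  # <== with TZ = None, there is no exception
# Note: tzinfo=TZ with a pytz zone attaches the zone's LMT offset, not CET
d = datetime(2012, 3, 24, 16, tzinfo=TZ)
# d = TZ.localize(datetime(2012, 3, 24, 16))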
idx = pandas.date_range("2012/03/24", "2012/03/26", tz=TZ, freq='H')
df = pandas.Series(None, index=idx, name="d").to_frame()
df["d"] = idx
print("following line is OK")
df["d"].where(df["d"] > d, d)
df["d"] = idx[::-1] # tz_localize will fail
print("following line raises exception")
df["d"].where(df["d"] > d, d)
Actual Output
following line is OK
following line raises exception
==> raises exception "pytz.exceptions.NonExistentTimeError: 2012-03-25 02:00:00" / "TypeError: Could not operate [1332607320000000000] with block values [2012-03-25 02:00:00]"
Expected Output
following line is OK
following line raises exception
[no exception raised]
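The NonExistentTimeError above names 2012-03-25 02:00:00, a wall-clock time that falls in the spring-forward gap for Europe/Brussels. The sketch below uses pytz alone to show that localizing that naive value fails while the same instant expressed in UTC converts cleanly. It illustrates why re-localizing naive wall-clock values can fail, which is consistent with the diagnosis in the title; it is not a trace of the pandas internals. The variable names are illustrative only.

```python
from datetime import datetime

import pytz

tz = pytz.timezone("Europe/Brussels")

# Clocks in Brussels jumped from 02:00 straight to 03:00 on 2012-03-25,
# so the wall-clock time 02:00 never occurred that day.
missing = datetime(2012, 3, 25, 2)

try:
    # is_dst=None asks pytz to raise instead of guessing an offset.
    tz.localize(missing, is_dst=None)
    raised = False
except pytz.exceptions.NonExistentTimeError:
    raised = True

print("02:00 is nonexistent:", raised)

# The same instant expressed in UTC converts without error.
utc_instant = datetime(2012, 3, 25, 1, tzinfo=pytz.utc)
local = utc_instant.astimezone(tz)
print(local.isoformat())
```

Converting from UTC avoids the gap because every UTC instant maps to exactly one local time, whereas a naive local time inside the gap maps to none.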
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.0
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: None
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext)
jinja2: 2.8
boto: None
pandas_datareader: None
Comment From: jreback
Out[3]:
2012-03-24 00:00:00+01:00 2012-03-26 00:00:00+02:00
2012-03-24 01:00:00+01:00 2012-03-25 23:00:00+02:00
2012-03-24 02:00:00+01:00 2012-03-25 22:00:00+02:00
2012-03-24 03:00:00+01:00 2012-03-25 21:00:00+02:00
2012-03-24 04:00:00+01:00 2012-03-25 20:00:00+02:00
2012-03-24 05:00:00+01:00 2012-03-25 19:00:00+02:00
...
2012-03-25 19:00:00+02:00 2012-03-24 16:42:00+01:00
2012-03-25 20:00:00+02:00 2012-03-24 16:42:00+01:00
2012-03-25 21:00:00+02:00 2012-03-24 16:42:00+01:00
2012-03-25 22:00:00+02:00 2012-03-24 16:42:00+01:00
2012-03-25 23:00:00+02:00 2012-03-24 16:42:00+01:00
2012-03-26 00:00:00+02:00 2012-03-24 16:42:00+01:00
Freq: H, Name: d, dtype: datetime64[ns, Europe/Brussels]
I think this was fixed by #13583 (though several PRs touched on this).
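A side note on the fill value: it shows up as 16:42:00+01:00 in the output above rather than 16:00:00+01:00. That is a known pytz behavior, separate from the .where bug. Passing a pytz zone through tzinfo= attaches the zone's earliest recorded offset (local mean time) instead of the offset in force on that date. The sketch below compares the two ways of building the value. The exact LMT offset may differ between pytz versions, so no specific output is claimed, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

import pytz

tz = pytz.timezone("Europe/Brussels")

# Passing a pytz zone through tzinfo= skips pytz's per-date offset lookup
# and uses the zone's first recorded offset (local mean time) instead.
via_tzinfo = datetime(2012, 3, 24, 16, tzinfo=tz)

# localize() picks the offset valid on that date (CET, UTC+01:00).
via_localize = tz.localize(datetime(2012, 3, 24, 16))

print("tzinfo=  offset:", via_tzinfo.utcoffset())
print("localize offset:", via_localize.utcoffset())

# The two values denote different instants even though both read 16:00.
print("difference:", via_tzinfo - via_localize)
```

Building d with localize() would give a fill value of 16:00:00+01:00; the original report keeps tzinfo=TZ, which is why the outputs show 16:42.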