Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
import pytz
import datetime
# Timezones
BRUSSELS = pytz.timezone('Europe/Brussels')
UTC = pytz.UTC
# Create a list of timezone-aware datetime objects in UTC
# Interval is 0b0.0000011 days, to prevent float rounding issues
dtstart = datetime.datetime(2014, 3, 30, 0, 0, tzinfo=UTC)
interval = datetime.timedelta(minutes=33, seconds=45)
interval_days = 0.0234375 # 2025 s / 86400 s per day
N = 8
dt_utc = pd.date_range(start=dtstart, freq=interval, periods=N)
dt_bxl = pd.DatetimeIndex.tz_convert(dt_utc, BRUSSELS).astype(datetime.datetime)
Problem description
This is some code from a test in matplotlib that used to work with pandas 0.19.0, cf. matplotlib/matplotlib#8579. I couldn't find anything in the release notes indicating that this change was intended.
I think it may be related to this change which prohibits any NumPy-dtype objects that are not also Pandas-dtype objects, and since np.dtype(datetime.datetime).kind == 'O' and datetime.datetime not in [object, np.object_, 'object', 'O'], this is not allowed.
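The two conditions named above can be checked directly with NumPy alone. This is a minimal sketch of the reasoning, not the actual pandas source; the variable names `npdtype` and `is_explicit_object` are illustrative only.

```python
import datetime

import numpy as np

# NumPy maps an arbitrary Python class to the generic object dtype.
npdtype = np.dtype(datetime.datetime)
print(npdtype.kind)

# datetime.datetime is not one of the explicit spellings of "object".
is_explicit_object = datetime.datetime in [object, np.object_, 'object', 'O']
print(is_explicit_object)
```

A dtype whose kind is 'O' but which was not spelled as one of the explicit object aliases is the case that the check described above rejects.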
Expected Output
N/A (no error)
Actual Output
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../conda/envs/mpl35/lib/python3.5/site-packages/pandas/core/indexes/datetimes.py", line 848, in astype
dtype = pandas_dtype(dtype)
File ".../conda/envs/mpl35/lib/python3.5/site-packages/pandas/core/dtypes/common.py", line 1909, in pandas_dtype
raise TypeError('dtype {0} not understood'.format(dtype))
TypeError: dtype <class 'datetime.datetime'> not understood
Output of pd.show_versions()
Comment From: jorisvandenbossche
Simpler reproducible example:
0.19.2:
In [36]: import datetime
In [37]: pd.DatetimeIndex(['2012-01-01']).astype(datetime.datetime)
Out[37]: Index([2012-01-01 00:00:00], dtype='object')
0.20.1:
In [34]: pd.DatetimeIndex(['2012-01-01']).astype(datetime.datetime)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-dd83d3b47780> in <module>()
----> 1 pd.DatetimeIndex(['2012-01-01']).astype(datetime.datetime)
/home/joris/scipy/pandas/pandas/core/indexes/datetimes.py in astype(self, dtype, copy)
846 @Appender(_index_shared_docs['astype'])
847 def astype(self, dtype, copy=True):
--> 848 dtype = pandas_dtype(dtype)
849 if is_object_dtype(dtype):
850 return self.asobject
/home/joris/scipy/pandas/pandas/core/dtypes/common.py in pandas_dtype(dtype)
1907 return npdtype
1908 elif npdtype.kind == 'O':
-> 1909 raise TypeError('dtype {0} not understood'.format(dtype))
1910
1911 return npdtype
TypeError: dtype <class 'datetime.datetime'> not understood
Matplotlib can use pd.DatetimeIndex(['2012-01-01']).to_pydatetime() to fix their test in the meantime (or just leave out the astype; that should work as well, I think).
This is clearly a regression between 0.19 and 0.20, although I am not sure it is something we want to fix.
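The to_pydatetime() workaround mentioned above could look roughly like this. It is a sketch on a one-element index rather than the matplotlib test itself; the names `idx` and `py_dts` are illustrative.

```python
import datetime

import pandas as pd

idx = pd.DatetimeIndex(['2012-01-01'])

# to_pydatetime() returns a NumPy object array whose elements are
# plain datetime.datetime instances rather than pandas Timestamps.
py_dts = idx.to_pydatetime()
print(py_dts.dtype, type(py_dts[0]))
```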
Comment From: jreback
Not a regression at all. datetime.datetime is not a valid dtype; this is the correct behavior.
Comment From: jorisvandenbossche
Ah, I somehow assumed it did convert it to datetime.datetime objects, but it just converted to object dtype with timestamps.
@QuLogic As you can see in the commit you linked to, the whatsnew entry, listed as a bug fix, was "Bug in pandas_dtype where passing invalid dtype didn't raise an error." (not the clearest wording, as it should not mention the internal pandas_dtype). The bug was that astype(datetime.datetime) was not doing what you would expect, namely converting to datetime.datetime objects. Previously, when passing something to astype, everything that was not a proper dtype was treated as 'object' dtype (as if you had called astype(object)); now it raises an informative error.
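To illustrate the difference, here is a small sketch comparing astype(object) with astype(datetime.datetime). It assumes pandas 0.20 or later, where the second call is expected to raise TypeError as in the traceback above; the names `as_object` and `raised` are illustrative.

```python
import datetime

import pandas as pd

idx = pd.DatetimeIndex(['2012-01-01'])

# astype(object) gives an object-dtype Index holding pandas Timestamps,
# not plain datetime.datetime instances.
as_object = idx.astype(object)
print(type(as_object[0]))

# Passing the datetime.datetime class as a dtype is rejected
# (assumption: pandas >= 0.20, per the traceback in this issue).
try:
    idx.astype(datetime.datetime)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

Note that pd.Timestamp is a subclass of datetime.datetime, so an isinstance check would not distinguish the two; comparing the exact type does.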
Comment From: QuLogic
Ah, I did not write the test and didn't realize it was not doing what was expected there.