Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
import pytz
import datetime
# Timezones
BRUSSELS = pytz.timezone('Europe/Brussels')
UTC = pytz.UTC
# Create a list of timezone-aware datetime objects in UTC
# Interval is 0b0.0000011 days, to prevent float rounding issues
dtstart = datetime.datetime(2014, 3, 30, 0, 0, tzinfo=UTC)
interval = datetime.timedelta(minutes=33, seconds=45)
interval_days = 0.0234375 # 2025 / 86400 seconds
N = 8
dt_utc = pd.date_range(start=dtstart, freq=interval, periods=N)
dt_bxl = pd.DatetimeIndex.tz_convert(dt_utc, BRUSSELS).astype(datetime.datetime)
Problem description
This is some code from a test in matplotlib that used to work with Pandas 0.19.0; cf. matplotlib/matplotlib#8579. I couldn't find anything in the release notes that indicated this change should have occurred.
I think it may be related to this change, which prohibits any NumPy-dtype objects that are not also Pandas-dtype objects. Since np.dtype(datetime.datetime).kind == 'O' and datetime.datetime not in [object, np.object_, 'object', 'O'], this is not allowed.
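The two conditions named above can be checked directly with NumPy alone. This is a minimal sketch of the reasoning, not the pandas source; the variable names `npdtype` and `accepted_spellings` are chosen here for illustration.

```python
import datetime

import numpy as np

# NumPy maps an arbitrary Python class to the generic object dtype.
npdtype = np.dtype(datetime.datetime)
print(npdtype.kind)

# The spellings of "object" quoted in the report (illustrative list name).
accepted_spellings = [object, np.object_, "object", "O"]

# datetime.datetime is none of them, so it is an object-kind dtype
# that was not requested under any of the accepted spellings.
print(datetime.datetime in accepted_spellings)
```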
Expected Output
N/A (no error)
Actual Output
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../conda/envs/mpl35/lib/python3.5/site-packages/pandas/core/indexes/datetimes.py", line 848, in astype
dtype = pandas_dtype(dtype)
File ".../conda/envs/mpl35/lib/python3.5/site-packages/pandas/core/dtypes/common.py", line 1909, in pandas_dtype
raise TypeError('dtype {0} not understood'.format(dtype))
TypeError: dtype <class 'datetime.datetime'> not understood
Output of pd.show_versions()
Comment From: jorisvandenbossche
Simpler reproducible example:
0.19.2:
In [36]: import datetime
In [37]: pd.DatetimeIndex(['2012-01-01']).astype(datetime.datetime)
Out[37]: Index([2012-01-01 00:00:00], dtype='object')
0.20.1:
In [34]: pd.DatetimeIndex(['2012-01-01']).astype(datetime.datetime)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-dd83d3b47780> in <module>()
----> 1 pd.DatetimeIndex(['2012-01-01']).astype(datetime.datetime)
/home/joris/scipy/pandas/pandas/core/indexes/datetimes.py in astype(self, dtype, copy)
846 @Appender(_index_shared_docs['astype'])
847 def astype(self, dtype, copy=True):
--> 848 dtype = pandas_dtype(dtype)
849 if is_object_dtype(dtype):
850 return self.asobject
/home/joris/scipy/pandas/pandas/core/dtypes/common.py in pandas_dtype(dtype)
1907 return npdtype
1908 elif npdtype.kind == 'O':
-> 1909 raise TypeError('dtype {0} not understood'.format(dtype))
1910
1911 return npdtype
TypeError: dtype <class 'datetime.datetime'> not understood
Matplotlib can use pd.DatetimeIndex(['2012-01-01']).to_pydatetime() to fix their test in the meantime (or just leave out the astype, which should work as well, I think).
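A short sketch of the two alternatives, assuming a reasonably recent pandas; the variable names are illustrative. `to_pydatetime()` yields plain `datetime.datetime` objects, while `astype(object)` keeps pandas `Timestamp` elements in an object-dtype Index.

```python
import datetime

import pandas as pd

idx = pd.DatetimeIndex(["2012-01-01", "2012-01-02"])

# NumPy object array whose elements are plain datetime.datetime instances.
as_py = idx.to_pydatetime()

# Object-dtype Index whose elements are pandas Timestamps
# (Timestamp is a subclass of datetime.datetime).
as_obj = idx.astype(object)

print(type(as_py[0]))
print(type(as_obj[0]), as_obj.dtype)
```

Code that only needs datetime-like behaviour can use either form; code that checks for the exact `datetime.datetime` type needs `to_pydatetime()`.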
This is clearly a regression between 0.19 and 0.20, although I am not sure it is something we want to fix.
Comment From: jreback
Not a regression at all. datetime.datetime is not a valid dtype; this is the correct behavior.
Comment From: jorisvandenbossche
Ah, I somehow assumed it did convert it to datetime.datetime objects, but it just converted to object dtype with timestamps.
@QuLogic As you can see in the commit you linked to, the whatsnew message, listed as a bug fix, was "Bug in pandas_dtype where passing invalid dtype didn't raise an error." (not the clearest, as it should not mention the internal pandas_dtype). The bug was that astype(datetime.datetime) was not doing what you would expect it to do, namely converting to datetime.datetime objects. Previously, when passing something to astype, everything that was not a proper dtype was regarded as 'object' dtype (as if you had called astype(object)); now it raises an informative error.
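The behaviour on 0.20 and later can be probed as follows. This is a sketch assuming `pandas.api.types.pandas_dtype` is available in the installed version; `is_valid_dtype` is a hypothetical helper written for this example, not a pandas function.

```python
import datetime

import pandas as pd
from pandas.api.types import pandas_dtype


def is_valid_dtype(candidate):
    """Hypothetical helper: True if pandas accepts `candidate` as a dtype."""
    try:
        pandas_dtype(candidate)
    except TypeError:
        return False
    return True


print(is_valid_dtype(object))             # a proper dtype
print(is_valid_dtype("datetime64[ns]"))   # a proper dtype
print(is_valid_dtype(datetime.datetime))  # a Python class, not a dtype

# astype performs the same validation before converting.
idx = pd.DatetimeIndex(["2012-01-01"])
try:
    idx.astype(datetime.datetime)
except TypeError as exc:
    print("astype rejected the class:", exc)
```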
Comment From: QuLogic
Ah, I did not write the test and didn't realize it was not doing what was expected there.