- [x] I have checked that this issue has not already been reported. Root cause of issue #32802 might be the trouble I report here.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np
# Trouble 1 with Series (same with DataFrame)
# Working with dtype int64
df = pd.DataFrame(columns=['First'],dtype=np.int64)
df['First']
Series([], Name: First, dtype: int64)
# Not working with dtype datetime64
df = pd.DataFrame(columns=['First'],dtype=np.datetime64)
df['First']
Series([], Name: First, dtype: object)
# Trouble 2 with Series (works with DataFrame)
# Try to use 'astype' as a workaround
df = df.astype({'First':np.datetime64})
df['First']
ValueError: The 'datetime64' dtype has no unit.
Please pass in 'datetime64[ns]' instead.
But then:
df = df.astype({'First':np.datetime64[ns]})
NameError: name 'ns' is not defined
And if you are lucky enough to have a DataFrame with several columns, the workaround works:
df = pd.DataFrame(columns=['First','Last'])
df = df.astype({'First':np.datetime64,'Last':np.datetime64})
df.loc[:,'First']
Series([], Name: First, dtype: datetime64[ns])
Yeah baby! :)
Expected Output
Series([], Name: First, dtype: datetime64[ns])
Output of pd.show_versions()
Comment From: arw2019
Actually the workaround does apply for Series too if you put string quotes around datetime64[ns]. If I do:
df = pd.DataFrame(columns=['First'],dtype='datetime64[ns]')
df['First']
I get
Series([], Name: First, dtype: datetime64[ns])
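The same quoting also sidesteps the NameError from the astype attempt in the report, since 'datetime64[ns]' is a dtype string rather than Python indexing syntax. A minimal sketch, assuming a single-column empty frame like the one in the report (the variable names are only illustrative):

```python
import pandas as pd

# Start from an empty frame with an untyped (object) column, as in the report.
empty = pd.DataFrame(columns=["First"])

# Pass the dtype as a string that includes the unit, not as np.datetime64.
converted = empty.astype({"First": "datetime64[ns]"})

first_dtype = str(converted["First"].dtype)
print(first_dtype)  # datetime64[ns]
```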
Comment From: jreback
np.datetime64 with no unit should always raise
i guess we missed a case - would take a PR to raise correctly in your first example
Comment From: erfannariman
Would something like this be sufficient in the DataFrame constructor @jreback ? If so then I can create a PR:
dtype = np.datetime64
if np.datetime_data(dtype)[0] == 'generic':
    raise TypeError('Numpy datetime without unit is not a valid dtype')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-129-be65bfca10de> in <module>
1 dtype = np.datetime64
2 if np.datetime_data(dtype)[0]:
----> 3 raise TypeError('numpy datetime without unit is not a valid dtype')
TypeError: numpy datetime without unit is not a valid dtype
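For reference, the comparison against 'generic' is what makes the check specific to unitless dtypes, because np.datetime_data returns a non-empty unit string for every datetime64 dtype. A small sketch of that check on its own, using only NumPy; the helper name is_unitless_datetime is hypothetical and not part of pandas:

```python
import numpy as np


def is_unitless_datetime(dtype) -> bool:
    """Return True only for a datetime64 dtype that carries no unit."""
    dt = np.dtype(dtype)
    if dt.kind != "M":
        # Not a datetime64 dtype at all, so there is no unit to check.
        return False
    unit, _count = np.datetime_data(dt)
    # NumPy reports the unit of a bare datetime64 as 'generic'.
    return unit == "generic"


print(is_unitless_datetime(np.datetime64))     # True
print(is_unitless_datetime("datetime64[ns]"))  # False
print(is_unitless_datetime(np.int64))          # False
```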
Comment From: jreback
we already have this check for a dtype, it's just not checked when it's a dict
you will have to trace where this should go
Comment From: jbrockmendel
in astype_nansafe inside the elif is_object_dtype(arr) check we have cases for is_datetime64_dtype(dtype) and is_timedelta64_dtype(dtype). These need to call ensure_nanosecond_dtype(dtype) i think
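To illustrate the behaviour being asked for, here is a rough sketch that rejects unitless datetime64 and timedelta64 dtypes in a column-to-dtype mapping before any conversion happens. It is not the pandas implementation: validate_dtype_mapping is a hypothetical helper, and the error text is only modelled on the message quoted in the report.

```python
import numpy as np


def validate_dtype_mapping(mapping) -> None:
    """Hypothetical helper: raise if any requested dtype lacks a unit."""
    for column, dtype in mapping.items():
        dt = np.dtype(dtype)
        # kind 'M' is datetime64, kind 'm' is timedelta64.
        if dt.kind in "mM" and np.datetime_data(dt)[0] == "generic":
            raise ValueError(
                f"The '{dt.name}' dtype for column {column!r} has no unit. "
                f"Please pass in '{dt.name}[ns]' instead."
            )


# Dtypes that carry a unit, and non-datetime dtypes, pass silently.
validate_dtype_mapping({"First": "datetime64[ns]", "Count": np.int64})

try:
    validate_dtype_mapping({"First": np.datetime64})
except ValueError as exc:
    print(exc)
```

Checking each value of the mapping covers the dict form of astype, which is the path the earlier comment describes as unchecked.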
Comment From: jbrockmendel
This now correctly raises in all cited cases. Closing.