• [x] I have checked that this issue has not already been reported. The trouble I report here might be the root cause of issue #32802.

• [x] I have confirmed this bug exists on the latest version of pandas.

• [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
# Trouble 1 with Series (same with DataFrame)
# Working with dtype int64
df = pd.DataFrame(columns=['First'],dtype=np.int64)
df['First']
Series([], Name: First, dtype: int64)
# Not working with dtype datetime64
df = pd.DataFrame(columns=['First'],dtype=np.datetime64)
df['First']
Series([], Name: First, dtype: object)
# Trouble 2 with Series (works with DataFrame)
# Try to use 'astype' as a workaround
df = df.astype({'First':np.datetime64})
df['First']
ValueError: The 'datetime64' dtype has no unit.
            Please pass in 'datetime64[ns]' instead.

But then:

df = df.astype({'First':np.datetime64[ns]})
NameError: name 'ns' is not defined

And if you are lucky enough to have a DataFrame with several columns, the workaround works.

df = pd.DataFrame(columns=['First','Last'])
df = df.astype({'First':np.datetime64,'Last':np.datetime64})
df.loc[:,'First']
Series([], Name: First, dtype: datetime64[ns])

Yeah baby! :)

Expected Output

Series([], Name: First, dtype: datetime64[ns])

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-59-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8
pandas : 1.0.5
numpy : 1.16.3
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.2.0.post20200511
Cython : None
pytest : None
hypothesis : None
sphinx : 3.0.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : 0.3.3
gcsfs : None
lxml.etree : None
matplotlib : 3.0.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0

Comment From: arw2019

Actually the workaround does apply for Series too if you put string quotes around datetime64[ns]. If I do:

df = pd.DataFrame(columns=['First'],dtype='datetime64[ns]') 
df['First'] 

I get

Series([], Name: First, dtype: datetime64[ns])
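
For reference, the same quoted spelling can also be passed to astype, which is the workaround attempted in the original report. This is a minimal sketch, assuming a pandas version where astype accepts a string dtype per column; the variable name `converted` is only for illustration.

```python
import pandas as pd

# Empty frame with an object column, as in the original report.
df = pd.DataFrame(columns=["First"])

# Spell the dtype as a string with an explicit unit
# instead of passing the unit-less np.datetime64 class.
converted = df.astype({"First": "datetime64[ns]"})
print(converted["First"].dtype)
```

Writing the dtype as a string avoids the NameError from `np.datetime64[ns]`, since the unit is no longer parsed as Python code.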

Comment From: jreback

np.datetime64 with no unit should always raise

i guess we missed a case - would take a PR to raise correctly in your first example

Comment From: erfannariman

Would something like this be sufficient in the DataFrame constructor @jreback ? If so then I can create a PR:

dtype = np.datetime64
if np.datetime_data(dtype)[0] == 'generic':
    raise TypeError('Numpy datetime without unit is not a valid dtype')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-129-be65bfca10de> in <module>
      1 dtype = np.datetime64
      2 if np.datetime_data(dtype)[0]:
----> 3     raise TypeError('numpy datetime without unit is not a valid dtype')

TypeError: numpy datetime without unit is not a valid dtype
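
The check proposed above could be wrapped in a small helper, sketched here under a few assumptions. The function name `has_datetime_unit` is hypothetical and not part of pandas or NumPy. The sketch normalizes its input with `np.dtype` first and only calls `np.datetime_data` on datetime64/timedelta64 dtypes. It is not the check pandas actually uses.

```python
import numpy as np


def has_datetime_unit(dtype) -> bool:
    """Return False for a unit-less datetime64/timedelta64 dtype, True otherwise.

    Hypothetical helper for illustration only.
    """
    dt = np.dtype(dtype)
    if dt.kind not in "mM":
        # Not a datetime-like dtype, so there is no unit to check.
        return True
    unit, _count = np.datetime_data(dt)
    # NumPy reports the unit of a bare datetime64 as 'generic'.
    return unit != "generic"


for candidate in (np.datetime64, "datetime64[ns]", np.int64):
    print(candidate, has_datetime_unit(candidate))
```

Comparing against 'generic' matters here: testing only the truthiness of the unit string would flag every datetime dtype, including datetime64[ns].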

Comment From: jreback

we already have this check for a dtype; it’s just not checked when it’s a dict

you will have to trace where this should go

Comment From: jbrockmendel

in astype_nansafe inside the elif is_object_dtype(arr) check we have cases for is_datetime64_dtype(dtype) and is_timedelta64_dtype(dtype). These need to call ensure_nanosecond_dtype(dtype) i think

Comment From: jbrockmendel

This now correctly raises in all cited cases. Closing.