Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> s = pd.Series(['9/22/1677', '4/11/2262'], dtype="U")

>>> s
0    9/22/1677
1    4/11/2262
dtype: object


>>> s.astype('datetime64')
0   1677-09-22
1   2262-04-11
dtype: datetime64[ns]


>>> s.astype('datetime64[s]')
0   1677-09-22
1   2262-04-11
dtype: datetime64[ns]


>>> s.astype('datetime64[m]')
0   1677-09-22
1   2262-04-11
dtype: datetime64[ns]


>>> s.astype('datetime64[h]')
0   1677-09-22
1   2262-04-11
dtype: datetime64[ns]


>>> s.astype('datetime64[D]')
0   2262-04-11    # <<<< wait, what?
1   2262-04-11
dtype: datetime64[ns]

Problem description

It appears that astype('datetime64[D]') breaks on the earliest representable full day -- it wraps around to the latest representable date.

While I know this is super edge case-y and highly unlikely to be worth a lot of time, the fact that it wraps around is indicative of a potential for integer overflow somewhere that may or may not bite us in the future.

Expected Output

.astype('datetime64[D]') should give 1677-09-22 instead of 2262-04-11.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.3.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.7.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8
pandas           : 0.25.0
numpy            : 1.17.0
pytz             : 2019.2
dateutil         : 2.8.0
pip              : 19.2.2
setuptools       : 41.1.0
Cython           : 0.29.13
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.4.1
html5lib         : None
pymysql          : None
psycopg2         : 2.8.3 (dt dec pq3 ext lo64)
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.4.1
matplotlib       : 3.1.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.14.1
pytables         : None
s3fs             : 0.3.3
scipy            : None
sqlalchemy       : None
tables           : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None

Comment From: MarPur

Any objections if I take a stab at this?

Comment From: MarPur

This behaviour comes from numpy. Since pandas first casts to datetime64[ns] before casting to datetime64[D], the calls to numpy's API look something like this:

>>> import numpy as np
>>> np.datetime64('1677-09-22').astype('datetime64[ns]').astype('datetime64[D]')
numpy.datetime64('2262-04-11')
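
The wraparound can be reproduced without numpy. The sketch below assumes (this is not confirmed anywhere in this thread) that the ns-to-D cast floors negative values by subtracting `unit - 1` before a truncating 64-bit division. `wrap_int64`, `trunc_div` and `ns_to_days_naive` are hypothetical helpers written for illustration, not numpy functions.

```python
import datetime

INT64_MIN = -2**63
NS_PER_DAY = 86_400 * 10**9
EPOCH = datetime.date(1970, 1, 1)


def wrap_int64(x):
    """Value a signed 64-bit integer would hold after wraparound."""
    return (x - INT64_MIN) % 2**64 + INT64_MIN


def trunc_div(a, b):
    """C-style integer division: truncates toward zero (assumes b > 0)."""
    q = abs(a) // b
    return q if a >= 0 else -q


def ns_to_days_naive(ns):
    """ASSUMED model of the cast: floor via truncating division, in int64."""
    if ns >= 0:
        return trunc_div(ns, NS_PER_DAY)
    # The subtraction is the step that can fall below INT64_MIN and wrap.
    return trunc_div(wrap_int64(ns - (NS_PER_DAY - 1)), NS_PER_DAY)


def day_to_date(days):
    return EPOCH + datetime.timedelta(days=days)


# Midnight on 1677-09-22 as nanoseconds since the epoch.
ns = (datetime.date(1677, 9, 22) - EPOCH).days * NS_PER_DAY

print(day_to_date(ns_to_days_naive(ns)))  # 2262-04-11 (wrapped)
print(day_to_date(ns // NS_PER_DAY))      # 1677-09-22 (unbounded ints)
```

Under this assumed model, any timestamp less than one day above the int64 minimum overflows in the intermediate subtraction, which matches the output reported in the issue.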

One way to fix this would be to raise the minimum datetime value by one day, from 1677-09-21 00:12:43.145225 to 1677-09-22 00:12:43.145225. The cast stops wrapping around exactly one day after the current minimum:

>>> np.datetime64('1677-09-22 00:12:43.145224190').astype('datetime64[D]')
numpy.datetime64('2262-04-11')
>>> np.datetime64('1677-09-22 00:12:43.145224191').astype('datetime64[D]')
numpy.datetime64('1677-09-22')
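
The boundary between these two calls sits exactly one day minus one nanosecond above the smallest int64 value. That is consistent with an overflow in an intermediate `value - (unit - 1)` step, though this is an assumption about the cause and has not been checked against numpy's source. A quick arithmetic check in plain Python:

```python
INT64_MIN = -2**63
NS_PER_DAY = 86_400 * 10**9

# Smallest value for which value - (NS_PER_DAY - 1) still fits in int64.
first_safe = INT64_MIN + (NS_PER_DAY - 1)

# Split into whole days since the epoch and time of day.
days, ns_in_day = divmod(first_safe, NS_PER_DAY)
seconds, nanos = divmod(ns_in_day, 10**9)
minutes, seconds = divmod(seconds, 60)
hours, minutes = divmod(minutes, 60)

print(days, f"{hours:02d}:{minutes:02d}:{seconds:02d}.{nanos:09d}")
# -106751 00:12:43.145224191
```

Day -106751 relative to 1970-01-01 is 1677-09-22, so the computed boundary matches the first value that casts correctly in the example above.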

I have been working on a PR to do this; I just need to fix the tests.

Comment From: jbrockmendel

We now raise when trying to cast to unsupported resolutions (such as "D") and use astype_overflowsafe to prevent silent overflows when converting between resolutions. Closing.
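
For readers unfamiliar with the idea, here is a minimal sketch of what "overflow-safe" means when rescaling between resolutions: check the result against the int64 range and raise instead of wrapping. `scale_checked` is a hypothetical helper for illustration only; it is not pandas' `astype_overflowsafe` and makes no claim about how that function is implemented.

```python
import datetime

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def scale_checked(value, factor):
    """Hypothetical helper: rescale a timestamp, raising instead of wrapping."""
    result = value * factor  # Python ints are unbounded, so this never wraps
    # INT64_MIN itself is excluded because datetime64 reserves it for NaT.
    if not INT64_MIN < result <= INT64_MAX:
        raise OverflowError(f"{value} * {factor} does not fit in int64")
    return result


epoch = datetime.date(1970, 1, 1)

# Seconds since the epoch for a date outside the nanosecond-resolution range.
seconds_1600 = (datetime.date(1600, 1, 1) - epoch).days * 86_400

try:
    scale_checked(seconds_1600, 10**9)  # seconds -> nanoseconds
except OverflowError as exc:
    print("refused:", exc)
```

The design choice is to fail loudly at conversion time, so an out-of-range value cannot silently turn into a valid-looking date at the other end of the range.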