Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame({"some_dates": ["1/1/2025", "12/1/2025", "13/1/2025", "1/12/2025", "11/12/2025", "13/12/2025"]})
# Convert the single string column with astype (not pd.to_datetime)
df["converted_dates"] = df["some_dates"].astype("datetime64[ns]")
print(df)

# output:
#    some_dates converted_dates
# 0    1/1/2025      2025-01-01
# 1   12/1/2025      2025-12-01  <- parsed month-first (1 Dec 2025)
# 2   13/1/2025      2025-01-13  <- parsed day-first (13 Jan 2025): day and month reversed relative to row 1
# 3   1/12/2025      2025-01-12
# 4  11/12/2025      2025-11-12
# 5  13/12/2025      2025-12-13

Issue Description

When converting string dates with astype, strings that are valid month-first dates (e.g. 12/1/2025, i.e. 1 Dec 2025) are parsed month-first. If a string is not a valid month-first date but is a valid day-first date (e.g. 13/1/2025, i.e. 13 Jan 2025), that individual row is parsed day-first instead.
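The row-by-row switch resembles the default heuristic in dateutil's parser (dateutil 2.9.0.post0 is in the installed versions). As an illustration only, assuming the fallback for non-ISO strings resolves each value roughly like `dateutil.parser.parse` with its default `dayfirst=False`, parsing the reported strings one at a time gives the same mixed interpretation:

```python
from dateutil import parser

strings = ["1/1/2025", "12/1/2025", "13/1/2025", "1/12/2025", "11/12/2025", "13/12/2025"]

# Each string is resolved on its own with the default dayfirst=False.
# If the first number cannot be a month (it is > 12), the parser quietly
# treats it as the day, but only for that one string.
parsed = [parser.parse(s).date() for s in strings]

for s, d in zip(strings, parsed):
    print(f"{s:>10} -> {d}")
```

Rows 2 and 5 come out day-first while the other rows come out month-first, matching the converted_dates column in the reproducible example.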

@MarcoGorelli commented here: https://github.com/pandas-dev/pandas/issues/53127#issuecomment-1547296982 that disallowing the conversion of string dates with astype('datetime64[ns]') might be a good idea, and after a morning spent debugging this I'm inclined to agree!

Expected Behavior

In general, I would expect all values in a column to be interpreted consistently. Interpreting different rows differently without an explicit user request should raise an error, or at least emit a warning.
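A minimal sketch of the kind of check this expectation implies, assuming the column only holds `M/D/YYYY` or `D/M/YYYY` strings. The helper name `find_day_first_only_rows` is hypothetical, not a pandas API. It parses the whole column once per candidate order with `pd.to_datetime(..., format=..., errors="coerce")` and flags rows that are only valid day-first, which are the rows astype silently flips:

```python
import warnings

import pandas as pd


def find_day_first_only_rows(s: pd.Series) -> pd.Series:
    """Hypothetical helper: True where a string is valid D/M/Y but not M/D/Y."""
    month_first = pd.to_datetime(s, format="%m/%d/%Y", errors="coerce")
    day_first = pd.to_datetime(s, format="%d/%m/%Y", errors="coerce")
    # NaT under month-first but a real date under day-first means the
    # string can only be read one way, unlike the ambiguous rows.
    return month_first.isna() & day_first.notna()


dates = pd.Series(["1/1/2025", "12/1/2025", "13/1/2025", "1/12/2025", "11/12/2025", "13/12/2025"])
mask = find_day_first_only_rows(dates)
if mask.any():
    # Surface the inconsistency instead of mixing interpretations per row.
    warnings.warn(
        f"{mask.sum()} row(s) are only valid day-first: {dates[mask].tolist()}",
        UserWarning,
    )
```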

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.11.9
python-bits           : 64
OS                    : Windows
OS-release            : 10
Version               : 10.0.26100
machine               : AMD64
processor             : Intel64 Family 6 Model 186 Stepping 3, GenuineIntel
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : English_United Kingdom.1252

pandas                : 2.2.3
numpy                 : 2.1.3
pytz                  : 2024.2
dateutil              : 2.9.0.post0
pip                   : 24.3.1
Cython                : None
sphinx                : None
IPython               : 8.12.3
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : 3.1.4
lxml.etree            : 5.3.0
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : 3.1.5
pandas_gbq            : None
psycopg2              : 2.9.10
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : 2.0.36
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : 3.2.0
zstandard             : None
tzdata                : 2024.2
qtpy                  : None
pyqt5                 : None

Comment From: palbha

cc @rhshadrach I am using the latest version and the above code throws an error.

pip install pandas==2.2.0

import pandas as pd
df = pd.DataFrame({"some_dates": ["1/1/2025", "12/1/2025", "13/1/2025", "1/12/2025", "11/12/2025", "13/12/2025"]})
df["converted_dates"] = df.astype({"some_dates": "datetime64[ns]"})
print(df)


Comment From: rhshadrach

@palbha - the latest version of pandas is 2.2.3; can you try with that?

Comment From: rhshadrach

@MarcoGorelli - per PDEP-4, this should raise, is that right?

Comment From: MarcoGorelli

Yeah, probably... PDEP-4 didn't specifically say anything about DatetimeIndex, but my inclination is that the DatetimeIndex constructor, as well as astype('datetime64[ns]'), should either:

  • only parse ISO 8601-like formats, or
  • just not be supported, so that people use to_datetime, which accepts format / dayfirst / etc. arguments.
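For comparison, a short example of the to_datetime route using the strings from the report: with an explicit `format`, the order is stated once for the whole column, and a string that does not fit that order raises instead of being reinterpreted on its own:

```python
import pandas as pd

dates = pd.Series(["1/1/2025", "12/1/2025", "13/1/2025", "1/12/2025", "11/12/2025", "13/12/2025"])

# Day-first for the whole column: every row is read as D/M/YYYY.
day_first = pd.to_datetime(dates, format="%d/%m/%Y")
print(day_first.dt.strftime("%Y-%m-%d").tolist())

# Month-first for the whole column: "13/1/2025" has no valid month,
# so the call raises rather than silently flipping that single row.
month_first_failed = False
try:
    pd.to_datetime(dates, format="%m/%d/%Y")
except ValueError as exc:
    month_first_failed = True
    print("month-first parse failed:", exc)
```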

Comment From: cgflex

The main thing that surprised me (and I realise I could have just read the documentation...) was that the decision about how to parse dates was made at the row level rather than the column level. That's a bit of a hand-wavy point, I know, but not realising that is what really caused me problems with this particular issue (and I have now switched to to_datetime!)

Comment From: rhshadrach

I would be for restricting to iso8601-like formats rather than removing astype behavior entirely.
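A rough sketch of what "ISO 8601-like only" could mean for callers, assuming pandas >= 2.0, where `pd.to_datetime` accepts `format="ISO8601"`. The wrapper name `astype_datetime_iso_only` is hypothetical; it is not the proposed implementation, only a way to see which inputs would keep working and which would be rejected:

```python
import pandas as pd


def astype_datetime_iso_only(s: pd.Series) -> pd.Series:
    """Hypothetical: convert strings to datetime64[ns], accepting only ISO 8601."""
    # format="ISO8601" makes to_datetime reject non-ISO strings, so ambiguous
    # inputs such as "13/1/2025" raise instead of being guessed row by row.
    return pd.to_datetime(s, format="ISO8601").astype("datetime64[ns]")


iso = pd.Series(["2025-01-13", "2025-12-01"])
print(astype_datetime_iso_only(iso))

ambiguous = pd.Series(["12/1/2025", "13/1/2025"])
try:
    astype_datetime_iso_only(ambiguous)
    rejected = False
except ValueError:
    rejected = True
print("ambiguous input rejected:", rejected)
```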

Comment From: Anurag-Varma

@palbha @rhshadrach

The issue is present in the latest pandas-dev version. (Attaching the picture below)

[Screenshot]

Comment From: Anurag-Varma

@MarcoGorelli @rhshadrach

Should the fix be a code change so that only ISO 8601-style strings are accepted?

And should the documentation for astype('datetime64[ns]') be updated to reflect the change?

Comment From: Anurag-Varma

take

Comment From: rhshadrach

> Should the fix for this be like change in code to only accept iso8601 type strings ?

Yes, I think so. At least eventually.

It seems to me restricting to iso8601 can impact valid uses and has a clear deprecation path. I think we should deprecate the current behavior rather than making a breaking change.
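One way a deprecation could look from the caller's side, as a sketch: warn when a column contains strings that are not ISO 8601-like, then keep the current astype behavior. The names `warn_if_not_iso_like` and `astype_datetime_with_deprecation`, the regex, and the warning text are all hypothetical; none of this is the pandas implementation or an agreed design:

```python
import warnings

import pandas as pd

# Hypothetical, deliberately loose notion of "ISO 8601-like": YYYY-MM-DD,
# optionally followed by "T" or a space and a time part.
ISO_LIKE_PATTERN = r"\d{4}-\d{2}-\d{2}(?:[T ].*)?"


def warn_if_not_iso_like(s: pd.Series) -> bool:
    """Hypothetical helper: emit a FutureWarning if any string is not ISO-like."""
    values = s.dropna().astype(str)
    non_iso = ~values.str.fullmatch(ISO_LIKE_PATTERN)
    if non_iso.any():
        warnings.warn(
            "Parsing non-ISO 8601 strings with astype('datetime64[ns]') is "
            "deprecated; use pd.to_datetime with an explicit format instead.",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False


def astype_datetime_with_deprecation(s: pd.Series) -> pd.Series:
    """Hypothetical shim: warn first, then keep the current astype behavior."""
    warn_if_not_iso_like(s)
    return s.astype("datetime64[ns]")
```

Warning rather than raising keeps existing code working during the deprecation period while pointing users at to_datetime.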