Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({"some_dates": ["1/1/2025","12/1/2025","13/1/2025","1/12/2025","11/12/2025","13/12/2025",]})
df["converted_dates"] = df.astype({"some_dates": "datetime64[ns]"})
print(df)
# output:
# some_dates converted_dates
# 0 1/1/2025 2025-01-01
# 1 12/1/2025 2025-12-01 )
# 2 13/1/2025 2025-01-13 ) <- converted_date reverses day and month
# 3 1/12/2025 2025-01-12
# 4 11/12/2025 2025-11-12
# 5 13/12/2025 2025-12-13
Issue Description
When converting date strings using astype, a string that is a valid month-first date (e.g. 12/1/2025, read as 1 Dec 2025) is interpreted as month-first. If a string is not a valid month-first date (e.g. 13/1/2025) but is a valid day-first date, that individual row is interpreted as day-first (13 Jan 2025), so a single column ends up with mixed interpretations.
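To make the row-level difference concrete, here is a minimal sketch that checks each string from the example against both conventions using explicit formats. It only shows which rows are valid under each reading; it does not reproduce the internals of astype, and the variable names are illustrative.

```python
import pandas as pd

dates = pd.Series(
    ["1/1/2025", "12/1/2025", "13/1/2025", "1/12/2025", "11/12/2025", "13/12/2025"]
)

# Parse every row with one fixed convention; rows that do not fit become NaT.
month_first = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")
day_first = pd.to_datetime(dates, format="%d/%m/%Y", errors="coerce")

summary = pd.DataFrame(
    {
        "raw": dates,
        "valid_month_first": month_first.notna(),
        "valid_day_first": day_first.notna(),
    }
)
print(summary)
```

Rows 2 and 5 (13/1/2025 and 13/12/2025) are not valid month-first dates, which are exactly the rows that astype silently reads as day-first in the report above.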
There was a comment by @MarcoGorelli here: https://github.com/pandas-dev/pandas/issues/53127#issuecomment-1547296982 that disallowing converting string dates with astype('datetime64[ns]') might be a good idea and after a morning debugging this I'm inclined to agree!
Expected Behavior
In general, I would expect a column of data to have a consistent interpretation. It should be an error or at least a warning for different rows to be interpreted differently without an explicit user request.
Installed Versions
Comment From: palbha
cc @rhshadrach I am using the latest version & the above code throws an error
pip install pandas==2.2.0
import pandas as pd
df = pd.DataFrame({"some_dates": ["1/1/2025","12/1/2025","13/1/2025","1/12/2025","11/12/2025","13/12/2025",]})
df["converted_dates"] = df.astype({"some_dates": "datetime64[ns]"})
print(df)
Comment From: rhshadrach
@palbha - the latest version of pandas is 2.2.3; can you try with that?
Comment From: rhshadrach
@MarcoGorelli - per PDEP-4, this should raise, is that right?
Comment From: MarcoGorelli
yeah probably... PDEP-4 didn't specifically say anything about DatetimeIndex, but my inclination is that the DatetimeIndex constructor, as well as astype('datetime64[ns]'), should either:
- only parse iso8601-like formats
- just not be supported, so people use to_datetime, which accepts format / dayfirst / etc. arguments
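For reference, a minimal sketch of the to_datetime route on the data from the report, assuming the strings are meant to be day-first. Passing an explicit format applies one interpretation to the whole column, and a string that does not match raises instead of being reinterpreted.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "some_dates": [
            "1/1/2025", "12/1/2025", "13/1/2025",
            "1/12/2025", "11/12/2025", "13/12/2025",
        ]
    }
)

# One explicit format for the whole column: day/month/year.
df["converted_dates"] = pd.to_datetime(df["some_dates"], format="%d/%m/%Y")
print(df)
```

With this, 12/1/2025 and 13/1/2025 both land in January rather than being split between December and January.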
Comment From: cgflex
The main thing that surprised me (and I realise I could have just read the documentation...) was that the decision about how to parse dates was made at row level rather than column level. That's a bit of a hand-wavey point, I know, but not realising that was what really caused me problems with this particular issue (and I have now switched to to_datetime!)
Comment From: rhshadrach
I would be for restricting to iso8601-like formats rather than removing astype behavior entirely.
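A rough user-side sketch of what "only iso8601-like formats" could look like. The helper name `to_datetime_iso_only` is hypothetical, and the pattern is deliberately simplified to date-only `YYYY-MM-DD` strings; this is not how pandas implements or plans to implement the restriction.

```python
import pandas as pd


def to_datetime_iso_only(values: pd.Series) -> pd.Series:
    """Hypothetical helper: parse only strings shaped like YYYY-MM-DD.

    Simplified assumption: no time component and no other ISO 8601 variants.
    """
    is_iso = values.str.fullmatch(r"\d{4}-\d{2}-\d{2}")
    if not is_iso.all():
        bad = values[~is_iso].tolist()
        raise ValueError(f"non-ISO 8601 date strings: {bad}")
    return pd.to_datetime(values, format="%Y-%m-%d")


parsed = to_datetime_iso_only(pd.Series(["2025-01-13", "2025-12-01"]))
print(parsed)
```

Ambiguous inputs such as "13/1/2025" raise a ValueError here instead of being guessed row by row.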
Comment From: Anurag-Varma
@palbha @rhshadrach
The issue is present in the latest pandas-dev version. (Attaching the picture below)
Comment From: Anurag-Varma
@MarcoGorelli @rhshadrach
Should the fix for this be a code change to only accept ISO 8601 strings, plus a documentation change to reflect the new behavior of astype('datetime64[ns]')?
Comment From: Anurag-Varma
take
Comment From: rhshadrach
> Should the fix for this be a code change to only accept ISO 8601 strings?
Yes, I think so. At least eventually.
It seems to me restricting to iso8601 can impact valid uses and has a clear deprecation path. I think we should deprecate the current behavior rather than making a breaking change.
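One possible shape for the deprecation step, sketched outside of pandas. The function name `warn_if_not_iso`, the message text, and the date-only `YYYY-MM-DD` pattern are all assumptions for illustration; the actual check and wording would be decided in the PR.

```python
import warnings

import pandas as pd


def warn_if_not_iso(values: pd.Series) -> bool:
    """Hypothetical check: warn when any string is not shaped like YYYY-MM-DD.

    Returns True if a FutureWarning was emitted, False otherwise.
    """
    is_iso = values.str.fullmatch(r"\d{4}-\d{2}-\d{2}")
    if is_iso.all():
        return False
    warnings.warn(
        "Parsing non-ISO 8601 strings via astype is deprecated; "
        "use pd.to_datetime with an explicit format instead.",
        FutureWarning,
        stacklevel=2,
    )
    return True
```

During the deprecation period the existing parsing would still run after the warning, so current code keeps working until the behavior is changed to raise.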