I have a dataset where some dates are NaN and the rest are strings in dd/mm/yyyy format.

The max date in the dataset after dropping NaNs is Timestamp('2022-12-08 00:00:00') when I call pd.to_datetime and pass only the dates. This, however, raises a lot of warnings. Passing infer_datetime_format=True or the errors argument gives Timestamp('2022-10-28 00:00:00'). Passing format="%d/%m/%Y" stops the warnings but also gives a max date of Timestamp('2022-10-28 00:00:00').

  • Could you please tell me how to fix it?

Comment From: MarcoGorelli

Hi @sunny1401

Could you please tell me how to fix it?

will do if you post a reproducible example

Comment From: sunny1401

Hi, thanks for replying. I think the issue can be closed now. I was passing the data with dayfirst=False, hence it was raising a warning and parsing incorrectly. Regarding a working example:

```python
import pandas as pd

list_data = ["02/09/2022", "12/12/2022", "13/01/2023", "", "10/01/2022"]
dates = pd.Series(list_data)
print(dates)
# 0    02/09/2022   -> 2nd of September
# 1    12/12/2022   -> 12th of December
# 2    13/01/2023   -> 13th of Jan
# 3
# 4    10/01/2022   -> 10th of Jan

# Drop the empty strings and renumber the index
dates = dates[~(dates == "")].reset_index(drop=True)
print(dates)
# 0    02/09/2022   -> 2nd of September
# 1    12/12/2022   -> 12th of December
# 2    13/01/2023   -> 13th of Jan
# 3    10/01/2022   -> 10th of Jan

d1 = pd.to_datetime(dates)
# :1: UserWarning: Parsing dates in DD/MM/YYYY format when dayfirst=False (the default) was specified. This may lead to inconsistently parsed dates! Specify a format to ensure consistent parsing.
print(d1)
# 0   2022-02-09   -> 9th of Feb
# 1   2022-12-12
# 2   2023-01-13
# 3   2022-10-01   -> 1st of October
```

However, in my use case I was ignoring UserWarnings, so the data was being parsed incorrectly without my noticing. Even with infer_datetime_format=True, if dayfirst=False, it parses incorrectly but doesn't raise an error. Continuing further:

```python
d2 = pd.to_datetime(dates, infer_datetime_format=True)
# :1: UserWarning: Parsing dates in DD/MM/YYYY format when dayfirst=False (the default) was specified. This may lead to inconsistently parsed dates! Specify a format to ensure consistent parsing.
d2
# 0   2022-02-09   -> 9th of Feb
# 1   2022-12-12
# 2   2023-01-13
# 3   2022-10-01   -> 1st of October
# dtype: datetime64[ns]

d3 = pd.to_datetime(dates, infer_datetime_format=True, dayfirst=True)
# 0   2022-09-02   -> 2nd of Sept
# 1   2022-12-12
# 2   2023-01-13
# 3   2022-01-10   -> 10th of Jan
```

Maybe it would be good to raise when inconsistent arguments are passed. However, for now, no longer ignoring warnings helped me catch the error.
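For anyone landing here with the same problem, the options discussed in this thread can be sketched roughly as follows. This is only an illustration built on the sample data from this comment: `parses_cleanly` is a hypothetical helper name, not a pandas function, and which exception it ends up catching is an assumption that depends on the installed pandas version (older versions warn with a UserWarning, newer ones may fail outright with a ValueError, so both are caught).

```python
import warnings

import pandas as pd

# Sample data from this thread; the empty string stands in for a missing date.
raw = pd.Series(["02/09/2022", "12/12/2022", "13/01/2023", "", "10/01/2022"])
dates = raw[raw != ""].reset_index(drop=True)

# Option 1: state the format explicitly so every value is read as day/month/year.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

# Option 2: keep format inference, but tell pandas that the day comes first.
parsed_dayfirst = pd.to_datetime(dates, dayfirst=True)


def parses_cleanly(values) -> bool:
    """Hypothetical helper: True only if default parsing neither warns nor fails."""
    with warnings.catch_warnings():
        # Escalate UserWarning to an exception instead of letting it be ignored.
        warnings.simplefilter("error", UserWarning)
        try:
            pd.to_datetime(values)
        except (UserWarning, ValueError):
            return False
    return True


print(parsed.max())           # latest date under the day-first reading
print(parses_cleanly(dates))  # whether the default call can be trusted here
```

With an explicit format (or dayfirst=True), "02/09/2022" is read as 2 September rather than 9 February, so the max no longer depends on which rows happen to be ambiguous. The helper is just one way to make sure a silenced warning cannot hide a misparse.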

Comment From: MarcoGorelli

hey @sunny1401

this is fixed in pandas 2.0.0 (hopefully coming out next month), to_datetime will try to infer the format and, if it can, it will parse consistently

if you want to try it out before the final release, please check the instructions here https://pandas.pydata.org/docs/dev/getting_started/install.html#installing-the-development-version-of-pandas

Comment From: sunny1401

Oh cool, thank you. I will check that out.

Comment From: MarcoGorelli

thanks @sunny1401 ! any bug reports ahead of the final release would be really helpful, so you'd be a star if you could try it out and report anything that doesn't work as intended ⭐