I have a dataset in which some dates are NaN and the rest are strings in dd/mm/yyyy format.
After dropping NaNs, the max date in the dataset is Timestamp('2022-12-08 00:00:00'), which is what pd.to_datetime returns when I pass only the dates. This, however, raises a lot of warnings. Passing infer_datetime_format=True or the errors argument gives Timestamp('2022-10-28 00:00:00'). Passing format="%d/%m/%Y" stops the warnings but also gives a max date of Timestamp('2022-10-28 00:00:00').
- could you please tell how to fix it.
Comment From: MarcoGorelli
Hi @sunny1401
> could you please tell how to fix it.
will do if you post a reproducible example
Comment From: sunny1401
Hi - thanks for replying - I think the issue can be closed now. I was passing the data with dayfirst=False, hence it was raising a warning and failing. Regarding a working example:

```python
import pandas as pd

list_data = ["02/09/2022", "12/12/2022", "13/01/2023", "", "10/01/2022"]
dates = pd.Series(list_data)
print(dates)
# 0    02/09/2022  -> 2nd of September
# 1    12/12/2022  -> 12th of December
# 2    13/01/2023  -> 13th of Jan
# 3
# 4    10/01/2022  -> 10th of Jan

# Drop the empty strings before parsing
dates = dates[dates != ""].reset_index(drop=True)
print(dates)
# 0    02/09/2022  -> 2nd of September
# 1    12/12/2022  -> 12th of December
# 2    13/01/2023  -> 13th of Jan
# 3    10/01/2022  -> 10th of Jan

d1 = pd.to_datetime(dates)
```

However, in my use case I was ignoring UserWarnings, so the data was being parsed incorrectly without my noticing. Even with infer_datetime_format=True, if dayfirst=False, it parses incorrectly but doesn't raise. Continuing further:

```python
d2 = pd.to_datetime(dates, infer_datetime_format=True)
d3 = pd.to_datetime(dates, infer_datetime_format=True, dayfirst=True)
# 0   2022-09-02  -> 2nd of Sept
# 1   2022-12-12
# 2   2023-01-13
# 3   2022-01-10  -> 10th of Jan
```

Maybe it would be good to raise when inconsistent arguments are passed. However, for now, no longer ignoring warnings helped me catch the error.
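For readers who want the inconsistency to surface as a hard failure rather than a warning that can be ignored, here is a minimal sketch. The helper name `parses_cleanly` is hypothetical, and the sketch assumes the warning in question is a `UserWarning` (as described above); on newer pandas versions a `ValueError` may be raised instead, so both are caught.

```python
import warnings

import pandas as pd


def parses_cleanly(values, **kwargs):
    """Return True only if pd.to_datetime finishes without a UserWarning or ValueError.

    Hypothetical helper for illustration; not part of pandas.
    """
    with warnings.catch_warnings():
        # Promote UserWarning to an exception inside this block only.
        warnings.simplefilter("error", UserWarning)
        try:
            pd.to_datetime(values, **kwargs)
        except (UserWarning, ValueError):
            return False
    return True


dates = pd.Series(["02/09/2022", "12/12/2022", "13/01/2023", "10/01/2022"])

# Parsing without any hints versus parsing with an explicit day-first format.
print(parses_cleanly(dates))
print(parses_cleanly(dates, format="%d/%m/%Y"))
```

The `catch_warnings` context keeps the stricter filter local, so the rest of the program's warning configuration is left untouched.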
Comment From: MarcoGorelli
hey @sunny1401
this is fixed in pandas 2.0.0 (hopefully coming out next month): to_datetime will try to infer the format and, if it can, it will parse consistently
if you want to try it out before the final release, please check the instructions here: https://pandas.pydata.org/docs/dev/getting_started/install.html#installing-the-development-version-of-pandas
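Independent of the pandas version, one way to avoid relying on inference is to pass the format explicitly. The following is only a sketch using the sample data from this thread; `errors="coerce"` is an assumption about how the empty entries should be handled (they become NaT), not something the thread prescribes.

```python
import pandas as pd

# Sample data from the thread, including one empty entry.
raw = pd.Series(["02/09/2022", "12/12/2022", "13/01/2023", "", "10/01/2022"])

# An explicit day-first format means every row is parsed the same way;
# entries that cannot be parsed are coerced to NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
print(parsed.max())
```

With an explicit format there is no guessing about day versus month, so the maximum is computed over consistently parsed values.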
Comment From: sunny1401
Oh cool - thank you. I will check that out.
Comment From: MarcoGorelli
thanks @sunny1401 ! any bug reports ahead of the final release would be really helpful, so you'd be a star if you could try it out and report anything that doesn't work as intended ⭐