Feature Type

  • [ ] Adding new functionality to pandas

  • [X] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

If you run pd.to_datetime on the following Series:

    "11-12-2029",
    "02-12-2012",
    "11-09-2012",
    "13-02-2000",
    "10-11-2001"

pandas (>= 2.0) infers the datetime format from the first non-missing value (%m-%d-%Y), tries to apply that format to the whole Series, fails on 13-02-2000, and raises an error. Before version 2.0, this would silently parse the rows with inconsistent formats. I wish pandas could infer the right format for such a Series, where only one format (%d-%m-%Y) works for all rows.

Feature Description

Pseudo code

If guess_datetime_format returns different formats for dayfirst=True and dayfirst=False on the first non-missing value (i.e. the value is ambiguous and both formats parse it), try both formats on the Series (probably on a random subset for speed):

  • If exactly one format works for all rows, return that format.

  • If both work, trust the dayfirst parameter (and maybe emit a warning).

  • If neither works and errors="raise", raise an error.

  • If neither works and errors="coerce" or errors="ignore", either trust the dayfirst parameter or pick the dayfirst value that leaves the fewest unparsed values.
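The steps above can be sketched in plain Python. This is only an illustration, not pandas code: choose_format and count_failures are hypothetical helpers, the two candidate formats are hard-coded stand-ins for what format guessing would return, and datetime.strptime stands in for the pandas parser.

```python
from datetime import datetime

# Assumption: hard-coded stand-ins for the two formats that guessing with
# dayfirst=True and dayfirst=False would return for the first non-null value.
DAYFIRST_FORMAT = "%d-%m-%Y"
MONTHFIRST_FORMAT = "%m-%d-%Y"


def count_failures(values, fmt):
    """Count the non-null values that do not parse with fmt."""
    failures = 0
    for value in values:
        if value is None:
            continue
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            failures += 1
    return failures


def choose_format(values, dayfirst=False, errors="raise"):
    """Hypothetical helper: pick a format following the steps above."""
    if dayfirst:
        preferred, other = DAYFIRST_FORMAT, MONTHFIRST_FORMAT
    else:
        preferred, other = MONTHFIRST_FORMAT, DAYFIRST_FORMAT
    preferred_failures = count_failures(values, preferred)
    other_failures = count_failures(values, other)
    if preferred_failures == 0:
        # Only the preferred format works, or both do: trust dayfirst.
        return preferred
    if other_failures == 0:
        return other
    if errors == "raise":
        raise ValueError("no candidate format parses every value")
    # errors="coerce" or "ignore": keep the format with fewer failures,
    # falling back to the dayfirst preference on a tie.
    return other if other_failures < preferred_failures else preferred


values = ["11-12-2029", "02-12-2012", "11-09-2012", "13-02-2000", "10-11-2001"]
print(choose_format(values))  # the day-first format, since month 13 is invalid
```

For the Series from the problem description, only the day-first format parses every row, so it is chosen even with the default dayfirst=False.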

Implementation

Change the function _guess_datetime_format_for_array (in pandas.core.tools.datetimes) so that it tries both dayfirst=True and dayfirst=False on the first non-null value. In the same function, if the two options give different formats, try array_strptime with both formats on a random subset of the array (100 rows?) with strict error handling (raising on failure), and check that at least one of the attempts doesn't fail.
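A rough sketch of the random-subset check, again in plain Python rather than against the pandas internals. formats_valid_on_sample is a hypothetical name, datetime.strptime stands in for array_strptime, and the default sample size of 100 is just the number floated above.

```python
import random
from datetime import datetime


def formats_valid_on_sample(values, formats, sample_size=100, seed=None):
    """Return the formats that parse every value in a random non-null sample."""
    non_null = [v for v in values if v is not None]
    rng = random.Random(seed)
    sample = rng.sample(non_null, min(sample_size, len(non_null)))
    valid = []
    for fmt in formats:
        try:
            for value in sample:
                datetime.strptime(value, fmt)  # strict: raises on mismatch
        except ValueError:
            continue  # this format failed on at least one sampled value
        valid.append(fmt)
    return valid


values = ["11-12-2029", None, "02-12-2012", "11-09-2012", "13-02-2000", "10-11-2001"]
candidates = ["%m-%d-%Y", "%d-%m-%Y"]
print(formats_valid_on_sample(values, candidates, seed=0))
```

One caveat of sampling: the subset can miss the only row that disambiguates the formats, so a format that passes on the sample may still fail on the full array.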

Alternative Solutions

I don't know.

Additional Context

No response

Comment From: MarcoGorelli

thanks @LeoGrin for the suggestion

yeah I think we could improve the inference, for example by trying the first n non-null rows or taking a random sample, and then taking a majority vote

would you be interested in trying this out and submitting a PR?

Comment From: LeoGrin

Thanks for the feedback! Yes I would :)

Comment From: LeoGrin

take