Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from datetime import datetime
from pandas.core.tools.datetimes import guess_datetime_format

my_date_str = '2022-JAN-14'
my_date = datetime(2022,1,14,0,0)
fmt = '%Y-%b-%d'
# this is a valid format...
assert datetime.strptime(my_date_str,fmt)==my_date
# ...but pandas fails to guess it, so this assertion fails
assert guess_datetime_format(my_date_str) is not None

Issue Description

It appears that the pandas utilities fail to guess some valid datetime formats.

This hurts the performance of pd.to_datetime, which falls back to inefficient code paths when the format is not guessed correctly. More precisely, the module pandas.core.tools.datetimes chooses the fast-track method _to_datetime_with_format when the format is guessed, and the generic objects_to_datetime64ns when the guess fails.
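For reference, a minimal sketch of how a caller can avoid depending on the guess by passing the format explicitly. The sample values are made up for illustration, and the sketch assumes pandas is installed and an English or C locale for month abbreviations; it does not measure the speed difference.

```python
import pandas as pd

# Hypothetical sample data with upper-case month abbreviations.
dates = pd.Series(["2022-JAN-14", "2022-FEB-03"])

# With format= given explicitly, pandas does not have to guess the format.
parsed = pd.to_datetime(dates, format="%Y-%b-%d")
print(parsed)
```

The trade-off is that the caller has to know the format up front, which is exactly what the guessing utility is meant to avoid.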

Expected Behavior

guess_datetime_format(my_date_str) returns fmt, i.e. '%Y-%b-%d'

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python           : 3.8.10.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-1056-azure
Version          : #58~18.04.1-Ubuntu SMP Wed Jul 28 23:14:18 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.5
numpy            : 1.21.5
pytz             : 2021.3
dateutil         : 2.8.2
pip              : 21.1.1
setuptools       : 56.0.0
Cython           : None
pytest           : 6.2.5
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : 1.1
pymysql          : None
psycopg2         : 2.9.1 (dt dec pq3 ext lo64)
jinja2           : None
IPython          : 7.30.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 2022.01.0
fastparquet      : 0.7.2
gcsfs            : None
matplotlib       : 3.5.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 3.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.7.3
sqlalchemy       : 1.4.29
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Comment From: mroeschke

Thanks for the request. Happy to have a PR to add parsing logic for dates with month names

Comment From: maciejskorski

@mroeschke thanks for that quick reply. I will be happy to test if you make a PR for that. Ideally we would love to have the guessing util supporting almost all of the valid formats - if that's possible? ;-)

Note: Stack Overflow also points out that guessing is fairly limited

Comment From: MarcoGorelli

this currently only works if the month is spelled Jan, not JAN:

In [11]: guess_datetime_format('2022-JAN-14')

In [12]: guess_datetime_format('2022-Jan-14')
Out[12]: '%Y-%b-%d'

Comment From: MarcoGorelli

I don't think we should support this - we use strftime to guess the format, and that returns it as Jan:

In [2]: dt.datetime(2022, 1, 1).strftime('%b')
Out[2]: 'Jan'
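A rough stdlib-only illustration of that mismatch (this is not pandas' actual guessing code, and the spelling shown assumes an English or C locale): strptime accepts the upper-case month, but formatting the parsed date back with strftime yields the canonical spelling, so a round-trip comparison against the input fails unless case is ignored.

```python
from datetime import datetime

fmt = "%Y-%b-%d"
text = "2022-JAN-14"

# strptime matches month names case-insensitively, so this parses fine.
parsed = datetime.strptime(text, fmt)

# strftime emits the locale's canonical spelling ('Jan' in an English or C locale).
round_trip = parsed.strftime(fmt)

print(round_trip)                          # canonical spelling of the same date
print(round_trip == text)                  # exact comparison with the input
print(round_trip.lower() == text.lower())  # case-insensitive comparison
```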

I'd rather not add extra complexity to guess_datetime_format for something that can be solved as easily as by the user passing format=

Closing then, but thanks for the suggestion (though, general open-source tip - "I will be happy to test if you make a PR for that" comes across as demanding work from others, which is generally unwelcome)

Ideally, dateutil would support this (https://github.com/dateutil/dateutil/issues/1138)