Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from datetime import datetime
from pandas.core.tools.datetimes import guess_datetime_format
my_date_str = '2022-JAN-14'
my_date = datetime(2022,1,14,0,0)
fmt = '%Y-%b-%d'
# this is a valid format...
assert datetime.strptime(my_date_str,fmt)==my_date
# ...but pandas does not guess it (this assertion fails)
assert guess_datetime_format(my_date_str) is not None
Issue Description
It appears that the pandas utilities do not guess some valid datetime formats. This hurts the performance of pd.to_datetime, which falls back to a slower code path when the format is not guessed. More precisely, the module pandas.core.tools.datetimes chooses the fast-track method _to_datetime_with_format when the format is guessed, and the generic objects_to_datetime64ns when the guess fails.
Expected Behavior
guess_datetime_format(my_date_str) matches fmt
Installed Versions
Comment From: mroeschke
Thanks for the request. Happy to have a PR to add parsing logic for dates with month names
Comment From: maciejskorski
@mroeschke thanks for that quick reply. I will be happy to test if you make a PR for that. Ideally we would love to have the guessing util supporting almost all of the valid formats - if that's possible? ;-)
Note: Stack Overflow also points out that guessing is fairly limited
Comment From: MarcoGorelli
The format is currently guessed only if the month is written as Jan:
In [11]: guess_datetime_format('2022-JAN-14')
In [12]: guess_datetime_format('2022-Jan-14')
Out[12]: '%Y-%b-%d'
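Given the difference between the two inputs above, one possible user-side workaround is to normalize the casing before guessing. The following is only a minimal sketch: title_case_alpha_tokens is a hypothetical helper, not part of pandas, and it assumes date-only strings (it would also rewrite tokens such as PM or UTC, which is probably not wanted).

```python
import re
from datetime import datetime


def title_case_alpha_tokens(date_str: str) -> str:
    """Title-case every run of ASCII letters, e.g. 'JAN' -> 'Jan'.

    Hypothetical helper for illustration only; not a pandas function.
    """
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), date_str)


normalized = title_case_alpha_tokens("2022-JAN-14")
print(normalized)  # 2022-Jan-14

# The normalized string still parses with the format from the issue.
assert datetime.strptime(normalized, "%Y-%b-%d") == datetime(2022, 1, 14)
```

The normalized string has the same casing as the second input above, the one for which a format was returned.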
Comment From: MarcoGorelli
I don't think we should support this - we use strftime to guess the format, and that returns it as Jan:
In [2]: dt.datetime(2022, 1, 1).strftime('%b')
Out[2]: 'Jan'
I'd rather not add extra complexity to guess_datetime_format for something that can be solved as easily as by the user passing format=.
Closing then, but thanks for the suggestion (though, general open-source tip - "I will be happy to test if you make a PR for that" comes across as demanding work from others, which is generally unwelcome)
Ideally, dateutil would support this (https://github.com/dateutil/dateutil/issues/1138)
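For readers following along, the asymmetry discussed in this thread can be reproduced with the standard library alone. This is a minimal sketch, not the pandas implementation: it only shows that parsing with an explicit format accepts JAN, while formatting the result does not reproduce the original casing. The rendered month name depends on the active locale; the comments assume the default C locale.

```python
from datetime import datetime

date_str = "2022-JAN-14"
fmt = "%Y-%b-%d"

# Parsing with an explicit format accepts the upper-case month name.
parsed = datetime.strptime(date_str, fmt)

# Formatting the parsed value back does not restore the original casing
# (the month comes out as 'Jan' under the default C locale).
rendered = parsed.strftime(fmt)

round_trips = rendered == date_str
round_trips_ignoring_case = rendered.lower() == date_str.lower()
print(rendered, round_trips, round_trips_ignoring_case)
```

A guess that relies on comparing the strftime output with the input therefore misses this string, whereas an explicitly supplied format (format= in pd.to_datetime, as suggested above) sidesteps the guess entirely.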