Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
pd.to_datetime('2018-08-18 9', format='%Y-%m-%d %H', infer_datetime_format=True)

Issue Description

pd.to_datetime fails to parse a string whose hour is not zero-padded, even though the month and day are zero-padded. ParserError: Unknown string format: 2018-08-18 9

The same call with a zero-padded hour works: pd.to_datetime('2018-08-18 09', format='%Y-%m-%d %H', infer_datetime_format=True) OUTPUT: Timestamp('2018-08-18 09:00:00')

Expected Behavior

I expect Timestamp('2018-08-18 09:00:00') when the hour is not zero-padded.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 4bfe3d07b4858144c219b9346329027024102ab6
python : 3.8.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_India.1252
pandas : 1.4.2
numpy : 1.20.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.2
setuptools : 58.0.4
Cython : 0.29.25
pytest : 6.2.5
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.1
jinja2 : 2.11.3
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.4
brotli :
fastparquet : None
fsspec : 2022.02.0
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.0
numba : 0.54.1
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy : None
sqlalchemy : 1.4.27
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None

Comment From: JosephParampathu

I have been looking at this and wanted to report my findings so far. Per the documentation for to_datetime, you should not pass a format when infer_datetime_format is used, because infer_datetime_format only guesses whether the year or the day comes first in the format.

That being said, the issue with passing a time whose hour has no leading zero is that the "format" argument uses strftime from datetime. The Python datetime documentation on format codes shows that strftime expects zero-padded hours (meaning 09, as opposed to 9). Reading both the %H entry and note 9 at the bottom of that page, we can see that the leading zero is optional when using strptime instead of strftime.
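For reference, here is a minimal stdlib-only sketch of the difference between the two functions. It does not involve pandas and says nothing about how pandas parses internally: strptime turns text into a datetime and accepts the hour with or without a leading zero, while strftime turns a datetime into text and writes %H as two digits.

```python
from datetime import datetime

# strptime parses text into a datetime; %H accepts "9" as well as "09".
unpadded = datetime.strptime("2018-08-18 9", "%Y-%m-%d %H")
padded = datetime.strptime("2018-08-18 09", "%Y-%m-%d %H")
assert unpadded == padded

# strftime formats a datetime as text; %H is written with two digits.
formatted = unpadded.strftime("%Y-%m-%d %H")
print(formatted)
```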

I was able to use the code below to reproduce what you are trying to do without the leading zero, using strptime.

import pandas as pd
from datetime import datetime
print(pd.to_datetime(datetime.strptime("2018-8-18 9", "%Y-%m-%d %H"), infer_datetime_format=True))

While you may decide to use the code above for your use case, I am not sure whether replacing strftime with strptime in pandas would be worthwhile at this time. Any input would be appreciated.
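Another possible workaround, sketched here under assumptions, is to zero-pad the hour before parsing. The helper name pad_hour is hypothetical (it is not part of pandas or the stdlib), and it assumes the string ends with a space followed by the hour, as in the example from the report.

```python
import re
from datetime import datetime


def pad_hour(text: str) -> str:
    """Hypothetical helper: zero-pad a trailing one-digit hour.

    Assumes the hour is the last field and is preceded by a space.
    """
    # A space followed by exactly one digit at the end of the string
    # is rewritten with a leading zero; two-digit hours are left alone.
    return re.sub(r" (\d)$", r" 0\1", text)


padded_text = pad_hour("2018-08-18 9")
parsed = datetime.strptime(padded_text, "%Y-%m-%d %H")
print(padded_text, parsed)
```

The padded string has the same shape as '2018-08-18 09', which the report above shows pd.to_datetime parses with format='%Y-%m-%d %H'.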

Comment From: simonjayhawkins

Thanks @RakeshJarupula for the report and @JosephParampathu for the investigation.

If infer_datetime_format=True is omitted, the traceback is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2084         try:
-> 2085             values, tz_parsed = conversion.datetime_to_datetime64(data)
   2086             # If tzaware, these values represent unix timestamps, so we

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_27674/4203207622.py in <module>
----> 1 pd.to_datetime("2018-08-18 9", format="%Y-%m-%d %H")

~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
    830             result = convert_listlike(arg, format)
    831     else:
--> 832         result = convert_listlike(np.array([arg]), format)[0]
    833 
    834     return result

~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    463         assert format is None or infer_datetime_format
    464         utc = tz == "utc"
--> 465         result, tz_parsed = objects_to_datetime64ns(
    466             arg,
    467             dayfirst=dayfirst,

~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2088             return values.view("i8"), tz_parsed
   2089         except (ValueError, TypeError):
-> 2090             raise e
   2091 
   2092     if tz_parsed is not None:

~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   2073 
   2074     try:
-> 2075         result, tz_parsed = tslib.array_to_datetime(
   2076             data,
   2077             errors=errors,

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

ValueError: time data 2018-08-18 9 doesn't match format specified

I think that the pandas cython parser should ideally match the stdlib behavior.

import datetime as dt

dt.datetime.strptime("2018-08-18 9", "%Y-%m-%d %H")  # datetime.datetime(2018, 8, 18, 9, 0)

However, I am not sure whether resolving this would also solve the issue in the OP.

Comment From: MarcoGorelli

This is the part of the code that'd need changing if you wanted to support this

https://github.com/pandas-dev/pandas/blob/a37b78d534cd25e21d2ab64318ceae808df46c89/pandas/_libs/tslibs/src/datetime/np_datetime_strings.c#L581-L601

In the meantime, I think it's fine to error

Comment From: MarcoGorelli

Actually, this can probably be handled as part of #50242

Thanks for the report, I'll add a test case and hopefully close it as part of that

Comment From: MarcoGorelli

also, looks like a dupe of https://github.com/pandas-dev/pandas/issues/21422, so let's close in favour of that