Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.to_datetime('2018-08-18 9', format='%Y-%m-%d %H', infer_datetime_format=True)
Issue Description
pd.to_datetime is not able to convert a string with a zero-padded date and a non-zero-padded hour to a timestamp:
ParserError: Unknown string format: 2018-08-18 9
The same string with a zero-padded hour converts without any issue:
pd.to_datetime('2018-08-18 09', format='%Y-%m-%d %H', infer_datetime_format=True)
OUTPUT: Timestamp('2018-08-18 09:00:00')
Expected Behavior
I am expecting Timestamp('2018-08-18 09:00:00') with the non-zero-padded hour as well.
Installed Versions
Comment From: JosephParampathu
I have been looking at this and just wanted to report my findings so far. Per the documentation for to_datetime, infer_datetime_format only applies when no format is given, so you should not pass a format together with infer_datetime_format.
That being said, the issue with passing a time where the hour does not have a leading zero is that the "format" argument follows the strftime directives from datetime. You can see here that %H is documented as a zero-padded hour (meaning 09, as opposed to 9). At that link, if you read both the %H information and note 9 at the bottom, you can see that the leading zero is optional when parsing with strptime, as opposed to formatting with strftime.
I was able to use the code below to reproduce what you are trying to do without the leading zero, using strptime.
import pandas as pd
from datetime import datetime
print(pd.to_datetime(datetime.strptime("2018-8-18 9", "%Y-%m-%d %H"), infer_datetime_format=True))
While you may decide to use the code above for your use case, I am not sure whether making pandas follow the strptime behavior here would be worthwhile at this time. Any input would be appreciated.
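For reference, the asymmetry described above can be checked with the standard library alone. This is a minimal sketch, not pandas code: it only shows that datetime.strptime accepts the hour with or without a leading zero, while strftime always writes the zero-padded form. The names FMT, padded, unpadded and formatted are made up for the illustration.

```python
from datetime import datetime

# Same format string as in the report.
FMT = "%Y-%m-%d %H"

# Parsing: the leading zero on the hour is optional for strptime.
padded = datetime.strptime("2018-08-18 09", FMT)
unpadded = datetime.strptime("2018-08-18 9", FMT)
print(padded == unpadded)

# Formatting: strftime always emits the zero-padded hour.
formatted = unpadded.strftime(FMT)
print(formatted)
```

Both inputs parse to the same datetime, and formatting the result gives back the padded string.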
Comment From: simonjayhawkins
Thanks @RakeshJarupula for the report and @JosephParampathu for the investigation.
If infer_datetime_format=True is omitted, the traceback is...
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2084 try:
-> 2085 values, tz_parsed = conversion.datetime_to_datetime64(data)
2086 # If tzaware, these values represent unix timestamps, so we
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
/tmp/ipykernel_27674/4203207622.py in <module>
----> 1 pd.to_datetime("2018-08-18 9", format="%Y-%m-%d %H")
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
830 result = convert_listlike(arg, format)
831 else:
--> 832 result = convert_listlike(np.array([arg]), format)[0]
833
834 return result
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
463 assert format is None or infer_datetime_format
464 utc = tz == "utc"
--> 465 result, tz_parsed = objects_to_datetime64ns(
466 arg,
467 dayfirst=dayfirst,
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2088 return values.view("i8"), tz_parsed
2089 except (ValueError, TypeError):
-> 2090 raise e
2091
2092 if tz_parsed is not None:
~/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2073
2074 try:
-> 2075 result, tz_parsed = tslib.array_to_datetime(
2076 data,
2077 errors=errors,
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
ValueError: time data 2018-08-18 9 doesn't match format specified
I think that the pandas cython parser should ideally match the stdlib behavior.
import datetime as dt
dt.datetime.strptime("2018-08-18 9", "%Y-%m-%d %H") # datetime.datetime(2018, 8, 18, 9, 0)
But I am not sure whether resolving this would then also solve the issue in the OP.
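Until the pandas parser matches the stdlib here, one possible workaround is to zero-pad the hour before parsing. The sketch below is only an illustration under stated assumptions: pad_hour and _UNPADDED_HOUR are hypothetical names (not part of pandas), and the pattern assumes the exact "YYYY-MM-DD H" layout from the report, with the hour at the end of the string.

```python
import re

# Hypothetical helper, not a pandas API: matches a zero-padded date
# followed by a single-digit hour at the very end of the string.
_UNPADDED_HOUR = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d)$")


def pad_hour(text: str) -> str:
    """Return text with a single-digit trailing hour zero-padded.

    Strings that do not match the assumed layout are returned unchanged.
    """
    return _UNPADDED_HOUR.sub(r"\1 0\2", text)


print(pad_hour("2018-08-18 9"))
print(pad_hour("2018-08-18 09"))
```

The padded string can then be passed to pd.to_datetime with format="%Y-%m-%d %H", which is the call that the report shows working.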
Comment From: MarcoGorelli
This is the part of the code that'd need changing if you wanted to support this
https://github.com/pandas-dev/pandas/blob/a37b78d534cd25e21d2ab64318ceae808df46c89/pandas/_libs/tslibs/src/datetime/np_datetime_strings.c#L581-L601
In the meantime, I think it's fine to error
Comment From: MarcoGorelli
Actually, this can probably be handled as part of #50242
Thanks for the report, I'll add a test case and hopefully close it as part of that
Comment From: MarcoGorelli
also, looks like a dupe of https://github.com/pandas-dev/pandas/issues/21422, so let's close in favour of that