Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from pandas import to_datetime
to_datetime(['19801212.0'], format='%Y%m%d', exact=True)
Issue Description
This should fail: there is no interpretation of the format under which 19801212.0 matches %Y%m%d.
Expected Behavior
Probably something like
ValueError: unconverted data remains: .0
but certainly not that it passes without an error
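For comparison, here is a minimal sketch of the strict behaviour using only the standard library's datetime.strptime, which rejects trailing characters. The wrapper name strict_parse is hypothetical and is not part of pandas; the sketch only illustrates the kind of error expected, not how pandas implements parsing.

```python
from datetime import datetime


def strict_parse(value: str, fmt: str) -> datetime:
    """Hypothetical wrapper around the stdlib parser, which requires a full match."""
    return datetime.strptime(value, fmt)


# A string that matches the format exactly parses fine.
parsed = strict_parse("19801212", "%Y%m%d")

# The string from this report leaves ".0" unconsumed, so the stdlib raises.
try:
    strict_parse("19801212.0", "%Y%m%d")
except ValueError as err:
    message = str(err)
else:
    message = None

print(parsed)
print(message)
```
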
Comment From: MarcoGorelli
We don't do this for any other format, why are we special-casing %Y%m%d?
In [1]: print(to_datetime(['198012.0'], format='%Y%m'))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 1
----> 1 print(to_datetime(['198012.0'], format='%Y%m'))
File ~/pandas-dev/pandas/core/tools/datetimes.py:1115, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
1113 result = _convert_and_box_cache(argc, cache_array)
1114 else:
-> 1115 result = convert_listlike(argc, format)
1116 else:
1117 result = convert_listlike(np.array([arg]), format)[0]
File ~/pandas-dev/pandas/core/tools/datetimes.py:438, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
435 require_iso8601 = not infer_datetime_format
437 if format is not None and not require_iso8601:
--> 438 res = _to_datetime_with_format(
439 arg, orig_arg, name, utc, format, exact, errors, infer_datetime_format
440 )
441 if res is not None:
442 return res
File ~/pandas-dev/pandas/core/tools/datetimes.py:544, in _to_datetime_with_format(arg, orig_arg, name, utc, fmt, exact, errors, infer_datetime_format)
541 return _box_as_indexlike(result, utc=utc, name=name)
543 # fallback
--> 544 res = _array_strptime_with_fallback(
545 arg, name, utc, fmt, exact, errors, infer_datetime_format
546 )
547 return res
File ~/pandas-dev/pandas/core/tools/datetimes.py:478, in _array_strptime_with_fallback(arg, name, utc, fmt, exact, errors, infer_datetime_format)
474 """
475 Call array_strptime, with fallback behavior depending on 'errors'.
476 """
477 try:
--> 478 result, timezones = array_strptime(
479 arg, fmt, exact=exact, errors=errors, utc=utc
480 )
481 except OutOfBoundsDatetime:
482 if errors == "raise":
File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:185, in pandas._libs.tslibs.strptime.array_strptime()
183 iresult[i] = NPY_NAT
184 continue
--> 185 raise ValueError(f"unconverted data remains: {val[found.end():]}")
186
187 # search
ValueError: unconverted data remains: .0
Comment From: MarcoGorelli
Pretty sure I disagree with this test:
https://github.com/pandas-dev/pandas/blob/2804681c45b5f748d0977e3998bbfc91d4653b10/pandas/tests/tools/test_to_datetime.py#L119-L135
These should raise: the inputs don't match the given format.
I don't think the %Y%m%d special path serves any purpose; we should just remove it.
Comment From: MarcoGorelli
This doesn't look right:
https://github.com/pandas-dev/pandas/blob/073e48b7cfe85b7a1173a275296471b51276826d/pandas/tests/tools/test_to_datetime.py#L142-L148
If the input is invalid, errors='ignore' should return the input
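A rough sketch of the "return the input" semantics, written against the standard library rather than pandas. The helper parse_or_return_input is hypothetical, and it works on a single value, which is a simplification: it is only meant to show that an input which does not match the format comes back unchanged.

```python
from datetime import datetime


def parse_or_return_input(value, fmt):
    """Hypothetical helper: parse value with fmt, or hand the input back untouched."""
    try:
        return datetime.strptime(value, fmt)
    except (ValueError, TypeError):
        # Invalid input: do not coerce or partially parse, just return it as given.
        return value


valid = parse_or_return_input("19801212", "%Y%m%d")
invalid = parse_or_return_input("19801212.0", "%Y%m%d")

print(valid)
print(invalid)
```
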
Comment From: jorisvandenbossche
Resolved by https://github.com/pandas-dev/pandas/pull/50242