Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import datetime
dates = ['2023/01/23','20230123']
strptime_dates = []
for d in dates:
    try:
        dd = datetime.datetime.strptime(d, '%Y/%m/%d')
    except ValueError:
        dd = 'Bad Format'
    strptime_dates.append(dd)
print(pd.to_datetime(dates, errors='coerce', format='%Y/%m/%d'))
print(strptime_dates)

Issue Description

The documentation on this function references the formatting, and thus behaviour, of the standard strftime and strptime functions. https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html

By and large this is correct. However, in the above example, strptime correctly rejects an improperly formatted date (in this case, one missing the forward slashes between the date components), whilst pd.to_datetime ignores the error, even though exact=True is the default.

Pandas thus makes it impossible for the data coder to discover this error in the data without resorting to df.apply type functionality.
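Until this is fixed, one possible stopgap is to check the raw strings against the format with strptime before handing them to pandas. The following is only a minimal sketch using the standard library; the helper name find_format_mismatches is hypothetical and not part of pandas or datetime.

```python
from datetime import datetime


def find_format_mismatches(values, fmt):
    """Return (index, value) pairs that datetime.strptime rejects for fmt.

    Hypothetical helper for illustration only; not a pandas API.
    """
    mismatches = []
    for i, value in enumerate(values):
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            # strptime is strict about separators, so '20230123' lands here
            mismatches.append((i, value))
    return mismatches


dates = ['2023/01/23', '20230123']
print(find_format_mismatches(dates, '%Y/%m/%d'))
```

With the two dates from the reproducible example, only the second entry is reported, which matches what strptime does in the example above.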

Expected Behavior

pd.to_datetime should mirror the behaviour of datetime.datetime.strptime.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python : 3.9.9.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.3.5
numpy : 1.20.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 59.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2022.01.0
fastparquet : None
gcsfs : None
matplotlib : 3.5.1
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.20.2
xlrd : 2.0.1
xlwt : None
numba : 0.54.1

Comment From: mroeschke

Thanks for the report. Agreed if exact=True, the format '%Y/%m/%d' should not be able to parse '20230123'.

Internally, it appears '%Y/%m/%d' is deemed an "iso8601" format, so a custom require_iso8601 code path is used instead of the provided format.

Comment From: andrewchen1216

take

Comment From: MarcoGorelli

is this the same as https://github.com/pandas-dev/pandas/issues/12649? If so, it should be addressed by https://github.com/pandas-dev/pandas/pull/49333

Comment From: MarcoGorelli

closing as I'm pretty sure it's a duplicate, but please do let me know if I've misunderstood and I'll reopen

Comment From: rhjmoore

Looks likely; interesting that the other (quite old) bug didn't come up in the search, but glad that it's being done. Is it possible to verify the failure case at the top of this report against the fix patch you link to?

Comment From: MarcoGorelli

yeah there's some tests in https://github.com/pandas-dev/pandas/pull/49333/files#diff-388d9e4dc158bf81c94ed5df7ac7027cde97d599db685376f7988ed33bdba9b7 which check this kind of thing:

    @pytest.mark.parametrize(
        "input, format",
        [
            ("2020-01", "%Y/%m"),
            ("2020-01-01", "%Y/%m/%d"),
            ("2020-01-01 00", "%Y/%m/%dT%H"),
            ("2020-01-01T00", "%Y/%m/%d %H"),
            ("2020-01-01 00:00", "%Y/%m/%dT%H:%M"),
            ("2020-01-01T00:00", "%Y/%m/%d %H:%M"),
            ("2020-01-01 00:00:00", "%Y/%m/%dT%H:%M:%S"),
            ("2020-01-01T00:00:00", "%Y/%m/%d %H:%M:%S"),
        ],
    )
    def test_to_datetime_iso8601_separator(self, input, format):
        # https://github.com/pandas-dev/pandas/issues/12649
        with pytest.raises(
            ValueError,
            match=(
                rf"time data \"{input}\" at position 0 doesn\'t match format "
                rf"\"{format}\""
            ),
        ):
            to_datetime(input, format=format)

should be fixed by the time the next non-patch release comes out (2.0)