Pandas to_datetime gives todays date when coercing errors of time values

Code Sample, a copy-pastable example if possible

import pandas as pd

d = {'start_date': ['37:04.0', '55:20.0', '44:05.3', '43:41.7', '40:35.0', '26:14.4', '57:00.7', '18:53.8', '12:23.2', '48:33.2', '36:38.1', '29:35.7', '56:02.4', '27:41.4', '11:49.9', '17:36.6', '50:44.2', '36:36.1', '21:14.7']}
data = pd.DataFrame(data=d)

data['customer_created_at'] = pd.to_datetime(data['start_date'], errors='coerce')

print (data['customer_created_at'])

Problem description

When coercing the list of time values I expect all of them to return NaT. However, a few of these come back with today's date. I tried yesterday and the returned value was:

0                     NaT
1                     NaT
2                     NaT
3                     NaT
4                     NaT
5                     NaT
6                     NaT
7     2018-10-31 18:53:48
8     2018-10-31 12:23:12
9                     NaT
10                    NaT
11                    NaT
12                    NaT
13                    NaT
14    2018-10-31 11:49:54
15    2018-10-31 17:36:36
16                    NaT
17                    NaT
18    2018-10-31 21:14:42

Today I get:

0                     NaT
1                     NaT
2                     NaT
3                     NaT
4                     NaT
5                     NaT
6                     NaT
7     2018-11-01 18:53:48
8     2018-11-01 12:23:12
9                     NaT
10                    NaT
11                    NaT
12                    NaT
13                    NaT
14    2018-11-01 11:49:54
15    2018-11-01 17:36:36
16                    NaT
17                    NaT
18    2018-11-01 21:14:42

Expected Output

All time values should return NaT when coerced

Output of `pd.show_versions()`

INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Darwin OS-release: 18.0.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 pandas: 0.23.4 pytest: 3.5.1 pip: 18.0 setuptools: 39.1.0 Cython: 0.28.2 numpy: 1.14.3 scipy: 1.1.0 pyarrow: None xarray: None IPython: 6.4.0 sphinx: 1.7.4 patsy: 0.5.0 dateutil: 2.7.3 pytz: 2018.4 blosc: None bottleneck: 1.2.1 tables: 3.4.3 numexpr: 2.6.5 feather: None matplotlib: 2.2.2 openpyxl: 2.5.3 xlrd: 1.1.0 xlwt: 1.2.0 xlsxwriter: 1.0.4 lxml: 4.2.1 bs4: 4.6.0 html5lib: 1.0.1 sqlalchemy: 1.2.7 pymysql: None psycopg2: None jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None None

Comment From: mroeschke

I am guessing this is hitting our datutil parser as it will interpret these as times corresponding to the current date:

In [1]: from dateutil import parser

In [2]: parser.parse('18:53:48')
Out[2]: datetime.datetime(2018, 11, 1, 18, 53, 48)

In general we try to stay consistent with dateutil, but I can see how this is a surprising (undocumented) behavior. Could use some discussion on whether we should just document this case (which I am in more favor of) or should coerce these to NaT.

Comment From: nickwoelk

Thanks for the update. It's strange that it is only happening for certain rows. I've created a workaround on my end using an if statement to prevent the to.datetime from affecting rows without a date.

Comment From: mroeschke

Well the rows that have "time looking" values are getting parsed like this. As you can see, rows where the first value is < 23 (i.e an hour looking value) are getting parsed and the rest to NaT

Comment From: MarcoGorelli

This now throws a warning making it clear that each row will be parsed by dateutil:

In [29]: import pandas as pd
    ...: 
    ...: d = {'start_date': ['37:04.0', '55:20.0', '44:05.3', '43:41.7', '40:35.0', '26:14.4', '57:00.7', '18:53.8', '12:23.2', '48:33.2', '36:38.1', '29:35.7', '56:02.4', '27:41.4',
    ...: '11:49.9', '17:36.6', '50:44.2', '36:36.1', '21:14.7']}
    ...: data = pd.DataFrame(data=d)
    ...: 
    ...: data['customer_created_at'] = pd.to_datetime(data['start_date'], errors='coerce')
    ...: 
    ...: print (data['customer_created_at'])
<ipython-input-29-c6a01a69a58d>:6: UserWarning: Could not infer format, so each element will be parsed individually by `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  data['customer_created_at'] = pd.to_datetime(data['start_date'], errors='coerce')
0                    NaT
1                    NaT
2                    NaT
3                    NaT
4                    NaT
5                    NaT
6                    NaT
7    2022-12-17 18:53:48
8    2022-12-17 12:23:12
9                    NaT
10                   NaT
11                   NaT
12                   NaT
13                   NaT
14   2022-12-17 11:49:54
15   2022-12-17 17:36:36
16                   NaT
17                   NaT
18   2022-12-17 21:14:42
Name: customer_created_at, dtype: datetime64[ns]

I think we can close then, but thanks for the report!

Pandas to_datetime gives todays date when coercing errors of time values

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

Output of `pd.show_versions()`