Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

(I am still checking the main branch; getting a build is taking me a while.)

Reproducible Example

import pandas as pd
pd.to_datetime('11:0')

Issue Description

The error message raised with pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime is confusing: it does not say which input value failed to convert.

I was doing something like this:

df = pd.DataFrame({
    0: ["11:00", "12:00", "11:0"],
})

print(pd.to_datetime(df[0]))

which printed:

Traceback (most recent call last):
  File "/home/hayesall/miniconda3/envs//lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2211, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))
  File "pandas/_libs/tslibs/conversion.pyx", line 360, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mre.py", line 15, in <module>
    print(pd.to_datetime(df[0]))
  File "/home/hayesall/miniconda3/envs//lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 1051, in to_datetime
    values = convert_listlike(arg._values, format)
  File "/home/hayesall/miniconda3/envs//lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 402, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "/home/hayesall/miniconda3/envs//lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2217, in objects_to_datetime64ns
    raise err
  File "/home/hayesall/miniconda3/envs//lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2199, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 381, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 608, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 604, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 559, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/conversion.pyx", line 517, in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject
  File "pandas/_libs/tslibs/np_datetime.pyx", line 120, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 11:00:00

This wasn't helpful for diagnosing my error.

Instead, I had to parse the values one at a time to find the incorrect 11:0 value:

df = pd.DataFrame({
    0: ["11:00", "12:00", "11:0"],
})

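# Parse one value at a time so the failing row can be reported.
for index, value in df[0].items():
    try:
        pd.to_datetime(value)
    except Exception:  # avoid a bare except, which also catches KeyboardInterrupt
        print(f"Line: {index}, Value: {value}")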

# Line: 2, Value: 11:0
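
A shorter way to locate such values might be to let pandas coerce failures to NaT and then select the entries that were lost. The following is a minimal sketch, not a fix for the error message itself: find_unparseable is a hypothetical helper name, and the sample uses an obviously invalid string because whether a value such as 11:0 fails to parse may depend on the pandas version.

```python
import pandas as pd


def find_unparseable(series: pd.Series) -> pd.Series:
    """Return the entries of `series` that pd.to_datetime cannot parse.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # errors="coerce" replaces unparseable entries with NaT instead of raising.
    parsed = pd.to_datetime(series, errors="coerce")
    # Keep entries that became NaT but were not missing to begin with.
    return series[parsed.isna() & series.notna()]


times = pd.Series(["11:00", "12:00", "not a time"])
print(find_unparseable(times))
```

The returned Series keeps the original index, so the row labels of the bad values are available directly.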

Expected Behavior

Something like:

Traceback (most recent call last):
  # ...
  File "pandas/_libs/tslibs/np_datetime.pyx", line 120, in pandas._libs.tslibs.np_datetime.check_dts_bounds

pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Could not convert "11:0" to datetime64

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 06d230151e6f18fdb8139d09abf539867a8cd481
python           : 3.8.8.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-105-generic
Version          : #119~18.04.1-Ubuntu SMP Tue Mar 8 11:21:24 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.1
numpy            : 1.19.3
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 22.0.4
setuptools       : 52.0.0.post20210125
Cython           : 0.29.23
pytest           : 6.2.4
hypothesis       : None
sphinx           : 4.1.2
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.3
IPython          : 7.26.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : 3.4.1
numba            : 0.53.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.3
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None

Comment From: jagadeeshjr5

import pandas as pd
ts = pd.to_datetime([11, 12, 11], unit = 'h')
ts

It returns the following:

DatetimeIndex(['1970-01-01 11:00:00', '1970-01-01 12:00:00',
               '1970-01-01 11:00:00'],
              dtype='datetime64[ns]', freq=None)

If the data are hours, you can pass them directly as integers (for example, 11 rather than 11:00) and set the unit parameter to 'h' to say that they are hours. The result is then a timestamp counted from the default origin, 1970-01-01.

import pandas as pd
pd.to_datetime([11, 12, 11], unit = 'h', origin=pd.Timestamp('2021-01-01'))

Output:

DatetimeIndex(['2021-01-01 11:00:00', '2021-01-01 12:00:00',
               '2021-01-01 11:00:00'],
              dtype='datetime64[ns]', freq=None)

As this second example shows, you can also start from any timestamp you like by passing the origin parameter.

I hope this helps!

Comment From: dannyi96

take

Comment From: MarcoGorelli

Thanks for the report.

I think the simplest approach could be to modify

https://github.com/pandas-dev/pandas/blob/cedd1222e3b2ac60d1006bf09df4c8a4870773c5/pandas/_libs/tslib.pyx#L670-L671

so that val is included in the assignment to ex.args.
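
The general idea of putting the offending value into the exception arguments can be sketched in plain Python. This is only an illustration under stated assumptions, not the pandas Cython code at the link above: parse_times is a hypothetical helper, datetime.strptime stands in for the pandas parser, and the message wording is just one possibility.

```python
from datetime import datetime


def parse_times(values, fmt="%H:%M"):
    """Parse each string with datetime.strptime.

    Hypothetical helper for illustration; on failure, the raised error
    names the value and its position.
    """
    parsed = []
    for position, val in enumerate(values):
        try:
            parsed.append(datetime.strptime(val, fmt))
        except ValueError as ex:
            # Rewrite ex.args so the message includes the offending value,
            # then re-raise the same exception with its original traceback.
            ex.args = (f'Could not convert "{val}" at position {position}: {ex}',)
            raise
    return parsed


try:
    parse_times(["11:00", "12:00", "25:00"])
except ValueError as ex:
    print(ex)
```

Re-raising the same exception object keeps its type and traceback, so existing except clauses would keep working while the message becomes more specific.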

If anyone can come up with a good error message and wants to submit a pull request, that would be welcome.

EDIT: this might not be that simple, actually.