Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [5]: ts = Timestamp('1900-01-01')

In [6]: Timestamp(ts.value, unit=ts.unit)
Out[6]: Timestamp('1900-01-01 00:00:00')

In [7]: ts = Timestamp('0300-01-01')

In [8]: Timestamp(ts.value, unit=ts.unit)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:141, in pandas._libs.tslibs.conversion.cast_from_unit()
    140 try:
--> 141     return <int64_t>(base * m) + <int64_t>(frac * m)
    142 except OverflowError as err:

OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:

OutOfBoundsDatetime                       Traceback (most recent call last)
Cell In[8], line 1
----> 1 Timestamp(ts.value, unit=ts.unit)

File ~/pandas-dev/pandas/_libs/tslibs/timestamps.pyx:1637, in pandas._libs.tslibs.timestamps.Timestamp.__new__()
   1635     raise ValueError("nanosecond must be in 0..999")
   1636 
-> 1637 ts = convert_to_tsobject(ts_input, tzobj, unit, 0, 0, nanosecond)
   1638 
   1639 if ts.value == NPY_NAT:

File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:300, in pandas._libs.tslibs.conversion.convert_to_tsobject()
    298     obj.value = NPY_NAT
    299 else:
--> 300     ts = cast_from_unit(ts, unit)
    301     obj.value = ts
    302     pandas_datetime_to_datetimestruct(ts, NPY_FR_ns, &obj.dts)

File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:143, in pandas._libs.tslibs.conversion.cast_from_unit()
    141     return <int64_t>(base * m) + <int64_t>(frac * m)
    142 except OverflowError as err:
--> 143     raise OutOfBoundsDatetime(
    144         f"cannot convert input {ts} with the unit '{unit}'"
    145     ) from err

OutOfBoundsDatetime: cannot convert input -52700112000 with the unit 's'

Issue Description

The above throws an error when the timestamp is out-of-nano-bounds

Noticed while trying to address #51024, in

https://github.com/pandas-dev/pandas/blob/a82b8e2073ba3a3bdbfaa3c54c46375d6dd09977/pandas/core/resample.py#L2160-L2161

Expected Behavior

Timestamp('0300-01-01')

Installed Versions

INSTALLED VERSIONS ------------------ commit : d6afc862a7e2e0f742ccd99a96198b2768091284 python : 3.8.16.final.0 python-bits : 64 OS : Linux OS-release : 5.10.102.1-microsoft-standard-WSL2 Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022 machine : x86_64 processor : x86_64 byteorder : little LC_ALL : None LANG : en_GB.UTF-8 LOCALE : en_GB.UTF-8 pandas : 2.0.0.dev0+1340.gd6afc862a7 numpy : 1.23.5 pytz : 2022.7.1 dateutil : 2.8.2 setuptools : 66.1.1 pip : 22.3.1 Cython : 0.29.32 pytest : 7.2.1 hypothesis : 6.64.0 sphinx : 5.3.0 blosc : 1.11.1 feather : None xlsxwriter : 3.0.7 lxml.etree : 4.9.2 html5lib : 1.1 pymysql : 1.0.2 psycopg2 : 2.9.5 jinja2 : 3.1.2 IPython : 8.8.0 pandas_datareader: None bs4 : 4.11.1 bottleneck : 1.3.6 brotli : fastparquet : 2023.1.0 fsspec : 2022.11.0 gcsfs : 2022.11.0 matplotlib : 3.6.3 numba : 0.56.4 numexpr : 2.8.4 odfpy : None openpyxl : 3.0.10 pandas_gbq : None pyarrow : 10.0.1 pyreadstat : 1.2.0 pyxlsb : 1.0.10 s3fs : 2022.11.0 scipy : 1.10.0 snappy : sqlalchemy : 1.4.45 tables : 3.8.0 tabulate : 0.9.0 xarray : 2023.1.0 xlrd : 2.0.1 zstandard : 0.19.0 tzdata : 2022.7 qtpy : None pyqt5 : None None

Comment From: jbrockmendel

I think it would make sense to infer a Timestamp reso matching "unit" in this case

Comment From: MarcoGorelli

The issue is that when we get here

https://github.com/pandas-dev/pandas/blob/448a023c9182318af8da29a652026defaccc9ef4/pandas/_libs/tslibs/timestamps.pyx#L1637

then we have: - ts_input: -52700112000 - unit: 's'

Then, we get to

https://github.com/pandas-dev/pandas/blob/08a7a9e249094f9f246c03180841210e447c26c4/pandas/_libs/tslibs/conversion.pyx#L291-L302

and go to cast_from_unit, which tries to convert -52700112000 to nanoseconds. Problem is, that's out of bounds for i64

What do you think the solution is? That cast_from_unit not return nanoseconds, but rather the offset from 1970-01-01 in the resolution closest to unit?

Comment From: jbrockmendel

id start with something like

in_reso = abbrev_to_npy_unit(unit)
out_reso = get_supported_reso(in_reso)
value = convert_reso(ts, in_reso, out_reso)

would take some more effort to get this working for floats

Comment From: MarcoGorelli

thanks, that works

looks like I need to set unit though, otherwise Timestamp(20130101) would throw

Traceback (most recent call last):
  File "/home/marcogorelli/pandas-dev/t.py", line 19, in <module>
    Timestamp(20130101)
  File "pandas/_libs/tslibs/timestamps.pyx", line 1637, in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tzobj, unit, 0, 0, nanosecond)
  File "pandas/_libs/tslibs/conversion.pyx", line 302, in pandas._libs.tslibs.conversion.convert_to_tsobject
    value = convert_reso(ts, in_reso, out_reso, False)
  File "pandas/_libs/tslibs/np_datetime.pyx", line 590, in pandas._libs.tslibs.np_datetime.convert_reso
    mult = get_conversion_factor(to_reso, from_reso)
  File "pandas/_libs/tslibs/np_datetime.pyx", line 547, in pandas._libs.tslibs.np_datetime.get_conversion_factor
    raise ValueError("unit-less resolutions are not supported")
ValueError: unit-less resolutions are not supported

would in_reso = abbrev_to_npy_unit(unit or 'ns') be OK?

Comment From: jbrockmendel

would in_reso = abbrev_to_npy_unit(unit or 'ns') be OK?

I think so, yes.

Comment From: MarcoGorelli

would take some more effort to get this working for floats

How do you think this could/should work?

For ints, unit ends up determining the resolution.

But for floats, a more finer resolution might be required - for example, currently, we have

In [15]: Timestamp(1.5, unit='s')
Out[15]: Timestamp('1970-01-01 00:00:01.500000')

but if the resolution were inferred from the unit, we would get

In [13]: Timestamp(1.5, unit='s')
Out[13]: Timestamp('1970-01-01 00:00:01')

And maybe that's OK? Or should non-round floats be disallowed here?

Comment From: jbrockmendel

If we deprecate allowing non-round floats here (and presumably also in to_datetime etc) is there a nice-ish alternative we can suggest? If so I'd be OK with that; I doubt it is used all that often.

The not-angering-anybody solution would be to add a separate keyword like out_unit but that gets ugly.

Comment From: jbrockmendel

im playing catch-up on non-nano issues. did we decide anything here?

Comment From: MarcoGorelli

I don't think there's any decision here, no

is there a nice-ish alternative we can suggest?

all I can think of is to suggest that they add a timedelta with their desired offset? I think it's quite ambiguous what 1.5, unit='s' exactly means anyway, and I'm not sure it should be supported