Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
In [5]: ts = Timestamp('1900-01-01')
In [6]: Timestamp(ts.value, unit=ts.unit)
Out[6]: Timestamp('1900-01-01 00:00:00')
In [7]: ts = Timestamp('0300-01-01')
In [8]: Timestamp(ts.value, unit=ts.unit)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:141, in pandas._libs.tslibs.conversion.cast_from_unit()
140 try:
--> 141 return <int64_t>(base * m) + <int64_t>(frac * m)
142 except OverflowError as err:
OverflowError: Python int too large to convert to C long
The above exception was the direct cause of the following exception:
OutOfBoundsDatetime Traceback (most recent call last)
Cell In[8], line 1
----> 1 Timestamp(ts.value, unit=ts.unit)
File ~/pandas-dev/pandas/_libs/tslibs/timestamps.pyx:1637, in pandas._libs.tslibs.timestamps.Timestamp.__new__()
1635 raise ValueError("nanosecond must be in 0..999")
1636
-> 1637 ts = convert_to_tsobject(ts_input, tzobj, unit, 0, 0, nanosecond)
1638
1639 if ts.value == NPY_NAT:
File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:300, in pandas._libs.tslibs.conversion.convert_to_tsobject()
298 obj.value = NPY_NAT
299 else:
--> 300 ts = cast_from_unit(ts, unit)
301 obj.value = ts
302 pandas_datetime_to_datetimestruct(ts, NPY_FR_ns, &obj.dts)
File ~/pandas-dev/pandas/_libs/tslibs/conversion.pyx:143, in pandas._libs.tslibs.conversion.cast_from_unit()
141 return <int64_t>(base * m) + <int64_t>(frac * m)
142 except OverflowError as err:
--> 143 raise OutOfBoundsDatetime(
144 f"cannot convert input {ts} with the unit '{unit}'"
145 ) from err
OutOfBoundsDatetime: cannot convert input -52700112000 with the unit 's'
Issue Description
The above raises OutOfBoundsDatetime when the timestamp is out of nanosecond bounds
Noticed while trying to address #51024, in
https://github.com/pandas-dev/pandas/blob/a82b8e2073ba3a3bdbfaa3c54c46375d6dd09977/pandas/core/resample.py#L2160-L2161
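For comparison, a minimal workaround sketch (assuming the non-nanosecond Timestamp support on main): going through numpy keeps the value at second resolution, so the same value/unit pair round-trips without a cast to nanoseconds:

import numpy as np
from pandas import Timestamp

# same value and unit as in the traceback above; np.datetime64 stores the
# integer at second resolution, so no cast to nanoseconds (and no overflow) is needed
Timestamp(np.datetime64(-52700112000, 's'))  # Timestamp('0300-01-01 00:00:00')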
Expected Behavior
Timestamp('0300-01-01')
Installed Versions
Comment From: jbrockmendel
I think it would make sense to infer a Timestamp reso matching "unit" in this case
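For illustration, a sketch of what that proposal would mean for the example above (hypothetical output under the proposal, not current behaviour):

# under the proposal, the integer would stay at the resolution given by `unit`:
# Timestamp(-52700112000, unit='s')   # -> Timestamp('0300-01-01 00:00:00'), with .unit == 's'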
Comment From: MarcoGorelli
The issue is that when we get here
https://github.com/pandas-dev/pandas/blob/448a023c9182318af8da29a652026defaccc9ef4/pandas/_libs/tslibs/timestamps.pyx#L1637
then we have:
- ts_input: -52700112000
- unit: 's'
Then, we get to
https://github.com/pandas-dev/pandas/blob/08a7a9e249094f9f246c03180841210e447c26c4/pandas/_libs/tslibs/conversion.pyx#L291-L302
and go to cast_from_unit, which tries to convert -52700112000 to nanoseconds. Problem is, that's out of bounds for i64.
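(For reference, the overflow is easy to check by hand:)

# -52700112000 seconds expressed in nanoseconds does not fit in int64
-52700112000 * 10**9   # -52700112000000000000
-(2**63)               # int64 minimum: -9223372036854775808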
What do you think the solution is? That cast_from_unit not return nanoseconds, but rather the offset from 1970-01-01 in the resolution closest to unit?
Comment From: jbrockmendel
I'd start with something like
in_reso = abbrev_to_npy_unit(unit)
out_reso = get_supported_reso(in_reso)
value = convert_reso(ts, in_reso, out_reso)
would take some more effort to get this working for floats
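A rough Python-level sketch of that idea using only numpy (abbrev_to_npy_unit, get_supported_reso and convert_reso are pandas-internal Cython helpers, so the names and logic below are illustrative stand-ins, not the actual implementation):

import numpy as np

# resolutions pandas supports for Timestamp; coarser units (e.g. 'D') get
# bumped to the closest supported one ('s'); finer-than-ns units ignored here
SUPPORTED = ("s", "ms", "us", "ns")

def cast_to_supported(value: int, unit: str) -> np.datetime64:
    out_unit = unit if unit in SUPPORTED else "s"
    # keep the integer at (or convert it to) a supported resolution
    # instead of always casting to nanoseconds
    return np.datetime64(value, unit).astype(f"datetime64[{out_unit}]")

cast_to_supported(-52700112000, "s")   # numpy.datetime64('0300-01-01T00:00:00')
cast_to_supported(-609955, "D")        # same instant, stored in seconds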
Comment From: MarcoGorelli
thanks, that works
looks like I need to set unit though, otherwise Timestamp(20130101) would throw:
Traceback (most recent call last):
File "/home/marcogorelli/pandas-dev/t.py", line 19, in <module>
Timestamp(20130101)
File "pandas/_libs/tslibs/timestamps.pyx", line 1637, in pandas._libs.tslibs.timestamps.Timestamp.__new__
ts = convert_to_tsobject(ts_input, tzobj, unit, 0, 0, nanosecond)
File "pandas/_libs/tslibs/conversion.pyx", line 302, in pandas._libs.tslibs.conversion.convert_to_tsobject
value = convert_reso(ts, in_reso, out_reso, False)
File "pandas/_libs/tslibs/np_datetime.pyx", line 590, in pandas._libs.tslibs.np_datetime.convert_reso
mult = get_conversion_factor(to_reso, from_reso)
File "pandas/_libs/tslibs/np_datetime.pyx", line 547, in pandas._libs.tslibs.np_datetime.get_conversion_factor
raise ValueError("unit-less resolutions are not supported")
ValueError: unit-less resolutions are not supported
would in_reso = abbrev_to_npy_unit(unit or 'ns') be OK?
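(For context, an integer passed without a unit is interpreted as nanoseconds since the epoch, which is why some default resolution is needed; a minimal illustration:)

from pandas import Timestamp

# no unit given: the integer is treated as nanoseconds since the epoch
Timestamp(20130101)   # Timestamp('1970-01-01 00:00:00.020130101')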
Comment From: jbrockmendel
would in_reso = abbrev_to_npy_unit(unit or 'ns') be OK?
I think so, yes.
Comment From: MarcoGorelli
would take some more effort to get this working for floats
How do you think this could/should work?
For ints, unit ends up determining the resolution.
But for floats, a finer resolution might be required - for example, currently, we have
In [15]: Timestamp(1.5, unit='s')
Out[15]: Timestamp('1970-01-01 00:00:01.500000')
but if the resolution were inferred from the unit, we would get
In [13]: Timestamp(1.5, unit='s')
Out[13]: Timestamp('1970-01-01 00:00:01')
And maybe that's OK? Or should non-round floats be disallowed here?
Comment From: jbrockmendel
If we deprecate allowing non-round floats here (and presumably also in to_datetime etc) is there a nice-ish alternative we can suggest? If so I'd be OK with that; I doubt it is used all that often.
The not-angering-anybody solution would be to add a separate keyword like out_unit but that gets ugly.
Comment From: jbrockmendel
I'm playing catch-up on non-nano issues. Did we decide anything here?
Comment From: MarcoGorelli
I don't think there's any decision here, no
is there a nice-ish alternative we can suggest?
all I can think of is to suggest that they add a timedelta with their desired offset?
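A hedged sketch of that suggestion (assuming a Timedelta-based spelling is acceptable to users):

from pandas import Timestamp, Timedelta

# instead of Timestamp(1.5, unit='s'): keep the integer part in `unit`
# and add the fractional part as an explicit offset
Timestamp(1, unit='s') + Timedelta(0.5, unit='s')
# Timestamp('1970-01-01 00:00:01.500000')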
I think it's quite ambiguous what 1.5, unit='s' exactly means anyway, and I'm not sure it should be supported.