Pandas version checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of pandas.
-
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.DataOffset(milliseconds=1, days=1)
Issue Description
TypeError Traceback (most recent call last)
Input In [54], in <cell line: 1>()
----> 1 pd.DateOffset(milliseconds=1, days=1)
File ...pandas/_libs/tslibs/offsets.pyx:1233, in pandas._libs.tslibs.offsets.RelativeDeltaOffset.__init__()
File ...pandas/_libs/tslibs/offsets.pyx:325, in pandas._libs.tslibs.offsets._determine_offset()
TypeError: relativedelta.__init__() got an unexpected keyword argument 'milliseconds'
In contrast, if I pass millisecond
(no plural) I get a more useful error:
NotImplementedError: Using DateOffset to replace `millisecond` component in datetime object is not supported. Use `microsecond=timestamp.microsecond % 1000 + ms * 1000` instead.
Expected Behavior
Since DateOffset
advertises that it takes milliseconds
and millisecond
as keyword arguments, it should be possible to construct a DateOffset
using those arguments in combination with others.
[I would have a go at fixing this, but I cannot make head or tail of the intended logic in _libs.tslibs.offset._determine_offset
. Comments don't appear to match the code behaviour, and I don't know how many moving parts there are here].
Installed Versions
Comment From: markopacak
milliseconds
(plur) is a valid arg for datetime.timedelta
but not for dateutil.relativedelta.relativedelta
.
E.g. timedelta(days=1, milliseconds=1)
executes normally: so, maybe the expected behavior is to have timedelta
handle milliseconds (plur).
However, the if-condition at line 315 catches anything that has days
in kwds and forwards it to relativedelta
, hence causing and Exception (line 323).
https://github.com/pandas-dev/pandas/blob/dc5e00d91dd515e7aeff1b84967afa0b4d39adca/pandas/_libs/tslibs/offsets.pyx#L298-L334
Perhaps, adding a check for milliseconds
(plur) right before, is the right direction to take. Something like
if "milliseconds" in kwds_no_nanos: # plural
offset = timedelta(**kwds_no_nanos)
at line 316.
relativedelta
does not support millisecond
(sing) either, and we have a special check in place for it, so maybe we can place one for milliseconds
(plur) too.
Comment From: mroeschke
Yeah maybe if millisecond or milliseconds (singular or plural) are always provided DateOffset
should just always used the non-relativedelta internal configuration.
Comment From: markopacak
take
Comment From: markopacak
millsecond
(singular) is not a valid argument for datetime.timedelta
(nor for relativedelta).
This is handled by raising
NotImplementedError(
"Using DateOffset to replace `millisecond` component in "
"datetime object is not supported. Use "
"`microsecond=timestamp.microsecond % 1000 + ms * 1000` "
"instead."
)
Maybe, that's a conversion we could be doing ourselves instead of throwing an error.
In general, this function can be rewritten in a much better way, but don't know if it should be done in this PR.
Comment From: markopacak
I added this fix for now.
In the meantime, I'm testing a reworked implementation of _determine_offset
separately.
Comment From: markopacak
Here is my initial approach to the rework I had in mind.
I only have 2 tests failing. Haven't run all tests suites yet (only test_offsets.py
and test_datetime64.py
), but I think I'm close.
IMO, is much more readable then before. I will probably open a separate issue for this soon. Any thoughts?
https://github.com/pandas-dev/pandas/blob/6b2a7e054409909cd6072f01ca0d138985b09a4d/pandas/_libs/tslibs/offsets.pyx#L300-L364
Comment From: wence-
How about:
def determine_offset(kwargs):
if not kwargs:
return timedelta(days=1), False
nanos = {"nanosecond", "nanoseconds"}
kwargs_no_nano = {k: v for k, v in kwargs.items() if k not in nanos}
kwargs_no_nano["microsecond"] += kwargs_no_nano.pop("millisecond", 0) * 1000
use_relativedelta = {
"year",
"month",
"day",
"hour",
"minute",
"second",
"microsecond",
"weekday",
"years",
"months",
"weeks",
"days",
"hours",
"minutes",
"seconds",
"microseconds",
}
if all(k in use_relativedelta for k in kwargs_no_nano):
return relativedelta(**kwargs_no_nano), True
else:
return timedelta(**kwargs_no_nano), False
Minor changes: use early returns in all cases (which removes the need for the use_relativedelta
flag); handle the singular "millisecond" case (correctly, unlike the message in the NotImplementedError
); no need for the special-casing of only nanosecond kwargs, since both timedelta(**{})
and relativedelta(**{})
DTRT