Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
In [1]: import pandas as pd
In [2]: pdf = pd.DataFrame({"a": [32424324, None, None], "b": [24242342, 3432434234, 23434234]})
In [3]: pdf = pdf.astype("datetime64[s]")
In [4]: pdf
Out[4]:
a b
0 1971-01-11 06:45:24 1970-10-08 13:59:02
1 NaT 2078-10-08 05:57:14
2 NaT 1970-09-29 05:30:34
In [5]: pdf.dtypes
Out[5]:
a datetime64[s]
b datetime64[s]
dtype: object
In [6]: pdf['a'] - pdf['b']
Out[6]:
0 94 days 16:46:22
1 NaT
2 NaT
dtype: timedelta64[s]
In [7]: def udf(row):
   ...:     return operator.sub(row['a'], row['b'])
...:
In [8]: import operator
In [10]: result = pdf.apply(udf, axis=1)
In [11]: result
Out[11]:
0 94 days 16:46:22
1 NaT
2 NaT
dtype: timedelta64[ns] # BUG: Should be `timedelta64[s]`, to be consistent with the rest of the calls
In [13]: operator.sub(pdf['a'], pdf['b'])
Out[13]:
0 94 days 16:46:22
1 NaT
2 NaT
dtype: timedelta64[s]
In [14]: pdf['a'] - pdf['b']
Out[14]:
0 94 days 16:46:22
1 NaT
2 NaT
dtype: timedelta64[s]
In [15]: pdf['a'][0] - pdf['b'][0]
Out[15]: Timedelta('94 days 16:46:22')
In [18]: (pdf['a'][0] - pdf['b'][0]).unit
Out[18]: 's'
Issue Description
A binop performed in a UDF via DataFrame.apply always seems to return a ns-resolution dtype instead of the correct one (s in the above case).
Expected Behavior
In [14]: pdf.apply(udf, axis=1)
Out[14]:
0 94 days 16:46:22
1 NaT
2 NaT
dtype: timedelta64[s]
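Until the unit is preserved, one possible workaround is to cast the apply result back to the dtype that the vectorized subtraction produces. This is a minimal sketch, assuming pandas 2.x (where second-resolution datetime64/timedelta64 dtypes are supported); the frame is rebuilt from NumPy datetime64[s] arrays holding the values printed above, rather than from the original integers.

```python
import operator

import numpy as np
import pandas as pd

# Rebuild the example frame with second-resolution columns.
pdf = pd.DataFrame(
    {
        "a": np.array(["1971-01-11T06:45:24", "NaT", "NaT"], dtype="datetime64[s]"),
        "b": np.array(
            ["1970-10-08T13:59:02", "2078-10-08T05:57:14", "1970-09-29T05:30:34"],
            dtype="datetime64[s]",
        ),
    }
)


def udf(row):
    return operator.sub(row["a"], row["b"])


# On affected versions this comes back as timedelta64[ns].
result = pdf.apply(udf, axis=1)

# Workaround: cast to the dtype the column-wise operation yields (timedelta64[s]).
expected_dtype = (pdf["a"] - pdf["b"]).dtype
fixed = result.astype(expected_dtype)
print(fixed.dtype)
```

If apply already returns timedelta64[s] (for example after a fix), the astype call is a no-op.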
Installed Versions
Comment From: mroeschke
It appears the core issue here is that apply returns scalars internally, and the scalars' units are not taken into account when the result is constructed. Similar to:
In [6]: ser = pd.Series([pd.Timedelta(1).as_unit("s")])
In [7]: ser
Out[7]:
0 0 days
dtype: timedelta64[ns]
@jbrockmendel should the above be able to preserve the "s" unit from the scalar and return a timedelta64[s] dtype?
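The unit is still available on the scalar, so spelling out the dtype from it keeps "s". This small illustration assumes pandas 2.x; it only shows that the information exists, not how the constructor should infer it.

```python
import pandas as pd

td = pd.Timedelta(1).as_unit("s")  # 1 ns, rounded to second resolution
print(td.unit)  # the scalar remembers its unit

# Without a dtype the constructor may fall back to nanoseconds;
# passing a dtype built from the scalar's own unit preserves "s".
ser = pd.Series([td], dtype=f"timedelta64[{td.unit}]")
print(ser.dtype)
```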
Comment From: jbrockmendel
Yes, for the pd.Series constructor example we need to get unit inference working in array_to_datetime/maybe_convert_objects/infer_dtype. The UDF case might be handled independently inside maybe_cast_pointwise_result (which I'm guessing the OP case goes through; I haven't checked).
Comment From: mroeschke
The OP case goes through FrameApply.apply_series_generator to compose a dict of results, which is then passed to FrameApply.wrap_results to construct a Series, so I think unit inference would be the way to solve this case.
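The dict-of-scalars path can be mimicked outside pandas internals. The helper wrap_timedelta_results below is hypothetical (it is not the real FrameApply.wrap_results); it is a minimal sketch, assuming pandas 2.x, of what unit inference on the collected results could look like: take the finest unit among the non-NaT Timedelta scalars and build the Series with that dtype.

```python
import pandas as pd

# Resolutions pandas 2.x can store, ordered coarsest to finest.
_UNIT_ORDER = ["s", "ms", "us", "ns"]


def wrap_timedelta_results(results: dict) -> pd.Series:
    """Hypothetical: build a Series from {index: scalar}, keeping the scalars' unit."""
    units = {
        value.unit
        for value in results.values()
        if isinstance(value, pd.Timedelta)  # pd.NaT is not a Timedelta instance
    }
    if not units:
        # Nothing to infer from; let pandas choose the dtype.
        return pd.Series(results)
    # Use the finest unit present so no scalar loses precision.
    unit = max(units, key=_UNIT_ORDER.index)
    return pd.Series(results, dtype=f"timedelta64[{unit}]")


# Per-row results like those collected for the example frame.
results = {
    0: pd.Timedelta("94 days 16:46:22").as_unit("s"),
    1: pd.NaT,
    2: pd.NaT,
}
wrapped = wrap_timedelta_results(results)
print(wrapped.dtype)
```

Picking the finest unit is only one choice; the thread later asks whether mixed units should resolve this way or raise.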
Comment From: luke396
take
Comment From: mroeschke
It looks like maybe_convert_objects needs to be patched for this case.
A few open questions:
- Should Series([unsupported_unit]) return the closest supported unit (second) for day and lower resolutions, and raise for picosecond and higher resolutions?
- Should Series([supported_unit1, supported_unit2]) resolve to the highest supported resolution or raise?
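One possible answer to both questions, written as a pure-Python sketch. The policy is an assumption for illustration only, since the thread leaves it open: units coarser than seconds map to "s", units finer than nanoseconds raise, and mixed supported units resolve to the finest one. The name resolve_unit is hypothetical.

```python
# NumPy-style unit codes, ordered from coarsest to finest ("M" = month, "m" = minute).
_ALL_UNITS = ["Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns", "ps", "fs", "as"]
_SUPPORTED = ["s", "ms", "us", "ns"]  # resolutions pandas 2.x can store


def resolve_unit(units):
    """Hypothetical policy: choose one supported unit for a collection of scalar units."""
    if not units:
        raise ValueError("no units to resolve")
    resolved = []
    for unit in units:
        rank = _ALL_UNITS.index(unit)  # raises ValueError for unknown codes
        if rank < _ALL_UNITS.index("s"):
            resolved.append("s")  # coarser than seconds: closest supported unit
        elif rank > _ALL_UNITS.index("ns"):
            raise ValueError(f"unit {unit!r} is finer than nanoseconds")
        else:
            resolved.append(unit)
    # Mixed supported units: keep the highest resolution so nothing is truncated.
    return max(resolved, key=_SUPPORTED.index)
```

Raising on mixed units instead would only need a check that len(set(resolved)) == 1.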
Comment From: luke396
I've been working on this for a while now, but unfortunately I haven't made much progress yet. (This is more complex than I thought; time handling and its formats seem to have a number of significant issues that will need to be fixed in the future.)
On a positive note, I did find something interesting while working on this.
In [4]: pd.Timedelta(1).as_unit('s').unit
Out[4]: 's'
In [5]: pd.Timedelta(1, unit="s").unit
Out[5]: 'ns'
The question is whether this is really what we want, or whether it is part of the issue we mentioned earlier.
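Whatever the intended constructor behavior, a second-resolution Timedelta can be requested explicitly with as_unit, independent of which unit the constructor picks. A minimal check, assuming pandas 2.x (where Timedelta.as_unit exists):

```python
import pandas as pd

# Regardless of the unit chosen for unit="s", as_unit("s") pins the resolution.
td = pd.Timedelta(1, unit="s").as_unit("s")
print(td.unit, td)

# The value itself is unaffected by the storage resolution.
assert td == pd.Timedelta(seconds=1)
```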