Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this issue exists on the latest version of pandas.
- [X] I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
import pandas as pd
rg = pd.date_range("2020-01-01", periods=100_000, freq="s")
ts_ns = pd.Timestamp("1996-01-01 00:00:00.00000000000")
ts_s = pd.Timestamp("1996-01-01")
This gives the following timings:
%timeit rg < ts_s
2.27 ms ± 44.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit rg < ts_ns
108 µs ± 572 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
I guess a bunch of users will define timestamps without nanosecond precision and hence end up with mismatched resolutions, which causes a really big slowdown. Can we fix this somehow for 2.0?
Time is almost exclusively spent in
{pandas._libs.tslibs.np_datetime.compare_mismatched_resolutions}
cc @jbrockmendel @MarcoGorelli
Installed Versions
Prior Performance
No response
Comment From: jbrockmendel
could do a try/except for lossless conversion to shared reso and fall back to compare_mismatched_resolutions
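A rough user-level sketch of that idea, assuming pandas >= 2.0 (where `Timestamp.as_unit` and `DatetimeIndex.unit` are available). The helper name `lt_shared_reso` is hypothetical and this is not the actual internal change; it only illustrates "try a lossless cast first, otherwise fall back to the mismatched-resolution comparison".

```python
import pandas as pd


def lt_shared_reso(index: pd.DatetimeIndex, ts: pd.Timestamp):
    """Hypothetical helper: evaluate index < ts, preferring a shared unit."""
    try:
        # Cast the scalar to the index's unit. round_ok=False makes a lossy
        # cast raise instead of silently rounding.
        ts_cast = ts.as_unit(index.unit, round_ok=False)
    except (ValueError, OverflowError):
        # Lossy or out-of-bounds cast: keep the original operands, so pandas
        # handles the mismatched resolutions itself.
        return index < ts
    # Both sides now share a unit.
    return index < ts_cast


rg = pd.date_range("2020-01-01", periods=100_000, freq="s")
ts_s = pd.Timestamp("1996-01-01")
mask = lt_shared_reso(rg, ts_s)
```

The result should match `rg < ts_s` element for element; only the code path taken for the comparison differs.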
Comment From: MarcoGorelli
thanks for noticing!
tbh I'm a bit surprised that pd.date_range("2020-01-01", periods=100_000, freq="s") isn't of unit 's' - if it was then the performance issue would be addressed (you can try this by passing unit='s' to date_range)
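For reference, a minimal sketch of that workaround, assuming pandas >= 2.0, where `date_range` accepts a `unit` argument. The variable name `rg_s` is only for illustration.

```python
import pandas as pd

# Build the range directly at second resolution instead of the default.
rg_s = pd.date_range("2020-01-01", periods=100_000, freq="s", unit="s")
ts_s = pd.Timestamp("1996-01-01")

print(rg_s.unit)  # resolution of the index
print(ts_s.unit)  # resolution inferred for the scalar

# With matching units the comparison does not need the mismatched path.
mask = rg_s < ts_s
```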
Comment From: phofl
This is just a small reproducer, the initial problem came from parquet files where the timestamps were stored as ns reso
Comment From: MarcoGorelli
> could do a try/except for lossless conversion to shared reso and fall back to compare_mismatched_resolutions
is this something you have time to take on?
Comment From: jbrockmendel
I think so yes
Comment From: jbrockmendel
> tbh I'm a bit surprised that pd.date_range("2020-01-01", periods=100_000, freq="s") isn't of unit 's'
I considered inferring reso in date_range but it became really messy bc you could have start/end with different resos (which themselves might be inferred or already present in Timestamps).
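To make the ambiguity concrete, here is a small sketch assuming pandas >= 2.0. The units are set explicitly with `as_unit` so the example does not depend on how string parsing infers them; the variable names are illustrative only.

```python
import pandas as pd

# Two endpoints that carry different resolutions.
start = pd.Timestamp("2020-01-01").as_unit("s")
end = pd.Timestamp("2020-01-02 00:00:00.500").as_unit("ms")
print(start.unit, end.unit)

# Casting start up to end's unit loses nothing ...
start_ms = start.as_unit("ms", round_ok=False)

# ... but casting end down to start's unit would drop the half second,
# so with round_ok=False it is expected to raise.
try:
    end.as_unit("s", round_ok=False)
    lossless_down = True
except ValueError:
    lossless_down = False
print(lossless_down)
```

An inferred unit for the whole range would have to pick between these (or something else entirely), which is where it gets messy.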