Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this issue exists on the latest version of pandas.

  • [X] I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

import pandas as pd

rg = pd.date_range("2020-01-01", periods=100_000, freq="s")

ts_ns = pd.Timestamp("1996-01-01 00:00:00.00000000000")
ts_s = pd.Timestamp("1996-01-01")

This gives the following timings:

%timeit rg < ts_s
2.27 ms ± 44.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit rg < ts_ns
108 µs ± 572 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I guess a bunch of users will define timestamps with less than nanosecond precision and hence end up with mismatched resolutions, which causes a really big slowdown. Can we fix this somehow for 2.0?

Time is almost exclusively spent in

{pandas._libs.tslibs.np_datetime.compare_mismatched_resolutions}
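Until the slow path itself is addressed, one possible workaround is to align the scalar's resolution with the index before comparing. This is only a sketch: it assumes pandas 2.0 or later, where `Timestamp.as_unit` and `DatetimeIndex.unit` are available, and the variable names `ts_aligned` and `mask` are illustrative.

```python
import pandas as pd

rg = pd.date_range("2020-01-01", periods=100_000, freq="s")
ts_s = pd.Timestamp("1996-01-01")

# Inspect the resolutions involved; a mismatch is what triggers the slow path.
print(rg.dtype, ts_s.unit)

# Convert the scalar to the index's unit once, then compare same-unit values.
# Assumption: the conversion is in bounds for the target unit.
ts_aligned = ts_s.as_unit(rg.unit)
mask = rg < ts_aligned
```

The result should be identical to `rg < ts_s`; only the code path taken for the comparison differs.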

cc @jbrockmendel @MarcoGorelli

Installed Versions

main

Prior Performance

No response

Comment From: jbrockmendel

could do a try/except for lossless conversion to shared reso and fall back to compare_mismatched_resolutions
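A rough sketch of that idea, written against plain NumPy rather than the pandas internals. The helper names (`cast_lossless`, `slow_lt`, `lt`) are hypothetical and this is not the actual pandas implementation; it also assumes the inputs contain no NaT and use only the s/ms/us/ns units.

```python
import numpy as np

# Ticks per second for the units considered in this sketch.
PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}
INT64_MIN = np.iinfo(np.int64).min  # reserved for NaT
INT64_MAX = np.iinfo(np.int64).max


def cast_lossless(value: np.datetime64, unit: str) -> np.datetime64:
    """Cast `value` to `unit`, raising if the cast would overflow or truncate."""
    src_unit, _ = np.datetime_data(value.dtype)
    ticks = int(value.astype("int64"))  # Python int, so no silent wraparound
    src, dst = PER_SECOND[src_unit], PER_SECOND[unit]
    if dst >= src:
        ticks *= dst // src
        if not INT64_MIN < ticks <= INT64_MAX:
            raise OverflowError("value out of bounds for target unit")
    else:
        ticks, rem = divmod(ticks, src // dst)
        if rem:
            raise ValueError("cast would lose precision")
    return np.datetime64(ticks, unit)


def slow_lt(values: np.ndarray, other: np.datetime64) -> np.ndarray:
    """Exact elementwise comparison in the finer unit, using Python ints."""
    v_unit, _ = np.datetime_data(values.dtype)
    o_unit, _ = np.datetime_data(other.dtype)
    common = max(PER_SECOND[v_unit], PER_SECOND[o_unit])
    rhs = int(other.astype("int64")) * (common // PER_SECOND[o_unit])
    scale = common // PER_SECOND[v_unit]
    return np.array([v * scale < rhs for v in values.view("int64").tolist()], dtype=bool)


def lt(values: np.ndarray, other: np.datetime64) -> np.ndarray:
    """Compute `values < other`, preferring a same-unit vectorized comparison."""
    unit, _ = np.datetime_data(values.dtype)
    try:
        other_cast = cast_lossless(other, unit)
    except (OverflowError, ValueError):
        return slow_lt(values, other)  # stand-in for the mismatched-resolution path
    return values < other_cast  # fast path: both sides share one unit
```

For the reproducer above, the second-resolution scalar converts to nanoseconds without loss, so the comparison would take the fast branch; only out-of-bounds or non-representable scalars reach the fallback.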

Comment From: MarcoGorelli

thanks for noticing!

tbh I'm a bit surprised that pd.date_range("2020-01-01", periods=100_000, freq="s") isn't of unit 's' - if it was then the performance issue would be addressed (you can try this by passing unit='s' to date_range)
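For reference, the `unit='s'` suggestion looks roughly like this. It assumes a pandas version in which `date_range` accepts the `unit` keyword (2.0 or later); `rg_s` and `mask` are illustrative names.

```python
import pandas as pd

# Build the range directly at second resolution instead of the default.
rg_s = pd.date_range("2020-01-01", periods=100_000, freq="s", unit="s")
ts_s = pd.Timestamp("1996-01-01")

print(rg_s.dtype)

# With the index stored in seconds, a second-resolution scalar
# no longer forces a mismatched-resolution comparison.
mask = rg_s < ts_s
```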

Comment From: phofl

This is just a small reproducer; the initial problem came from parquet files where the timestamps were stored as ns reso

Comment From: MarcoGorelli

could do a try/except for lossless conversion to shared reso and fall back to compare_mismatched_resolutions

is this something you have time to take on?

Comment From: jbrockmendel

I think so yes

Comment From: jbrockmendel

tbh I'm a bit surprised that pd.date_range("2020-01-01", periods=100_000, freq="s") isn't of unit 's'

I considered inferring reso in date_range but it became really messy bc you could have start/end with different resos (which themselves might be inferred or already present in Timestamps).