Master:

In [14]: idx = pd.interval_range(0, 1000, 1000) 

In [15]: %timeit getattr(idx, '_ndarray_values', idx)
1.29 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [16]: %timeit idx.closed
321 ns ± 2.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

while on 0.25.3:

In [13]: idx = pd.interval_range(0, 1000, 1000) 

In [14]: %timeit getattr(idx, '_ndarray_values', idx)  
90.5 ns ± 2.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [15]: %timeit idx.closed 
105 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

(just checked a few attributes, didn't check if it is related to those specific ones or getattr in general)

I think this is a cause / one of the causes of several regressions that can currently be seen at https://pandas.pydata.org/speed/pandas/ (eg https://pandas.pydata.org/speed/pandas/#reshape.Cut.time_cut_timedelta?p-bins=1000&commits=6efc2379-b9de33e3)

Comment From: jorisvandenbossche

cc @jbrockmendel this might be related: https://github.com/pandas-dev/pandas/pull/30605

Comment From: jbrockmendel

30605 changed _ndarray_values from a cache_readonly to a property. Easy to update.

Comment From: jorisvandenbossche

That might be easy to update, but I am not sure that this explains the slowdown in the other attributes / slowdown in dependent functionality? Eg pd.cut (the actual benchmark I was investigating):

bins = 1000 
N = 10 ** 5
timedelta_series = pd.Series(np.random.randint(N, size=N), dtype="timedelta64[ns]")
pd.cut(timedelta_series, bins)

This pd.cut call also shows a slowdown, so would need to check if the caching improves that again (note, I also didn't fully investigate if the slowdown here is caused by the IntervalIndex attributes slowdown, there might also be other reasons).

Comment From: TomAugspurger

On master right now, CategoricalIndex._ndarray_values is using Index._ndarray_values, which is something like 6x slower than the old implementation (just an attribute lookup).

https://github.com/pandas-dev/pandas/pull/30717/files is adding an ExtensionIndex._ndarray_values which would I think fix at least one source of the slowdown. Or we can have a targeted fix restoring CategoricalIndex._ndarray_values as _data.values.

Comment From: jorisvandenbossche

Reopening this because the other attributes are still slower (the _ndarray_values was solved in #30797), eg the original example in the top post for idx.closes is still 3x slower.

Comment From: jorisvandenbossche

The original cut benchmark (see example in https://github.com/pandas-dev/pandas/issues/30742#issuecomment-571199909) seems to be fixed with #30797!

Comment From: jreback

The original cut benchmark (see example in #30742 (comment)) seems to be fixed with #30797!

is this fixed @jorisvandenbossche

Comment From: jorisvandenbossche

Not in general, only the specific case of the _ndarray_values attribute

Comment From: TomAugspurger

Any suggestions for the .closed regression?

IIUC, the additional overhead comes from going through accessor.PandasDelegate. Some options

  1. Just manually write the definition of IntervalIndex.closed
  2. Make IntervalIndex.closed cache readonly
  3. Some simpler mixin, such that the definition of IntervalIndex.closed becomes just ._data.closed, rather than the dynamic _delegate_property_get?

Comment From: jbrockmendel

indexes.extension.inherit_names has a cache kwarg to turn a property into a cache_readonly. I can update #30860 to make closed (and anything else?) cached.

Comment From: TomAugspurger

I think just closed IntervalIndex.

Comment From: jorisvandenbossche

I don't think it's just closed, it's all attributes that use the same mechanism.

I think we could also have a version of PandasDelagate that is specific for the indexes that have _data as the underlying data as extension array. I think that way, it can be written more specific / more efficient.

Comment From: TomAugspurger

@jbrockmendel does https://github.com/pandas-dev/pandas/pull/30860 close this, or is the caching being done in a separate PR?

Or do we want a PandasDelegate subclass that's specific to cases where Index.attr is just Index._data.attr?

Comment From: jorisvandenbossche

BTW, I don't think this should necessarily be a blocker for 1.0. But we should keep this in mind for all index->EA delegation refactor that is happening, and look into improving this.

Comment From: jbrockmendel

30860 did the caching for closed but there may be other properties that can be made cached

Comment From: simonjayhawkins

moved off 1.1.1 milestone (scheduled for this week) as no PRs to fix in the pipeline

Comment From: simonjayhawkins

moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline

Comment From: simonjayhawkins

moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline

Comment From: simonjayhawkins

moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline

Comment From: jbrockmendel

Closing as fixed