Is your feature request related to a problem?

I would like to be able to do arithmetic operations on intervals: - a + b: pd.Interval(3, 6) + pd.Interval(5, 7) == pd.Interval(3, 7) - a - b: pd.Interval(3, 6) - pd.Interval(5, 7) == pd.Interval(3, 5) - [a, b].union() / np.sum([a, b]): python intervals = pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 3), (2, 4), (5, 7)]) intervals.sum() == pd.arrays.IntervalArray.from_tuples([(0, 4), (5, 7)])

API breaking implications

None AFAIK, since those methods are not implemented yet

Comment From: venaturum

@Hoeze I developed a package called staircase which allows a lot of manipulations with intervals and is designed to be closely aligned with pandas. You may find it handy? www.staircase.dev

Comment From: mroeschke

Is there a standard or convention for arithmetic with interval types? For example, do Postgres range types operate similarly?

Comment From: Hoeze

@venaturum Thanks for your suggestion. Would you mind posting an example on how to use your library for this use case?

@mroeschke The details about Postgresql's range operators can be found here: https://www.postgresql.org/docs/current/functions-range.html PostgreSQL supports (amongst many others) the following operators: - +: range union - *: range intersection - -: range difference

Further operators are any kinds of overlap or contain operators.

I would not know whether PostgreSQL supports range aggregations, but I think that range union and range intersection across multiple intervals would be really helpful.

Comment From: venaturum

@Hoeze no problem. In the code snippets below I am assuming a notebook format where the last variable is printed to console. A jupyter notebook with these code examples is downloadable here: IntervalsWithStaircase.zip

The solution involves thinking about the problem in terms of step functions. We'll use a one-to-one mapping between sets of disjoint intervals and step functions, where for a set of intervals, the corresponding step function is f(x) = 1 if the point x belongs to one of the intervals and 0 otherwise.

The package represents step functions via a class called Stairs. This class has almost every binary operator you can think of defined, all results of which are Stairs objects themselves.

import pandas as pd
import staircase as sc

a = sc.Stairs(start=3, end=6)
b = sc.Stairs(start=5, end=7)

There are more parameters to the Stairs constructor which dictate "step-size" (default 1) and baseline-value (default 0) but start and end are all that is needed here.

You can use Stairs.plot, or Stairs.to_frame to understand the step function values

b.to_frame()
  start  end  value
0  -inf    5      0
1     5    7      1
2     7  inf      0

The union of intervals (a+b in your example) can be done by using a "logical or" operation:

result = a | b  # result will be a Stairs instance
result.to_frame()
  start  end  value
0  -inf    3    0.0
1     3    7    1.0
2     7  inf    0.0

For set difference (a-b in your example) the phrase "in a, and not in b" can be literally translated with the step functions:

result = a & ~b
result.to_frame()
  start  end  value
0  -inf    3    0.0
1     3    5    1.0
2     5  inf    0.0

The same result can be achieved by using b as a mask (which creates undefined values where b in non-zero) then filling the NAs with 0. It can also be achieved by multiplying instead of using the & operator, i.e.

result = a.mask(b).fillna(0)

or

result = a * ~b

For the union of many intervals, there are two approaches. The first is to create a step function for every interval and repeatedly apply the logical_or function.

from functools import reduce

intervals = pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 3), (2, 4), (5, 7)])
result = reduce(sc.Stairs.logical_or, [sc.Stairs(start=i.left, end=i.right) for i in intervals])
result.to_frame()
  start  end  value
0  -inf    0    0.0
1     0    4    1.0
2     4    5    0.0
3     5    7    1.0
4     7  inf    0.0

The second approach is to add the step functions together (and values where the intervals overlap will have values of 2 or more), and then set non-zero values of the step function to 1. This approach may be favourable as it uses the ability to pass in vectors of start and end values for the intervals into the constructor of a single Stairs object.

result1 = sc.Stairs(start=intervals.left, end=intervals.right)
result1.to_frame()
  start  end  value
0  -inf    0      0
1     0    2      1
2     2    3      2
3     3    4      1
4     4    5      0
5     5    7      1
6     7  inf      0

This is where result1.plot() can come in handy to visualise the step function (requires matplotlib).

From here we can just use a relational operator to get a boolean valued step function:

result2 = result1 > 0
result2.to_frame()
  start  end  value
0  -inf    0      0
1     0    4      1
2     4    5      0
3     5    7      1
4     7  inf      0

We could have also used result1 != 0 or result1.make_boolean() to convert to the binary valued step function.

If you want to convert this step function back to an interval array then it can be done so like this:

intervals2 = (
    result2.to_frame()
    .query('value == 1')
    .pipe(lambda df: pd.arrays.IntervalArray.from_arrays(df.start, df.end))
)
print(intervals2)
<IntervalArray>
[(0, 4], (5, 7]]
Length: 2, dtype: interval[int64, right]

Is interval manipulation the primary use case for staircase? Probably not. Given vectors of "start" and "stop" times it makes quick work of things like queues, and asset utilisations. But maybe it's useful here depending on the context of your problem.

Comment From: venaturum

@Hoeze I've taken the above concepts with staircase and created a new package "piso" for set operations over pandas.Interval, pandas.arrays.IntervalArray and pandas.IntervalIndex.

https://piso.readthedocs.io

It's currently available through pypi. Will continue to expand the functionality over the coming weeks.

Comment From: t0uch9r455

I made a commit that fixes this issue:

https://github.com/clarkwiththew/pandas/commit/6471a095bb77a0a558a6bf7dcd80f7948354f9ab

pandas report.pdf