Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
a = [ 10, 15, 20, 25, 30, 35, 40, 45, 45, 50, 55, 60, 65, 70, 75, 80, 85 ]
b = [ 15, 20, 25, 30, 35, 40, 45, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90 ]
# ^^^^^^^^^^ (overlapping region)
idx_1 = pd.IntervalIndex.from_arrays(a, b)
print(idx_1.is_overlapping)
# prints True
idx_2 = pd.IntervalIndex.from_arrays(a[1:], b[1:])
print(idx_2.is_overlapping)
# prints False (INCORRECT)
idx_3 = pd.IntervalIndex.from_arrays(a[:-1], b[:-1])
print(idx_3.is_overlapping)
# prints False (INCORRECT)
Issue Description
IntervalIndex.is_overlapping is returning incorrect values.
Notice how for the full a/b arrays it reports overlapping intervals (they are in the middle), but if you remove the first or last pair ([1:], [:-1]), it incorrectly reports that there are no overlapping intervals.
Expected Behavior
is_overlapping returns the correct value for idx_1, idx_2 and idx_3.
Installed Versions
Comment From: Hikipeko
take
Comment From: Hikipeko
Hi @2-5, thanks for the bug report.
I think the overlapping region doesn't exist, since from_arrays returns intervals that are open on the left and closed on the right by default. So the interval (45, 45] is an empty set and won't cause a problem.
However, if the above is true, idx_1.is_overlapping should return False instead of True, so this is still a bug.
import pandas as pd
a = [ 10, 15, 20, 25, 30, 35, 40, 45, 45, 50, 55, 60, 65, 70, 75, 80, 85 ]
b = [ 15, 20, 25, 30, 35, 40, 45, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90 ]
# ^^^^^^^^^^ (not overlapping)
idx_1 = pd.IntervalIndex.from_arrays(a, b)
print(idx_1.is_overlapping)
# prints True (INCORRECT)
idx_2 = pd.IntervalIndex.from_arrays(a[1:], b[1:])
print(idx_2.is_overlapping)
# prints False
idx_3 = pd.IntervalIndex.from_arrays(a[:-1], b[:-1])
print(idx_3.is_overlapping)
# prints False
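The reasoning above can be cross-checked without pandas. The following is a minimal brute-force sketch, assuming right-closed intervals (left, right] and a purely set-theoretic notion of overlap (two intervals overlap if they share at least one point). The helper names share_point and any_overlap are hypothetical and are not part of pandas; this is not how pandas computes is_overlapping.

```python
from itertools import combinations

def share_point(x, y):
    """True if the right-closed intervals (x[0], x[1]] and (y[0], y[1]] share a point."""
    # The intersection is (max of lefts, min of rights], which is non-empty
    # only when its left end is strictly below its right end.
    return max(x[0], y[0]) < min(x[1], y[1])

def any_overlap(lefts, rights):
    """Brute-force O(n^2) check over all pairs; for illustration only."""
    intervals = list(zip(lefts, rights))
    return any(share_point(x, y) for x, y in combinations(intervals, 2))

a = [10, 15, 20, 25, 30, 35, 40, 45, 45, 50, 55, 60, 65, 70, 75, 80, 85]
b = [15, 20, 25, 30, 35, 40, 45, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90]

print(any_overlap(a, b))            # False
print(any_overlap(a[1:], b[1:]))    # False
print(any_overlap(a[:-1], b[:-1]))  # False
```

Under these assumptions the empty interval (45, 45] contains no points, so it cannot overlap with anything, and all three inputs come out as non-overlapping.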
Comment From: 2-5
I wasn't sure what the correct behavior was. I only encountered the issue when I had consecutive empty intervals, (45, 45], (45, 45], so I assumed that IntervalIndex doesn't allow duplicates at all. If there is a single empty interval, the bug doesn't seem to occur.
I encountered the problem by passing a similar pd.IntervalIndex to pd.cut(..., bins=idx_1), which then raises an exception saying that overlapping intervals are not allowed.
Comment From: SlowMo24
I had a similar issue where is_overlapping returned False, but I could not drop a row from the DataFrame because it refused to drop with an overlapping index. The result of is_overlapping looked random in that case: an IntervalIndex would return False, but if recreated with the same data minus one element, it would return True.
I was able to create a small, issue-worthy example DataFrame with version 1.5.3 (current), but that example no longer worked on the current main (although my issue persists there), and reproducing on main is required when filing a bug.
So my workaround is to pad these zero-length intervals by 1 microsecond (timedelta + timedelta(microseconds=1)) so that they get a length > 0 (because I didn't want to delete them).
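A minimal sketch of that padding workaround, using only the standard library. The function name pad_zero_length and the sample endpoints are hypothetical; the original data and code were not posted, so this only illustrates the idea of widening zero-length intervals.

```python
from datetime import timedelta

def pad_zero_length(lefts, rights, pad=timedelta(microseconds=1)):
    """Return new right endpoints where every zero-length interval is widened by `pad`."""
    return [right + pad if right == left else right
            for left, right in zip(lefts, rights)]

# Hypothetical timedelta endpoints; the middle interval has zero length.
lefts = [timedelta(seconds=0), timedelta(seconds=5), timedelta(seconds=7)]
rights = [timedelta(seconds=5), timedelta(seconds=5), timedelta(seconds=9)]

padded = pad_zero_length(lefts, rights)
print(padded[1] - lefts[1])  # 0:00:00.000001
```

One caveat with this approach: if another interval starts exactly where a padded interval starts, the padding creates a genuine (tiny) overlap, so it is only safe when the following interval begins later.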
Comment From: jbrockmendel
> I was able to create a small, issue-worthy example DataFrame with version 1.5.3 (current), but that example no longer worked on the current main (although my issue persists there), and reproducing on main is required when filing a bug.

When you say "no longer worked", do you mean the bug is fixed on main? If not, go ahead and open an issue anyway.
Comment From: SlowMo24
Thanks for reaching out. I looked deeper into it, and this bug is in fact fixed on the main branch. However, another one came up, and I have opened #52245.