Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# OK:
n, val = 127, pd.NA
idx = pd.Index(range(n), dtype="Int64").union(pd.Index([val], dtype="Int64"))
s = pd.Series(index=idx, data=range(n+1), dtype="Int64")
s.drop(0)
# Still OK:
n, val = 128, 128
idx = pd.Index(range(n), dtype="Int64").union(pd.Index([val], dtype="Int64"))
s = pd.Series(index=idx, data=range(n+1), dtype="Int64")
s.drop(0)
# But this FAILS:
n, val = 128, pd.NA
idx = pd.Index(range(n), dtype="Int64").union(pd.Index([val], dtype="Int64"))
s = pd.Series(index=idx, data=range(n+1), dtype="Int64")
s.drop(0) # ValueError: 'indices' contains values less than allowed (-128 < -1)
# Expected no error
Workaround: to filter out elements, use boolean-mask indexing instead of s.drop():
s[~s.index.isin([0])]
Issue Description
When NA is present in an Index and the length of the Index exceeds 128, lookups return a wrong (negative) position for the NA entry, so operations such as drop() fail.
This bug can be narrowed down to IndexEngine.get_indexer() or MaskedIndexEngine.get_indexer(), as these examples suggest:
axis = pd.Index(range(250), dtype='Int64').union(pd.Index([pd.NA], dtype='Int64'))
new_axis = axis.drop(0)
axis.get_indexer(new_axis)[-5:] # array([246, 247, 248, 249, -6])
# Expected array([246, 247, 248, 249, 250])
axis = pd.Index(range(254), dtype='Int64').union(pd.Index([pd.NA], dtype='Int64'))
new_axis = axis.drop(0)
axis.get_indexer(new_axis)[-5:] # array([250, 251, 252, 253, -2])
# Expected array([250, 251, 252, 253, 254])
These examples further suggest that the root cause of the bug is in how NA is represented in, and interacts with, the hash tables that Index uses for its _engine.
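One pattern in the numbers above: each wrong position equals the expected position reinterpreted as a signed 8-bit integer (128 gives -128, 250 gives -6, 254 gives -2). The sketch below only checks that arithmetic in plain Python. `wrap_int8` is a hypothetical helper, not part of pandas, and the idea that the NA position is being truncated to 8 bits is an assumption drawn from these numbers, not something confirmed here.

```python
def wrap_int8(position: int) -> int:
    """Reinterpret a non-negative position as a signed 8-bit integer.

    Hypothetical helper for illustration only; it mimics two's-complement
    wraparound and does not call into pandas.
    """
    return (position + 128) % 256 - 128


# (expected NA position, value actually observed in the report)
observed = {
    128: -128,  # from the ValueError message: (-128 < -1)
    250: -6,    # from axis.get_indexer(new_axis)[-5:] with range(250)
    254: -2,    # from axis.get_indexer(new_axis)[-5:] with range(254)
}

for expected, actual in observed.items():
    print(expected, wrap_int8(expected), actual)

# Position 127 still fits in a signed 8-bit integer, which would match
# the n=127 case working correctly.
print(wrap_int8(127))
```

If this reading is right, the failure should start exactly when the NA entry sits at position 128 or later, which agrees with the three cases in the reproducible example.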
Expected Behavior
See above
Installed Versions
Comment From: avm19
take
Comment From: avm19
The bug is likely due to this line: https://github.com/pandas-dev/pandas/blob/3979e954a339db9fc5e99b72ccb5ceda081c33e5/pandas/_libs/hashtable_class_helper.pxi.in#L538
It should be {{c_type}} na_position = self.na_position. Will confirm soon.