Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
pd.Series([1, pd.NA, 2]) > 0
Out[24]:
0 True
1 False
2 True
dtype: bool
pd.Series([1, pd.NA, 2], dtype='int32[pyarrow]') > 0
Out[25]:
0 True
1 <NA>
2 True
dtype: bool[pyarrow]
Issue Description
A pyarrow-backed Series might contain <NA> after a comparison. When the result is used in loc, this will crash Python.
Expected Behavior
All values are boolean.
Installed Versions
Comment From: mroeschke
pd.Series([1, pd.NA, 2]) has object dtype, which has different comparison semantics. The pyarrow behavior mirrors the pandas nullable dtype implementation, which propagates NA:
In [3]: pd.Series([1, pd.NA, 2], dtype="Int32") > 0
Out[3]:
0 True
1 <NA>
2 True
dtype: boolean
Closing, as this is the expected behavior.
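As a minimal sketch of the NA propagation described above, the snippet below uses the pandas nullable "Int32" dtype from this comment (not the pyarrow dtype, so it runs without pyarrow installed). The variable names are illustrative only. It shows that the comparison result keeps a missing entry, and that the caller can decide explicitly how to treat it before filtering.

```python
import pandas as pd

# Nullable integer Series with one missing value (same data as in the thread).
s = pd.Series([1, pd.NA, 2], dtype="Int32")

# The comparison propagates NA instead of coercing it to False.
mask = s > 0

# Positions where the comparison result itself is missing.
na_positions = mask.isna().tolist()

# Decide explicitly that a missing comparison counts as False, then filter.
filled = mask.fillna(False)
selected = s[filled].tolist()

print(str(mask.dtype), na_positions, selected)
```

Filling with False matches what the original report expected from the comparison. Filling with True would be the right choice when missing values should be kept instead.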
Comment From: char101
Sorry, my sample code was lacking. The real problem is with this example code:
s = pd.Series(range(0, 2700), dtype='int64[pyarrow]')
s.loc[s.diff() < 0]  # crashes
s.loc[(s.diff() < 0).fillna(False)]  # does not crash
In my environment, the first two lines will crash Python. Is this a bug or something else?
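The fillna workaround above could be wrapped in a small helper, sketched here under some assumptions. `select_where` is a hypothetical name, not a pandas function. The pandas nullable "Int64" dtype stands in for 'int64[pyarrow]' so the sketch runs without pyarrow, which means it does not reproduce the reported crash and only illustrates the masking pattern.

```python
import pandas as pd


def select_where(series, mask):
    """Hypothetical helper: treat missing mask entries as False, then filter."""
    return series.loc[mask.fillna(False)]


# Assumption: "Int64" (pandas nullable) is a stand-in for 'int64[pyarrow]'.
s = pd.Series(range(0, 2700), dtype="Int64")

# diff() leaves the first element missing, so the comparison yields one NA.
raw_mask = s.diff() < 0
missing_count = int(raw_mask.isna().sum())

# The Series is strictly increasing, so no row has a negative difference.
decreasing = select_where(s, raw_mask)
print(missing_count, len(decreasing))
```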
Comment From: char101
It seems that a Series with more than 515 values will crash Python:
s = pd.Series(range(0, 515), dtype='int64[pyarrow]')
s.loc[s.diff() < 0]  # does not crash
s = pd.Series(range(0, 516), dtype='int64[pyarrow]')
s.loc[s.diff() < 0] # crash
Comment From: mroeschke
None of those examples crash on my machine, but this might be related to https://github.com/pandas-dev/pandas/issues/52059