Pandas Series.rmod called with scalar returns inconsistent results based on size of series

Code Sample, a copy-pastable example if possible

import pandas as pd

s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))

Out

   long_series  short_series
0           -1             1
1           -1             1
2           -1             1
3           -1             1
4           -1             1
5           -1             1
6           -1             1
7           -1             1
8           -1             1
9           -1             1

Problem description

Series.rmod called with a scalar behaves differently depending on the length of the series. On my machine:

* length <= 10,000: the sign of the result matches the divisor (matches Python's % operator)
* length > 10,000: the sign of the result matches the dividend

Appears to behave properly when called on other Series. Series.mod does not appear to have any such issues.
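For reference, the two sign conventions described above can be reproduced with the standard library alone. This is only an illustration of floored versus truncated remainder for the values in the report (dividend -1, divisor 2); it makes no claim about which code path pandas takes internally.

```python
import math

dividend, divisor = -1, 2

# Python's % is a floored remainder: the result takes the sign of the divisor.
floored = dividend % divisor

# math.fmod follows the C convention: the result takes the sign of the dividend.
truncated = math.fmod(dividend, divisor)

print(floored)    # 1
print(truncated)  # -1.0
```

The short series in the report matches the first convention and the long series matches the second.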

Expected Output

   long_series  short_series
0            1             1
1            1             1
2            1             1
3            1             1
4            1             1
5            1             1
6            1             1
7            1             1
8            1             1
9            1             1

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.1
setuptools : 41.0.1
Cython : 0.29.14
pytest : 5.0.1
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.3.2 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2

Comment From: TomAugspurger

Seems to be fixed on master.

s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))

   long_series  short_series
0            1             1
1            1             1
2            1             1
3            1             1
4            1             1
5            1             1
6            1             1
7            1             1
8            1             1
9            1             1

@kmacdough can you try with master?

Comment From: simonjayhawkins

> Seems to be fixed on master.

I'm getting the same issue on master, and I also get the incorrect behaviour on '0.26.0.dev0+910.g71b78685a' (master on 14/11/2019).

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1068.g49bc8d8c9'
>>>
>>> s1 = pd.Series([2] * 10001, name="long_series").rmod(-1)
>>> s2 = pd.Series([2] * 10000, name="short_series").rmod(-1)
>>> print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
   long_series  short_series
0           -1             1
1           -1             1
2           -1             1
3           -1             1
4           -1             1
5           -1             1
6           -1             1
7           -1             1
8           -1             1
9           -1             1

INSTALLED VERSIONS
------------------
commit : 49bc8d8c9d7da5d0fc3da81ff671dfbd35712605
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None
pandas : 1.1.0.dev0+1068.g49bc8d8c9
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200311
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.6.0
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.0
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.15.0
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0
>>>

Comment From: simonjayhawkins

Plain int64 reproduces the incorrect result. With uint64 the result is cast to float, Int64 works as expected, and UInt64 raises a TypeError.

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1068.g49bc8d8c9'
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("int64").rmod(-1)
0       -1
1       -1
2       -1
3       -1
4       -1
        ..
9996    -1
9997    -1
9998    -1
9999    -1
10000   -1
Name: long_series, Length: 10001, dtype: int64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("uint64").rmod(-1)
0        1.0
1        1.0
2        1.0
3        1.0
4        1.0
        ...
9996     1.0
9997     1.0
9998     1.0
9999     1.0
10000    1.0
Name: long_series, Length: 10001, dtype: float64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("Int64").rmod(-1)
0        1
1        1
2        1
3        1
4        1
        ..
9996     1
9997     1
9998     1
9999     1
10000    1
Name: long_series, Length: 10001, dtype: Int64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("UInt64").rmod(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\ops\__init__.py", line 518, in flex_wrapper
    return op(self, other)
  File "C:\Users\simon\pandas\pandas\core\ops\roperator.py", line 40, in rmod
    return right % left
  File "C:\Users\simon\pandas\pandas\core\ops\common.py", line 63, in new_method
    return method(self, other)
  File "C:\Users\simon\pandas\pandas\core\ops\__init__.py", line 440, in wrapper
    result = arithmetic_op(lvalues, rvalues, op, str_rep)
  File "C:\Users\simon\pandas\pandas\core\ops\array_ops.py", line 205, in arithmetic_op
    res_values = op(lvalues, rvalues)
  File "C:\Users\simon\pandas\pandas\core\ops\roperator.py", line 40, in rmod
    return right % left
  File "C:\Users\simon\pandas\pandas\core\ops\common.py", line 63, in new_method
    return method(self, other)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 679, in integer_arithmetic_method
    return self._maybe_mask_result(result, mask, other, op_name)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 608, in _maybe_mask_result
    return type(self)(result, mask, copy=False)
  File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 348, in __init__
    "values should be integer numpy array. Use "
TypeError: values should be integer numpy array. Use the 'integer_array' function instead
>>>
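The four checks above can be collected into one loop, which makes it easier to compare environments. This is a minimal sketch; the helper name `rmod_by_dtype` is hypothetical, and the broad `except` is there only because the outcome for some dtypes (an exception or a value) may differ between pandas and NumPy versions.

```python
import pandas as pd


def rmod_by_dtype(length=10001, dtypes=("int64", "uint64", "Int64", "UInt64")):
    """Map each dtype to (result dtype, first value), or to the exception name."""
    outcomes = {}
    for dtype in dtypes:
        series = pd.Series([2] * length).astype(dtype)
        try:
            result = series.rmod(-1)
            outcomes[dtype] = (str(result.dtype), result.iloc[0])
        except Exception as exc:  # behaviour may differ across versions
            outcomes[dtype] = type(exc).__name__
    return outcomes


for dtype, outcome in rmod_by_dtype().items():
    print(dtype, outcome)
```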

Comment From: jbrockmendel

@simonjayhawkins didn't you have a recent numexpr-related PR on an issue similar to this?
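One way to probe the numexpr question is to run the reproducer with the public `compute.use_numexpr` option switched off and on. This is a sketch under the assumption that numexpr is what differs between the two code paths; the helper name `rmod_first` is hypothetical, and the option has no effect when numexpr is not installed.

```python
import pandas as pd


def rmod_first(length, use_numexpr):
    """Return the first element of Series([2] * length).rmod(-1)."""
    # Temporarily override the option; it is restored on exit.
    with pd.option_context("compute.use_numexpr", use_numexpr):
        return pd.Series([2] * length).rmod(-1).iloc[0]


for length in (10000, 10001):
    for use_numexpr in (False, True):
        print(length, use_numexpr, rmod_first(length, use_numexpr))
```

If the value only changes when the option is on and the length is above the threshold, that would point at the numexpr path rather than at the platform.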

Comment From: mroeschke

I'm getting the correct behavior on master again. It may be worth adding a test to check whether this is platform or package related.

In [3]: s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
   ...: s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
   ...: print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
   long_series  short_series
0            1             1
1            1             1
2            1             1
3            1             1
4            1             1
5            1             1
6            1             1
7            1             1
8            1             1
9            1             1

Comment From: codamuse

@phofl not sure if I should be pinging you on this but I have a test written that's pretty straightforward:

    result = pd.Series([2] * 10001).rmod(-1)
    expected = pd.Series([1] * 10001)

    tm.assert_series_equal(result, expected)

Two questions: 1) does the test look okay, and 2) I'm trying to test against 1.1.0 to confirm it fails (it passes on the latest main), but I'm having some issues building.

To test against 1.1.0 I'm checking out d9fff2792bf16178d4e450fe7384244e50635733, and then when running `python setup.py build_ext -j 4` I get the following:

pandas/_libs/writers.c:5092:5: error: ‘_PyUnicode_get_wstr_length’ is deprecated [-Werror=deprecated-declarations]

Any pointers? Thanks!

Comment From: phofl

Test looks OK, I think.

You don't have to build from source there. You can simply create another environment and install the released pandas 1.1.0.
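A small script along these lines could be run unchanged in each environment to compare results. It is only a sketch: the helper name `check_rmod` is hypothetical, and reporting whether numexpr is installed is an assumption about what might differ between setups.

```python
import importlib.util

import pandas as pd


def check_rmod(length=10001):
    """Return True if Series([2] * length).rmod(-1) equals 1 everywhere."""
    result = pd.Series([2] * length).rmod(-1)
    return bool((result == 1).all())


print("pandas:", pd.__version__)
print("numexpr installed:", importlib.util.find_spec("numexpr") is not None)
print("rmod(-1) == 1 everywhere:", check_rmod())
```

Printing the version alongside the result makes it clear which installation produced which outcome.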