Code Sample, a copy-pastable example if possible
In
s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
Out
long_series short_series
0 -1 1
1 -1 1
2 -1 1
3 -1 1
4 -1 1
5 -1 1
6 -1 1
7 -1 1
8 -1 1
9 -1 1
Problem description
Series.rmod called on a scalar behaves differently depending on the size of the series.
On my machine:
* length <= 10,000: sign matches the divisor (matches Python's % operator)
* length > 10,000: sign matches the dividend
Appears to behave properly when called on other Series.
Series.mod does not appear to have any such issues.
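For reference, the two sign conventions listed above can be reproduced without pandas. This is only a minimal sketch: using math.fmod to stand in for the dividend-signed result is an assumption for illustration, not a claim about what pandas does internally.

```python
import math

# Mirrors Series([2]).rmod(-1): -1 is the dividend, 2 is the divisor.
dividend, divisor = -1, 2

# Python's % operator: the result takes the sign of the divisor.
python_style = dividend % divisor

# C-style remainder (math.fmod): the result takes the sign of the dividend.
c_style = math.fmod(dividend, divisor)

print(python_style)  # 1
print(c_style)       # -1.0
```

The short series in the report shows the first convention and the long series shows the second.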
Expected Output
long_series short_series
0 1 1
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
Output of pd.show_versions()
Comment From: TomAugspurger
Seems to be fixed on master.
s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
## -- End pasted text --
long_series short_series
0 1 1
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
@kmacdough can you try with master?
Comment From: simonjayhawkins
> Seems to be fixed on master.
I'm getting the same issue on master (and also getting the incorrect behaviour on '0.26.0.dev0+910.g71b78685a', master as of 14/11/2019).
>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1068.g49bc8d8c9'
>>>
>>> s1 = pd.Series([2] * 10001, name="long_series").rmod(-1)
>>> s2 = pd.Series([2] * 10000, name="short_series").rmod(-1)
>>> print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
long_series short_series
0 -1 1
1 -1 1
2 -1 1
3 -1 1
4 -1 1
5 -1 1
6 -1 1
7 -1 1
8 -1 1
9 -1 1
Comment From: simonjayhawkins
With uint64 the result is cast to float64, Int64 works as expected, and UInt64 raises a TypeError:
>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1068.g49bc8d8c9'
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("int64").rmod(-1)
0 -1
1 -1
2 -1
3 -1
4 -1
..
9996 -1
9997 -1
9998 -1
9999 -1
10000 -1
Name: long_series, Length: 10001, dtype: int64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("uint64").rmod(-1)
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
...
9996 1.0
9997 1.0
9998 1.0
9999 1.0
10000 1.0
Name: long_series, Length: 10001, dtype: float64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("Int64").rmod(-1)
0 1
1 1
2 1
3 1
4 1
..
9996 1
9997 1
9998 1
9999 1
10000 1
Name: long_series, Length: 10001, dtype: Int64
>>>
>>> pd.Series([2] * 10001, name="long_series").astype("UInt64").rmod(-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\simon\pandas\pandas\core\ops\__init__.py", line 518, in flex_wrapper
return op(self, other)
File "C:\Users\simon\pandas\pandas\core\ops\roperator.py", line 40, in rmod
return right % left
File "C:\Users\simon\pandas\pandas\core\ops\common.py", line 63, in new_method
return method(self, other)
File "C:\Users\simon\pandas\pandas\core\ops\__init__.py", line 440, in wrapper
result = arithmetic_op(lvalues, rvalues, op, str_rep)
File "C:\Users\simon\pandas\pandas\core\ops\array_ops.py", line 205, in arithmetic_op
res_values = op(lvalues, rvalues)
File "C:\Users\simon\pandas\pandas\core\ops\roperator.py", line 40, in rmod
return right % left
File "C:\Users\simon\pandas\pandas\core\ops\common.py", line 63, in new_method
return method(self, other)
File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 679, in integer_arithmetic_method
return self._maybe_mask_result(result, mask, other, op_name)
File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 608, in _maybe_mask_result
return type(self)(result, mask, copy=False)
File "C:\Users\simon\pandas\pandas\core\arrays\integer.py", line 348, in __init__
"values should be integer numpy array. Use "
TypeError: values should be integer numpy array. Use the 'integer_array' function instead
>>>
Comment From: jbrockmendel
@simonjayhawkins didn't you have a recent numexpr-related PR on an issue similar to this?
Comment From: mroeschke
I'm getting the correct behavior on master again. It may be worth adding a test to check whether this is platform- or package-related.
In [3]: s1 = pd.Series([2] * 10001, name='long_series').rmod(-1)
...: s2 = pd.Series([2] * 10000, name='short_series').rmod(-1)
...: print(s1.loc[0:9].to_frame().join(s2.loc[0:9]))
long_series short_series
0 1 1
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
Comment From: codamuse
@phofl not sure if I should be pinging you on this but I have a test written that's pretty straightforward:
result = pd.Series([2] * 10001).rmod(-1)
expected = pd.Series([1] * 10001)
tm.assert_series_equal(result, expected)
Two questions: 1) does the test look okay, and 2) I'm trying to test against 1.1.0 to confirm it fails (it passes on the latest main) but am having some issues building.
To test against 1.1.0 I'm checking out d9fff2792bf16178d4e450fe7384244e50635733 and then, when running python setup.py build_ext -j 4, I get the following:
pandas/_libs/writers.c:5092:5: error: ‘_PyUnicode_get_wstr_length’ is deprecated [-Werror=deprecated-declarations]
Any pointers? Thanks!
Comment From: phofl
Test looks OK, I think.
You don't have to build from source there. You can simply create another environment and install the released pandas 1.1.0.
Comment From: jbrockmendel
In tests.arithmetic.switch_numexpr_min_elements we change the numexpr cutoff, so we shouldn't need a long Series to test this.
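A rough sketch of that idea, not the actual fixture. It assumes the cutoff is the private attribute pandas.core.computation.expressions._MIN_ELEMENTS, which may change between versions. The helper names numexpr_min_elements and rmod_matches_python are hypothetical and used only for illustration.

```python
from contextlib import contextmanager

import pandas as pd
from pandas.core.computation import expressions as expr
from pandas.testing import assert_series_equal


@contextmanager
def numexpr_min_elements(n):
    # Hypothetical helper: temporarily override the (private) numexpr cutoff
    # and restore the original value afterwards.
    original = expr._MIN_ELEMENTS
    expr._MIN_ELEMENTS = n
    try:
        yield
    finally:
        expr._MIN_ELEMENTS = original


def rmod_matches_python(min_elements):
    # A short Series is enough once the cutoff itself is adjusted.
    with numexpr_min_elements(min_elements):
        result = pd.Series([2] * 10).rmod(-1)
    # Expected values follow Python's % semantics: -1 % 2 == 1.
    expected = pd.Series([-1 % 2] * 10)
    assert_series_equal(result, expected)
    return True


# Cutoff of 0 makes every operation eligible for numexpr (if it is installed);
# a large cutoff keeps this Series on the plain NumPy path.
for cutoff in (0, 1_000_000):
    rmod_matches_python(cutoff)
```

Running the check at both cutoffs exercises either side of the threshold without building a 10,001-element Series.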