This only happens when the numpy object is on the left. It doesn't matter if it's an int or a float. This error does not get raised with DataFrames.
After more poking around, it looks like this actually comes from a change in numpy, between versions 1.8.2 and 1.9.0.
import pandas as pd
import numpy as np
s = pd.Series(np.arange(4))
arr = np.arange(4)
right = s < arr[0]
left = arr[0] > s
Running this yields
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
bug_lhs_numpy.py in <module>()
6
7 right = s < arr[0]
----> 8 left = arr[0] > s
C:\Anaconda\lib\site-packages\pandas-0.14.1_236_g989a51b-py2.7-win-amd64.egg\pandas\core\ops.py
ther)
555 return NotImplemented
556 elif isinstance(other, (pa.Array, pd.Series, pd.Index)):
--> 557 if len(self) != len(other):
558 raise ValueError('Lengths must match to compare')
559 return self._constructor(na_op(self.values, np.asarray(other)),
TypeError: len() of unsized object
In [2]: type(arr[0])
Out[2]: numpy.int32
Comment From: jreback
left = 0 > s
works (e.g. a Python scalar). So I think this is being treated as a 0-dim array (it's a np.int64) and not as a scalar when called. I'll mark as a bug. Feel free to dig in.
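For anyone following along, here is a minimal sketch of why a 0-dim array would produce exactly the message in the traceback. It uses only numpy; the variable names are illustrative, and it does not reproduce the pandas code path itself.

```python
import numpy as np

scalar = np.int32(0)           # a NumPy scalar, like arr[0] in the report
zero_dim = np.asarray(scalar)  # wrapping it yields a 0-dimensional ndarray

# The scalar itself is not an ndarray; the wrapped value is, with shape ().
print(type(scalar).__name__, isinstance(scalar, np.ndarray))
print(type(zero_dim).__name__, zero_dim.ndim, zero_dim.shape)

message = None
try:
    len(zero_dim)              # a 0-dim array has no first axis to measure
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)
```

So any length check such as len(self) != len(other) will fail this way once other has become a 0-dim array.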
Comment From: dmsul
It is indeed being converted at some point to a 0-dim array (ipython %debug). But I'm fairly certain it's something on the numpy side. (As long as numpy<=1.8.2, everything is fine in pandas>=0.14.0. As soon as you bump numpy to 1.9.0, the example code raises the error on every version of pandas.) I think this one is going to be way over my head. But if I happen to learn C before someone else takes care of this, I'll give it a try.
Comment From: tvyomkesh
Thinking about this from a slightly different angle. Series comparison seems to work fine if the LHS or RHS is a one-element list.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: [0] < pd.Series(np.arange(4))
Out[3]:
0 False
1 True
2 True
3 True
dtype: bool
In [8]: pd.Series(np.arange(4)) > [0]
Out[8]:
0 False
1 True
2 True
3 True
dtype: bool
The question is whether the behavior should be modified so that we get the same answer with a one-element pd.Series or np.ndarray as with a list or scalar. numpy works with both a scalar and a one-element np.ndarray.
In [9]: np.arange(1) < np.arange(4)
Out[9]: array([False, True, True, True], dtype=bool)
In [10]: 0 < np.arange(4)
Out[10]: array([False, True, True, True], dtype=bool)
Right now these comparisons throw an error. With the proposed fix, they can be expected to work fine.
In [3]: pd.Series(np.arange(1)) < pd.Series(np.arange(4))
In [4]: np.arange(1) < pd.Series(np.arange(4))
In [5]: pd.Series(np.arange(4)) > pd.Series(np.arange(1))
In [6]: pd.Series(np.arange(4)) > np.arange(1)
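As a reference point for the numpy behavior being proposed as the model, here is a small numpy-only sketch of how a shape-(1,) array is stretched against a shape-(4,) array. The variable names are illustrative, and this says nothing about what pandas does with the four expressions above.

```python
import numpy as np

one = np.arange(1)    # shape (1,), holds the single value 0
four = np.arange(4)   # shape (4,)

# Broadcasting conceptually repeats the single element to match shape (4,).
stretched = np.broadcast_to(one, four.shape)
print(stretched)

# The comparison then behaves like comparing against the scalar 0.
result = one < four
print(result)
print(np.array_equal(result, 0 < four))
```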
Comment From: wadawson
I can confirm @dmsul's diagnosis of this bug. Afraid I don't have time currently to correct this, but +1 for a fix.
Comment From: gliptak
The basic form of this is:
import numpy as np
import pandas as pd
s = pd.Series([1])
b = np.int32(1)
b < s
type(b) before the call is <class 'numpy.int32'>, while it is <class 'numpy.ndarray'> within ops.wrapper (other).
How can the type of this variable change?
Comment From: jreback
both are correct
np.int32 is a 0 dim scalar that's also an ndarray
Comment From: gliptak
This is what I get:
In [12]: isinstance(np.int32(1), np.ndarray)
Out[12]: False
but somehow it becomes True within ops.wrapper ...
Comment From: gliptak
Thoughts on how to look into this further? Thanks
Comment From: jreback
@gliptak you need to step thru and debug
Comment From: gliptak
I did that already ... Before the call into the function it is int32 1, within the function it shows ndarray 1. Pointers to what translation could have happened in between are welcome.
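One way to watch what the reflected comparison receives, without stepping through pandas internals, is a tiny stand-in class. Recorder below is hypothetical and is not pandas code. It uses NumPy's documented __array_ufunc__ = None opt-out rather than whatever mechanism pandas relied on at the time, so it only illustrates the hand-off: the NumPy scalar on the left gives up, and Python calls the reflected method on the right-hand object. The type that arrives may depend on the NumPy version, so the sketch prints it rather than assuming.

```python
import numpy as np


class Recorder:
    """Hypothetical stand-in for a container such as Series; not pandas code."""

    # Documented opt-out: ask NumPy to defer binary operations to this class.
    __array_ufunc__ = None

    def __init__(self):
        self.seen = []

    def __gt__(self, other):
        # Python calls this as the reflected operation for `other < self`.
        self.seen.append(type(other))
        return "handled by Recorder.__gt__"


rec = Recorder()
result = np.int32(1) < rec   # NumPy scalar on the left, as in the report
print(result)
print(rec.seen)              # the type of `other` as seen inside the method
```

Whatever NumPy does to the left operand before deferring is what the right-hand object ends up seeing as other, which is where a conversion between the call site and the wrapper could happen.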
Comment From: jbrockmendel
AFAICT there are two more-or-less separate issues here: Series comparison with numpy scalar (works fine, probably has for a while), and Series comparison with 1-element listlike (not supported).
We recently changed DataFrame broadcasting behavior to match numpy with 2-dimensional arrays with shape either (1, ncols) or (nrows, 1). We could consider doing the same for Series broadcasting against 1-dimensional objects with shape (1,).
@dmsul does this synopsis appear accurate?
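For context, the numpy behavior referred to for 2-dimensional shapes (1, ncols) and (nrows, 1) can be sketched as follows. This is numpy only, with illustrative names; it does not show the DataFrame implementation.

```python
import numpy as np

grid = np.arange(6).reshape(3, 2)   # shape (3, 2): [[0, 1], [2, 3], [4, 5]]
row = np.array([[2, 2]])            # shape (1, 2), stretched along the rows
col = np.array([[0], [2], [4]])     # shape (3, 1), stretched along the columns

by_row = grid < row   # every row of grid is compared against [2, 2]
by_col = grid > col   # every column of grid is compared against [0, 2, 4]

print(by_row)
print(by_col)
```

The 1-dimensional analogue would be a shape-(1,) operand stretched to the length of the Series.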
Comment From: dmsul
@jbrockmendel No idea, it's been almost 4 years since I looked at this bug.
As of numpy 1.12.1 and pandas 0.20.2 (just what I had in the nearest env at hand) there is no error, and I can't get an error when doing s < [0] or any number of permutations. Haven't tried it with ops.wrapper, but from a practitioner's standpoint I can't recreate it.
Comment From: jbrockmendel
@jreback closeable?
Comment From: jreback
yeah i think this was the array_priority which was fixed a long time ago