Code Sample, a copy-pastable example if possible
I came across some weird behavior in the following code:
>>> import numpy as np
>>> import pandas as pd
>>> type(pd.Index(['a', 'b' , 'c'])) == np.dtype('float64')
True
The example can be simplified to:
>>> pd.Index == np.dtype('float64')
True
Problem description
The index has a dtype of object, but its type compares equal to np.dtype('float64'). This does not happen for any other dtype.
>>> x = pd.Index(['a', 'b' , 'c'])
>>> print(repr(x.dtype))
dtype('O')
>>> print(type(x))
<class 'pandas.indexes.base.Index'>
>>> print(repr(np.dtype('float64')))
dtype('float64')
And this is not true for any other dtype or basic numpy type
>>> pd.Index == np.dtype('float128')
False
>>> pd.Index == np.dtype('int32')
False
>>> pd.Index == np.dtype('float32')
False
>>> # This is also false, it is only equal to a numpy dtype(), not a basic numpy type
>>> pd.Index == np.float64
False
The type of a pd.Index object returns that it is equal to a np.dtype('float64'). This seems like a side effect of how the pd.Index object is defined.
Perhaps this is a bug that should be fixed?
Expected Output
Comment From: jreback
this has nothing to do with the actual dtype itself
In [5]: pd.Series == np.dtype('float64')
...:
Out[5]: False
In [6]: pd.Index == np.dtype('float64')
...:
Out[6]: True
We assign __eq__ and friends to various sub-classes of Index to provide proper comparators. I guess np.dtype is slipping thru here. Though to be honest I am not sure why you are doing this.
Comment From: jreback
the issue is here
I think it needs an else clause after
if isinstance(other, (np.ndarray, Index, ABCSeries)):
    if other.ndim > 0 and len(self) != len(other):
        raise ValueError('Lengths must match to compare')
Comment From: dsm054
I don't think that's right, because that comparison function should only be used on an instance of pd.Index. I think the real issue is on the numpy side, because
In [37]: np.dtype("float32") == None
Out[37]: False
In [38]: np.dtype("float64") == None
Out[38]: True
and the __eq__ for dtype objects seems to look at the .dtype attribute if it exists:
In [39]: print(pd.Index.dtype)
None
Comment From: Erotemic
@jreback, the reason I ran across this has to do with an old function I wrote to determine whether a variable is float-like, meaning it could be either a float or a numpy array of floats. At the time I didn't know dtypes have string identifiers, and if I rewrote the code today I would use isinstance and those identifiers. However, the function works well as a utility, even if it doesn't follow best practices, so I continue using it.
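For illustration, here is a minimal sketch of what such a float-like check might look like when written with isinstance and the dtype `kind` code. The name `is_float_like` and its exact rules are assumptions made for this example; it is not the original utility from this comment.

```python
import numpy as np


def is_float_like(value):
    """Return True for a float scalar or a numpy array of floats.

    Hypothetical helper for illustration only.
    """
    # Python floats and numpy float scalars (float16/32/64, ...).
    if isinstance(value, (float, np.floating)):
        return True
    # Arrays: 'f' is the dtype kind code for floating-point types.
    if isinstance(value, np.ndarray):
        return value.dtype.kind == 'f'
    # Anything else, including classes such as pd.Index, is rejected
    # without ever being compared against a dtype object.
    return False
```

Because the check never evaluates `something == np.dtype(...)`, passing a class or None simply returns False instead of depending on how dtype equality coerces its operand.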
Comment From: chris-b1
Yes, this is a numpy issue (https://github.com/numpy/numpy/issues/2190#issuecomment-35576788), a symptom of there being an implicit default dtype, e.g.:
In [62]: np.dtype(None)
Out[62]: dtype('float64')
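The mechanism described in the last two comments can be sketched with a toy model in plain Python. `ToyDtype` and `ToyIndex` are hypothetical stand-ins, not numpy or pandas code. The sketch assumes only what this thread reports: the dtype comparison coerces its operand, the coercion follows a `.dtype` attribute when one exists, None maps to the float64 default, and `pd.Index.dtype` is None at class level.

```python
class ToyDtype:
    """Toy stand-in for np.dtype; NOT numpy's real implementation."""

    DEFAULT = "float64"  # mirrors np.dtype(None) -> dtype('float64')

    def __init__(self, spec=None):
        # Assumption from the thread: coercion follows a .dtype attribute.
        if hasattr(spec, "dtype"):
            spec = spec.dtype
        if isinstance(spec, ToyDtype):
            spec = spec.name
        # None silently becomes the implicit default dtype.
        if spec is None:
            spec = self.DEFAULT
        if not isinstance(spec, str):
            raise TypeError("cannot interpret %r as a dtype" % (spec,))
        self.name = spec

    def __eq__(self, other):
        # Coerce the other operand first; anything uncoercible is unequal.
        try:
            other = ToyDtype(other)
        except TypeError:
            return False
        return self.name == other.name


class ToyIndex:
    """Toy stand-in for pd.Index with a class-level dtype of None."""

    dtype = None  # matches print(pd.Index.dtype) -> None in this thread

    def __init__(self, values):
        self.values = list(values)
        self.dtype = "object"  # instances carry a real dtype


# The class itself has no useful __eq__, so Python falls back to the
# reflected ToyDtype.__eq__, which coerces ToyIndex -> None -> float64.
print(ToyIndex == ToyDtype("float64"))         # True
print(ToyIndex == ToyDtype("float32"))         # False
print(ToyIndex(["a"]) == ToyDtype("float64"))  # False
```

In this model only the class compares equal to float64, because only the class exposes `dtype = None`; an instance exposes its own dtype and compares unequal.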