Code Sample
import pandas as pd
import numpy as np
print pd.isnull(['abacaba', np.nan]) # [False, False]
print pd.isnull(pd.Series(['abacaba', np.nan])).values # [False, True]
Problem description
Inconsistent behavior.
The call pd.isnull(pd.Series(['abacaba', np.nan])).values
returns [False, True]
,
but pd.isnull(['abacaba', np.nan])
returns [False, False]
, which is counter-intuitive.
np.nan
must be detected as null in both cases.
Expected Output
[False, True]
[False, True]
Output of pd.show_versions()
INSTALLED VERSIONS
-------------------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-96-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 35.0.1
Cython: 0.24
numpy: 1.13.3
scipy: 0.18.1
xarray: None
IPython: 4.2.0
sphinx: 1.2.2
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.1
pandas_gbq: None
pandas_datareader: None
Comment From: chris-b1
I think this may ultimately be a numpy issue - it is casting the nan
to a string
In [12]: np.array(['abacaba', np.nan])
Out[12]:
array(['abacaba', 'nan'],
dtype='<U7')
Comment From: jreback
see duplicate issue here: https://github.com/pandas-dev/pandas/issues/16765
In [4]: pd.isnull(np.array(['abacaba', np.nan], dtype=object))
Out[4]: array([False, True], dtype=bool)
we have no choice but to coerce list-likes that are not Index/Series to a numpy array.