I noticed that when I use a comparison operator on a column that contains null values, pandas does not propagate the nulls to the result. For example, if I do the following:
df.foo > 0
the resulting series contains False wherever foo had a missing value.
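A minimal sketch of the behavior described above, assuming a recent pandas and a hypothetical frame `df` whose float column `foo` starts with a missing value. NumPy evaluates a comparison against NaN as False, so the missing entry silently becomes False. Masking the result with `notna()` is one way to keep the missing entry visible; because a plain bool Series cannot hold NaN, pandas upcasts the masked result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first value of foo is missing.
df = pd.DataFrame({"foo": [np.nan, 1.5, -2.0]})

# NaN > 0 evaluates to False, so the missing entry turns into False.
result = df.foo > 0
print(result.tolist())  # [False, True, False]
print(result.dtype)     # bool

# Workaround: keep the comparison only where foo is present; where() inserts
# NaN elsewhere, so the result is no longer a plain bool Series.
masked = result.where(df.foo.notna())
print(masked.isna().tolist())  # [True, False, False]
```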
Comment From: changhiskhan
IIRC, this choice was made so that you can use the output in boolean indexing and other operations that depend on the dtype staying boolean. Maybe we can add comparison methods like Series.gt and add a keyword option to change the NaN treatment if desired.
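To illustrate the trade-off (a sketch with hypothetical data): keeping the result as plain `bool` lets it be used directly as an indexing mask, with missing rows simply dropped. Current pandas does provide `Series.gt`; it has no switch for propagating NaN, but its existing `fill_value` parameter lets you choose what missing values are replaced with before comparing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: foo has a missing value in the first row.
df = pd.DataFrame({"foo": [np.nan, 1.5, -2.0], "bar": ["a", "b", "c"]})

# A plain bool mask works directly for boolean indexing;
# the NaN row compares as False and is therefore excluded.
mask = df.foo > 0
selected = df[mask]
print(selected.bar.tolist())  # ['b']

# Series.gt is the method form of ">". fill_value replaces missing values
# before the comparison (here NaN is treated as 10, so that row passes).
gt_filled = df.foo.gt(0, fill_value=10)
print(gt_filled.tolist())  # [True, True, False]
```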
Comment From: fonnesbeck
That would be great. A warning might be worthwhile, if nothing else. I only caught it because there was a missing value right at the top of the data frame.
Comment From: changhiskhan
To make that more general, it might be useful to have an option you can set so that pandas runs in a "verbose" mode, where various implicit operations (especially NaN filling) raise a warning.
Comment From: fonnesbeck
So, in general, is there no way to incorporate a missing value into an integer series? Do I have to change its dtype explicitly? For example, if I assign None to an element of an integer series, it silently fails to update that element.
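With the classic NumPy-backed `int64` dtype the conversion does have to be explicit. A sketch of two options, assuming pandas 1.0 or later for the nullable `Int64` extension dtype (the series here is hypothetical):

```python
import pandas as pd

# A classic NumPy-backed int64 Series has no way to represent a missing value.
ints = pd.Series([1, 2, 3])

# Option 1: convert explicitly to float64, where NaN marks a missing value.
as_float = ints.astype("float64")
as_float[1] = float("nan")

# Option 2: convert to the nullable "Int64" extension dtype, which stores
# missing values as pd.NA and keeps the integer dtype.
nullable = ints.astype("Int64")
nullable[1] = None

print(as_float.dtype, nullable.dtype)  # float64 Int64
```

The nullable dtype avoids turning integers into floats, at the cost of using an extension array rather than a plain NumPy array.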
Comment From: fonnesbeck
Here is another strange example. I have a boolean series:
>>> hearing_loss_birth.dtype
dtype('bool')
If I try to assign a nan to one of the elements:
>>> hearing_loss_birth[1] = nan
I get ValueError: Cannot assign nan to integer series. This is strange, since I am dealing with a boolean series.
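For a boolean column, the analogous option in pandas 1.0 and later is the nullable `boolean` dtype, which can hold `pd.NA` without changing dtype. A minimal sketch, using a hypothetical stand-in `flags` for a series like `hearing_loss_birth`:

```python
import pandas as pd

# Hypothetical stand-in for a NumPy-backed bool Series.
flags = pd.Series([True, False, True])

# Convert explicitly to the nullable "boolean" dtype, which has a missing slot.
nullable_flags = flags.astype("boolean")
nullable_flags[1] = pd.NA

print(nullable_flags.dtype)             # boolean
print(nullable_flags.isna().tolist())   # [False, True, False]
```

Converting explicitly makes the intent clear instead of relying on whatever upcasting a given pandas version performs when NaN is assigned into a `bool` Series.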
Comment From: wesm
That's not a good error message.
I do have plans to fix the NA representation in the future (to avoid the boolean/integer dtype issues), but it's a non-trivial project.
Comment From: fonnesbeck
Here is the full message in an IPython notebook. Scroll to the last cell.
Comment From: jorisvandenbossche
The new pd.NA scalar now propagates in comparison operations, so I'm closing this issue.
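A sketch of the resolved behavior, assuming pandas 1.2 or later (for the nullable `Float64` dtype) and hypothetical data shaped like `df.foo` in the original report. Once the column uses a nullable dtype, a missing value compares to `pd.NA` rather than False, and the result has the nullable `boolean` dtype. For indexing, filling the missing entries explicitly makes the choice visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first value of foo is missing.
df = pd.DataFrame({"foo": [np.nan, 1.5, -2.0]})

# The pd.NA scalar itself propagates through comparisons.
print(pd.NA > 0)  # <NA>

# Converting to the nullable "Float64" dtype turns NaN into pd.NA.
foo = df.foo.astype("Float64")

# The comparison now yields a "boolean" Series: [<NA>, True, False].
result = foo > 0
print(result.isna().tolist())  # [True, False, False]

# For boolean indexing, decide explicitly how missing entries are treated.
selected = df[result.fillna(False)]
print(selected.index.tolist())  # [1]
```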