If a DataFrame holds a series of binary strings containing numbers and you try to convert it using convert_objects, the first value shows up as NaN.

In [3]: df = pd.DataFrame([b'1',b'1'])

In [4]: df.convert_objects(convert_numeric=True)
Out[4]: 
    0
0 NaN
1   1

Python 3, obviously; happens on Windows and Linux.

Comment From: jreback

these are not binary strings, but bytes.

convert_objects operates on strings. I suppose this could work, but it IS a different type of operation. These don't actually represent 'values' (though they could). It's up to the user how to interpret these.

I think this should actually just return them unchanged (e.g. try NOT to interpret them).

@jorisvandenbossche @jtratner

In [8]: DataFrame([u'1',u'0']).convert_objects(convert_numeric=True)       
Out[8]: 
   0
0  1
1  0

Comment From: arijun

Hmm, the name convert_objects is a little misleading then--a series containing byte strings will still say it is of type object.

I'm not sure why you say it's a different type of operation? My use case is I'm getting numbers using paramiko, which returns everything as bytes. I still want to convert those numbers as I would a str.

Consider also that float(b'1.2') will still do the conversion.
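
To illustrate the point about the built-in float, here is a minimal sketch. It assumes Python 3 and applies float element by element through Series.map, which is a workaround and not what convert_objects does internally.

```python
import pandas as pd

# The built-in float parses ASCII digits held in a bytes object directly.
single = float(b'1.2')

# Applying float per element sidesteps convert_objects entirely.
df = pd.DataFrame([b'1', b'1'])
as_float = df[0].map(float)
print(single, as_float.tolist())
```

This only works when every element is something float itself accepts; a single malformed value makes the whole map raise ValueError.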

Comment From: jreback

if you read the name it only converts strings.

bytes have to be decoded to strings before they are usable. If you are using them directly you have to be quite careful and have a specific use case in mind; these are NOT strings. That said, you can do this:

In [13]: df.apply(lambda x: x.str.decode('utf-8')).convert_objects(convert_numeric=True)
Out[13]: 
   0
0  1
1  1

Comment From: arijun

Yes that's what I did in the end.

"if you read the name it only converts strings"

I'm afraid you've lost me--where does it say this? It might be worth it to mention somewhere in the docstring that it will not work on bytes, unless I am missing some obvious other mention.

"I think this should actually just return them unchanged (e.g. try NOT to interpret them)."

That's fine, but be aware that will be a different behavior than the built in float and np.float64, both of which do convert bytes.

Comment From: jreback

@arijun

I suppose we could try to convert them (the reason it doesn't work is that this has never come up before and thus is not tested!).

The problem is that we would need to decode them (as you see above), and the right decoding is not obvious (we could try 'utf-8', but this really should be a user action).
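
A small sketch of why the encoding cannot simply be guessed, using plain Python. The choice of UTF-16 as the "other" encoding is an assumption made only for illustration.

```python
# The same number, encoded as UTF-16 (the encoder prepends a byte-order mark).
raw = "1.5".encode("utf-16")

# Guessing UTF-8 fails because the byte-order mark is not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Only the caller knows the real encoding; with it, the value round-trips.
decoded_ok = raw.decode("utf-16")
value = float(decoded_ok)
print(utf8_failed, decoded_ok, value)
```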

So I think just raise for now.
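
Roughly what "just raise" could look like from the user's side. The helper name to_numeric_no_bytes is hypothetical and not part of pandas; the sketch relies on pandas.api.types.infer_dtype, which reports "bytes" for a column holding only bytes objects, and on pd.to_numeric for the actual conversion.

```python
import pandas as pd
from pandas.api.types import infer_dtype

def to_numeric_no_bytes(col):
    """Hypothetical helper (not a pandas API): refuse to guess an encoding."""
    if infer_dtype(col, skipna=True) == "bytes":
        raise TypeError("column holds bytes; decode it explicitly before converting")
    # Plain strings are converted as usual.
    return pd.to_numeric(col, errors="coerce")

strings_ok = to_numeric_no_bytes(pd.Series(["1", "0"]))
print(strings_ok.tolist())
```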

Want to do a pull-request? (this would be in some internals FYI, so have to figure out best place to put it).

Comment From: mroeschke

convert_objects has been removed. Closing
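
For anyone landing here after the removal: a sketch of the decode-then-convert workaround from earlier in the thread, written with pd.to_numeric in place of convert_objects. The 'utf-8' default and the helper name decode_and_convert are assumptions for illustration; use whatever encoding your data source actually produces.

```python
import pandas as pd

df = pd.DataFrame([b'1', b'1'])

def decode_and_convert(col, encoding="utf-8"):
    """Illustrative helper: decode a bytes column, then parse it as numbers."""
    decoded = col.str.decode(encoding)
    # errors="coerce" turns unparseable values into NaN instead of raising.
    return pd.to_numeric(decoded, errors="coerce")

converted = df.apply(decode_and_convert)
print(converted)
```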