Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({'a': ['Hello', 'World']})
df['a'] = df['a'].astype('S')
df['b'] = df['a'].astype('S')
print(df.dtypes)
Output is:
a object
b |S5
dtype: object
Problem description
The new column has the correct dtype of NumPy byte string; the old column now contains
the new values but still has dtype object.
Expected Output
Both columns should have dtype 'S'
Output of pd.show_versions()
Comment From: jreback
dupe of #12857
an easy fix actually: these should be converted to object
(never to a fixed-width string/bytes dtype, which is not supported)
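For illustration, here is a minimal sketch of what a byte-string column stored as object dtype looks like. It is not the fix from #12857; the variable `encoded` and the choice of ASCII encoding are assumptions made for this example only.

```python
import pandas as pd

df = pd.DataFrame({'a': ['Hello', 'World']})

# Encode each str to bytes by hand (ASCII is an assumption for this example).
encoded = [s.encode('ascii') for s in df['a']]

# Store the bytes as ordinary Python objects instead of a fixed-width 'S' dtype.
df['b'] = pd.Series(encoded, index=df.index, dtype=object)

print(df['b'].dtype)
print(type(df['b'].iloc[0]))
```

Each element of column `b` is a regular Python `bytes` object, so values of different lengths can share the column.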
Comment From: maxnoe
So you say this is intended behaviour and not converting to object on the new assignment is the bug?
Comment From: jreback
yes, this should be object dtype. see the duplicate issue.
Comment From: maxnoe
I think you should leave it to the user. If the user wants to do astype('S'),
that's pretty specific. Why force using object?
Comment From: jreback
fixed width types for strings are not supported and don't offer any advantages, only more complexity. this might change at some point, but this is how it is
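One example of the kind of complexity fixed-width types can bring, shown with plain NumPy. The thread does not say which complications are meant, so this is only one plausible illustration; the names `fixed` and `flexible` are made up for the sketch.

```python
import numpy as np

# A fixed-width byte-string array: the width is set by the longest element.
fixed = np.array(['Hello', 'World']).astype('S')
print(fixed.dtype)   # |S5

# Assigning a longer value silently truncates it to the fixed width.
fixed[0] = b'Hello, World'
print(fixed[0])

# An object array holds arbitrary Python bytes, so nothing is lost.
flexible = np.array([b'Hello', b'World'], dtype=object)
flexible[0] = b'Hello, World'
print(flexible[0])
```

With the fixed-width array, the longer value is cut to five bytes without any warning, whereas the object array keeps it intact.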