#### Problem description

Overwriting a column of dtype object with a column of string dtype (`|S`) fails: the column keeps its object dtype.
```python
import pandas as pd

data = pd.DataFrame({'a': ['hello', 'there']})
col_as_string = data['a'].astype('|S')

# Attempt 1: plain column assignment
data['a'] = col_as_string
print('Failed change 1:', data['a'].dtype)

# Attempt 2: assignment through .loc
data.loc[:, 'a'] = col_as_string
print('Failed change 2:', data['a'].dtype)

# Attempt 3: delete the column first, then assign
del data['a']
data['a'] = col_as_string
print('Successful change:', data['a'].dtype)
```
#### Output

```
Failed change 1: object
Failed change 2: object
Successful change: |S5
```

#### Expected output

```
Failed change 1: |S5
Failed change 2: |S5
Successful change: |S5
```
This only seems to happen when going from dtype object to a string dtype. If the target dtype is numerical, the assignment works for me.
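For comparison, here is a minimal sketch of the numerical case mentioned above. The frame `numbers` and the target dtype `float64` are illustrative choices, not taken from the original report.

```python
import pandas as pd

# Illustrative data: an integer column that will be replaced by a float column.
numbers = pd.DataFrame({'a': [1, 2]})
col_as_float = numbers['a'].astype('float64')

# Plain column assignment replaces the column, so the new dtype is kept.
numbers['a'] = col_as_float
print('Numerical change:', numbers['a'].dtype)
```

Only plain column assignment is shown here, since how `.loc[:, 'a'] = ...` treats the existing dtype may differ between pandas versions.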
#### Output of `pd.show_versions()`
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.21.1
pytest: 3.1.2
pip: 9.0.1
setuptools: 38.2.4
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Comment From: jreback
pandas does not support fixed-length strings; these are stored as object dtype.
Comment From: dgmp88
@jreback - but the dtype returns 'S5', and `data[col].values.dtype` is also 'S5' in the successful case, so it seems it is being stored as a fixed-length string, no?
I just read this, so I now understand why fixed-length strings aren't normally supported, but the fact that the code above 'works' in the third case is confusing.
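
To make the distinction in this exchange concrete, here is a small sketch at the NumPy level only. It says nothing about how pandas stores the column internally; the variable names are illustrative.

```python
import numpy as np

strings = ['hello', 'there']

# Fixed-length bytes dtype: every element occupies the same number of bytes,
# sized to the longest string (5 here, hence 'S5').
fixed = np.array(strings).astype('S')
print(fixed.dtype, fixed.itemsize)

# Object dtype: the array holds references to ordinary Python str objects.
as_object = np.array(strings, dtype=object)
print(as_object.dtype, type(as_object[0]).__name__)
```

Checking `data[col].values.dtype`, as done above, is one way to see which of these two representations a column's underlying array actually has.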