Possibly related to https://github.com/pydata/pandas/issues/13591. Causes https://github.com/dask/dask/issues/1452
Code Sample, a copy-pastable example if possible
import pandas as pd
pd.msgpack.unpackb(pd.msgpack.packb("a"))
Expected Output
"a"
instead, we get
b"a"
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-31-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.6
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
Comment From: jreback
duplicate of #13591
you are welcome to add this example there.
Comment From: jreback
In [3]: s = Series(['a','b'])
In [5]: pd.read_msgpack(s.to_msgpack())
Out[5]:
0    a
1    b
dtype: object
In [6]: pd.read_msgpack(s.to_msgpack()).values
Out[6]: array(['a', 'b'], dtype=object)
Further, the msgpack.* routines are private, and they do exactly what they say, namely translate object -> bytes via a round trip. So you are simply using them wrong.
Comment From: jreback
In [8]: pd.msgpack.dumps('s')
Out[8]: b'\xa1s'
In [9]: pd.msgpack.loads(pd.msgpack.dumps('s'))
Out[9]: b's'
These are the public functions which dask uses.
In [1]: pd.msgpack.loads?
Docstring:
unpackb(packed, object_hook=None, list_hook=None, bool use_list=1, encoding=None, unicode_errors='strict', object_pairs_hook=None, ext_hook=ExtType, Py_ssize_t max_str_len=2147483647, Py_ssize_t max_bin_len=2147483647, Py_ssize_t max_array_len=2147483647, Py_ssize_t max_map_len=2147483647, Py_ssize_t max_ext_len=2147483647)
Unpack packed_bytes to object. Returns an unpacked object.
Raises `ValueError` when `packed` contains extra bytes.
See :class:`Unpacker` for options.
Type: builtin_function_or_method
In [2]: pd.msgpack.dumps?
Signature: pd.msgpack.dumps(o, **kwargs)
Docstring:
Pack object `o` and return packed bytes
See :class:`Packer` for options.
File: ~/miniconda/envs/py3.5/lib/python3.5/site-packages/pandas/msgpack/__init__.py
Type: function
Comment From: mrocklin
In the dask.distributed protocol we manage this by using the use_bin_type and encoding keyword arguments:
In [1]: import pandas as pd
In [2]: pd.msgpack.unpackb(pd.msgpack.packb("a"))
Out[2]: b'a'
In [3]: pd.msgpack.unpackb(pd.msgpack.packb(u"a", use_bin_type=True), encoding='utf8')
Out[3]: 'a'
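For anyone reading later, here is a rough pure-Python sketch of why the two keyword arguments matter. It is not the pandas or msgpack implementation. `pack_text` and `unpack_text` are hypothetical helpers that only handle short values, using the msgpack wire markers for fixstr (0xa0-0xbf) and bin 8 (0xc4). The assumption is that without `use_bin_type` both str and bytes are written with the same "raw" marker, and that without `encoding` the unpacker hands raw data back as bytes.

```python
def pack_text(value, use_bin_type=False):
    """Toy packer for short str/bytes values (hypothetical, not a pandas API)."""
    if isinstance(value, str):
        data = value.encode("utf8")
        if len(data) > 31:
            raise ValueError("sketch only handles fixstr (<= 31 bytes)")
        # fixstr: marker 0xa0 with the length in the low 5 bits
        return bytes([0xA0 | len(data)]) + data
    if isinstance(value, bytes):
        if use_bin_type:
            if len(value) > 255:
                raise ValueError("sketch only handles bin 8 (<= 255 bytes)")
            # bin 8: marker 0xc4 followed by a one-byte length
            return bytes([0xC4, len(value)]) + value
        if len(value) > 31:
            raise ValueError("sketch only handles fixstr (<= 31 bytes)")
        # legacy behaviour: bytes share the same raw marker as str
        return bytes([0xA0 | len(value)]) + value
    raise TypeError("sketch only handles str and bytes")


def unpack_text(packed, encoding=None):
    """Toy unpacker matching pack_text (hypothetical, not a pandas API)."""
    head = packed[0]
    if head == 0xC4:
        # bin 8 always comes back as bytes
        return packed[2:2 + packed[1]]
    if 0xA0 <= head <= 0xBF:
        raw = packed[1:1 + (head & 0x1F)]
        # raw/str data is only decoded when an encoding is given
        return raw.decode(encoding) if encoding else raw
    raise ValueError("unsupported marker in this sketch")


print(pack_text("s"))                                              # b'\xa1s'
print(unpack_text(pack_text("a")))                                 # b'a'
print(unpack_text(pack_text("a", use_bin_type=True), "utf8"))      # 'a'
print(unpack_text(pack_text(b"a", use_bin_type=True), "utf8"))     # b'a'
```

With both options set, str and bytes stay distinguishable on the wire, which appears to be what the In [3] call above relies on.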