Problem description

I have several msgpack-encoded data frames written by old versions of pandas. I have seen the warnings about this format. However, there appear to be tests that ensure files from old versions can still be loaded, and on current master these seem to cover pandas 0.16 as well. The file in question was created with 0.16 and cannot be loaded in 0.20.3.

This is what happens when trying to load the file:

In [4]: pd.read_msgpack('/home/languitar/data/tobi-dataset-post-processed/1/armcontrol-features-Combined+hash.msg')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-8e36ed25a239> in <module>()
----> 1 pd.read_msgpack('/home/languitar/data/tobi-dataset-post-processed/1/armcontrol-features-Combined+hash.msg')

/home/languitar/miniconda2/envs/monitoring/lib/python2.7/site-packages/pandas/io/packers.pyc in read_msgpack(path_or_buf, encoding, iterator, **kwargs)
    201         if exists:
    202             with open(path_or_buf, 'rb') as fh:
--> 203                 return read(fh)
    204 
    205     # treat as a binary-like

/home/languitar/miniconda2/envs/monitoring/lib/python2.7/site-packages/pandas/io/packers.pyc in read(fh)
    186 
    187     def read(fh):
--> 188         l = list(unpack(fh, encoding=encoding, **kwargs))
    189         if len(l) == 1:
    190             return l[0]

pandas/io/msgpack/_unpacker.pyx in pandas.io.msgpack._unpacker.Unpacker.__next__ (pandas/io/msgpack/_unpacker.cpp:5618)()

pandas/io/msgpack/_unpacker.pyx in pandas.io.msgpack._unpacker.Unpacker._unpack (pandas/io/msgpack/_unpacker.cpp:4602)()

UnicodeDecodeError: 'utf8' codec can't decode byte 0xf8 in position 6: invalid start byte
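
To make the error message concrete, here is a minimal sketch that reproduces the same kind of failure without pandas. The byte string is a made-up stand-in, not the contents of my file; it only places a 0xf8 byte at position 6, as in the message above.

```python
# Hypothetical payload: NOT taken from the real file, it only mimics
# the situation in the traceback (byte 0xf8 at position 6).
raw = b"abcdef\xf8gh"

failed_at = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xf8 can never start a valid UTF-8 sequence
    failed_at = exc.start
    print("utf-8 failed at position {}: {}".format(failed_at, exc.reason))

# latin-1 maps every byte value to a code point, so decoding cannot fail
text = raw.decode("latin-1")
print(repr(text))
```

Since the traceback shows that `read_msgpack` forwards its `encoding` argument to the unpacker, passing `encoding='latin-1'` might avoid the exception. That is an untested guess on my side, and it would not guarantee that the strings come out as originally written.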

Expected Output

It should load the data without an exception.

Output of pd.show_versions() for the creating pandas version

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.6-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.16.2
nose: None
Cython: None
numpy: 1.10.4
scipy: 0.17.1
statsmodels: 0.6.1
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

Output of pd.show_versions() for the version trying to load the file

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.9-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: None.None
pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

See http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#changes-to-msgpack for details. This is a use-at-your-own-risk format, so I would gladly take a patch if you can see what the problem is, but there are no guarantees on this.

Comment From: languitar

The table there says that files packed with pre-0.17 / Python 2 should be readable by any version. That doesn't seem to hold in this case.

Comment From: jreback

As I said, you are on your own here; there were never any guarantees on this format.

Comment From: languitar

Sure, but the table in the new documentation implies that this version should work. Maybe the warning could stress that the table is only a rough guide.