Pandas Regression: Pandas 0.20.1 vs 0.19.2

Code Sample, a copy-pastable example if possible

# Using pandas 0.19.2
import pickle
import numpy as np
import pandas as pd

arr = np.ones((5, 5))
col = [r'統合商C', r'統合商名', r'区分', r'H区分名', r'H区']
df = pd.DataFrame(arr, columns=col)

# Serialize with the plain pickle module (not DataFrame.to_pickle)
with open('df_pandas_192.pkl', 'wb') as f:
    pickle.dump(df, f, pickle.HIGHEST_PROTOCOL)

# Using pandas 0.20.1
import pickle

with open('df_pandas_192.pkl', 'rb') as f:
    df = pickle.load(f)  # raises ModuleNotFoundError: No module named 'pandas.indexes'

Problem description

Loading the pickle fails because of a module break (the DataFrame class was probably changed in 0.20.1):

/home/ubuntu/project27//aapackage/util_min.py in py_load_obj(folder, isabsolutpath, encoding1)
    485        dir1= folder
    486 
--> 487     with open(dir1, 'rb') as f:
    488         return pickle.load(f)
    489 
    ModuleNotFoundError: No module named 'pandas.indexes'
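
For readers unfamiliar with this failure mode, here is a minimal sketch of why such an error can occur. A pickle stores the import path of each class, so loading fails if that module no longer exists at load time. The example uses only the standard library; the module names `demo_old_indexes` / `demo_new_indexes` and the `Index` stand-in class are hypothetical and are not pandas code. The second half shows one generic way a loader can remap a moved module via `pickle.Unpickler.find_class`; whether `pd.read_pickle` works exactly like this is not stated in this thread.

```python
import io
import pickle
import sys
import types

# Hypothetical module names, chosen only for this illustration.
OLD_NAME = "demo_old_indexes"
NEW_NAME = "demo_new_indexes"


class Index:
    """Stand-in class for the demo; this is not the pandas Index."""

    def __init__(self, labels):
        self.labels = list(labels)


def register(module_name):
    """Make Index importable as <module_name>.Index."""
    module = types.ModuleType(module_name)
    module.Index = Index
    Index.__module__ = module_name
    sys.modules[module_name] = module


# "Old version": the class lives in OLD_NAME when the object is pickled.
register(OLD_NAME)
payload = pickle.dumps(Index(["a", "b"]), pickle.HIGHEST_PROTOCOL)

# "New version": the class has moved and the old module is gone.
del sys.modules[OLD_NAME]
register(NEW_NAME)

# A plain load now fails, because the payload still names OLD_NAME.
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    load_error = exc


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects lookups from the old module to the new one."""

    def find_class(self, module, name):
        if module == OLD_NAME:
            module = NEW_NAME
        return super().find_class(module, name)


restored = RenamingUnpickler(io.BytesIO(payload)).load()
print(type(load_error).__name__, restored.labels)
```

The point of the sketch is only that the pickle payload itself is intact; what breaks is the lookup of the class by its old module path.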

Expected Output

The pickle was serialized in this environment (output of `pd.show_versions()`):

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.8.0
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2017.2
blosc: None
bottleneck: 1.1.0
tables: 3.4.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: 1.0.13
pymysql: 0.7.9.None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1031-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: gfyoung

@arita37 : Thanks for reporting this issue! Could you do three things for us quickly:

1) Can you try upgrading your pandas installation to 0.20.3 and see if that changes anything?
2) What is the error that you are getting from reading this file?
3) What is the file that you are reading? (We can't run your code otherwise.)

Comment From: arita37

1) Same result with 0.20.3.
2) ModuleNotFoundError: No module named 'pandas.indexes'
3) Any DataFrame pickled with 0.19.2 has the same issue.

Using pickles for regression testing is a good way to catch breaking changes in class definitions. Do you publish regression test results for pandas?

Comment From: gfyoung

@arita37 : We have a whole suite of pickling tests, which you can find here:

https://github.com/pandas-dev/pandas/blob/a4c4edeb2a7e5c84b5a82a9743a12a4b66e7bcf1/pandas/tests/io/test_pickle.py

Comment From: jorisvandenbossche

@arita37 Can you try reading the pickle file with `pd.read_pickle`? That should normally work fine.

The 'official' stance is that only `to_pickle` / `read_pickle` are guaranteed to write and read pickle files correctly across pandas versions.
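
As a rough illustration of that recommendation, the snippet below writes and reads a DataFrame through the pandas API instead of calling `pickle.dump` / `pickle.load` directly. It reuses the column names from the report; the file name `df_pandas.pkl` and the temporary directory are assumptions made for the example. It runs in a single environment, so it only shows the calls involved and does not by itself reproduce or verify the cross-version behaviour.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same shape and column labels as the DataFrame in the report.
arr = np.ones((5, 5))
cols = ["統合商C", "統合商名", "区分", "H区分名", "H区"]
df = pd.DataFrame(arr, columns=cols)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this example.
    path = os.path.join(tmp_dir, "df_pandas.pkl")

    # Write with the pandas API rather than pickle.dump.
    df.to_pickle(path)

    # Read with the pandas API rather than pickle.load.
    restored = pd.read_pickle(path)

print(restored.equals(df))
```

In the reported case, the file already exists, so only the read side would change: `pd.read_pickle('df_pandas_192.pkl')` in place of the `open` / `pickle.load` pair.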

Comment From: jorisvandenbossche

Some duplicate issues: https://github.com/pandas-dev/pandas/issues/16564, https://github.com/pandas-dev/pandas/issues/16474, https://github.com/pandas-dev/pandas/issues/16278

It is mentioned in the docs here: http://pandas.pydata.org/pandas-docs/stable/io.html#pickling

Comment From: jreback

See the big red warning in the docs: http://pandas.pydata.org/pandas-docs/stable/io.html#pickling