Code Sample, a copy-pastable example if possible
# Using pandas 0.19.2
import pickle
import numpy as np
import pandas as pd

arr = np.ones((5, 5))
col = [r'統合商C', r'統合商名', r'区分', r'H区分名', r'H区']
df = pd.DataFrame(arr, columns=col)
# Serialize with the standard pickle module (not DataFrame.to_pickle)
with open('df_pandas_192.pkl', 'wb') as f:
    pickle.dump(df, f, pickle.HIGHEST_PROTOCOL)

# Using pandas 0.20.1
import pickle

with open('df_pandas_192.pkl', 'rb') as f:
    df = pickle.load(f)  # fails with ModuleNotFoundError
Problem description
Loading the pickle fails, apparently because a module was moved or renamed between versions (the DataFrame class internals were probably changed in 0.20.1):
/home/ubuntu/project27//aapackage/util_min.py in py_load_obj(folder, isabsolutpath, encoding1)
    485     dir1= folder
    486
--> 487     with open(dir1, 'rb') as f:
    488         return pickle.load(f)
    489
ModuleNotFoundError: No module named 'pandas.indexes'
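The error is consistent with how pickle works: it stores each class by its import path (module plus name) rather than by its definition, so a pickle can stop loading once that path no longer exists. The sketch below reproduces the same kind of failure with the standard library only. The module name `demo_old_location` and the class `Record` are hypothetical stand-ins made up for the demonstration; nothing here is pandas code.

```python
import pickle
import sys
import types

# Hypothetical module, created only for this demonstration.
old_mod = types.ModuleType("demo_old_location")

# Stand-in for a library class; it claims to live in the old module.
Record = type("Record", (object,), {"__module__": "demo_old_location"})
old_mod.Record = Record
sys.modules["demo_old_location"] = old_mod

rec = Record()
rec.value = 1
payload = pickle.dumps(rec, protocol=pickle.HIGHEST_PROTOCOL)

# The import path is written into the byte stream itself.
contains_path = b"demo_old_location" in payload

# Simulate a later release in which the module no longer exists at that path.
del sys.modules["demo_old_location"]

try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    load_error = exc

print(contains_path, type(load_error).__name__)
```

If `pandas.indexes` was relocated between 0.19.2 and 0.20.x, as the message suggests, a plain `pickle.load` would fail in the same way.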
Expected Output
The pickle was serialized with the version shown below.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.8.0
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2017.2
blosc: None
bottleneck: 1.1.0
tables: 3.4.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: 1.0.13
pymysql: 0.7.9.None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
Comment From: gfyoung
@arita37 : Thanks for reporting this issue! Could you do three things for us quickly:
1) Can you try upgrading your pandas installation to 0.20.3 and see if that changes anything?
2) What is the error that you are getting from reading this file?
3) What is the file that you are reading? (We can't run your code otherwise.)
Comment From: arita37
1) Same error after upgrading.
2) ModuleNotFoundError: No module named 'pandas.indexes'
3) Any DataFrame pickled in 0.19.2 has the same issue.
Using pickles for regression tests is a good way to catch breaking changes in class definitions. Do you publish regression test results for pandas?
Comment From: gfyoung
@arita37 : We have a whole suite of pickling tests in our test suite, which you can find here:
https://github.com/pandas-dev/pandas/blob/a4c4edeb2a7e5c84b5a82a9743a12a4b66e7bcf1/pandas/tests/io/test_pickle.py
Comment From: jorisvandenbossche
@arita37 Can you try to read the pickle file with pd.read_pickle? That should normally work fine.
The 'official' stance is that only to_pickle / read_pickle are guaranteed to write and read pickle files correctly across pandas versions.
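For reference, a minimal sketch of the to_pickle / read_pickle route, reusing the column names from the report. The file name `df_demo.pkl` and the temporary directory are assumptions made for the example. It only demonstrates a round trip within a single pandas version, so it does not by itself show cross-version compatibility.

```python
import os
import tempfile

import numpy as np
import pandas as pd

col = ['統合商C', '統合商名', '区分', 'H区分名', 'H区']
df = pd.DataFrame(np.ones((5, 5)), columns=col)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'df_demo.pkl')  # hypothetical file name
    df.to_pickle(path)                # write with the pandas API
    restored = pd.read_pickle(path)   # read with the pandas API

print(restored.equals(df))
```

For a file that was already written with plain pickle.dump, such as the one in the report, the suggestion above amounts to replacing the pickle.load call with pd.read_pickle('df_pandas_192.pkl').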
Comment From: jorisvandenbossche
Some duplicate issues: https://github.com/pandas-dev/pandas/issues/16564, https://github.com/pandas-dev/pandas/issues/16474, https://github.com/pandas-dev/pandas/issues/16278
It is mentioned in the docs here: http://pandas.pydata.org/pandas-docs/stable/io.html#pickling
Comment From: jreback
the big red warning in the docs: http://pandas.pydata.org/pandas-docs/stable/io.html#pickling