A small, complete example of the issue
"""Example of Pandas bug."""
from numpy.random import randn
from pandas import DataFrame, read_msgpack
POWER = 6
ROWS = 10 ** POWER
COLS = 280
SIZE = ROWS * COLS
# COL_NAMES = ['Columns']
print('rows: {:,}; cols: {}; data points: {:,}; float memory (bytes): {:,}; '
      'float memory (GiB): {:.3}'.format(ROWS, COLS, SIZE, SIZE * 8,
                                         float(SIZE * 8) / 1024 / 1024 / 1024))
DTF = DataFrame(randn(ROWS, COLS))  # , columns=COL_NAMES
# Print only the first row to keep the output short
print(DTF.head(1))
DTF.to_msgpack('foobar_{}.msg'.format(POWER))
print('Reading exercise')
DTF2 = read_msgpack('foobar_{}.msg'.format(POWER))
print(DTF2.head(1))
Current Output
rows: 1,000,000; cols: 280; data points: 280,000,000; float memory (bytes): 2,240,000,000; float memory (GiB): 2.09
0 1 2 3 4 5 6 \
0 -0.026045 1.102096 0.702563 -1.508039 0.308464 0.504351 2.887272
7 8 9 ... 270 271 272 273 \
0 -0.700034 -0.11521 -0.447198 ... -0.566173 0.16131 -0.581333 -0.362
274 275 276 277 278 279
0 -1.433336 0.60287 0.799355 0.628572 0.542312 -0.410359
[1 rows x 280 columns]
Reading exercise
Traceback (most recent call last):
  File "../bug/pandas_bug6.py", line 20, in <module>
    print(DTF2.head(1))
AttributeError: 'list' object has no attribute 'head'
Expected Output
rows: 1,000,000; cols: 280; data points: 280,000,000; float memory (bytes): 2,240,000,000; float memory (GiB): 2.09
0 1 2 3 4 5 6 \
0 -0.026045 1.102096 0.702563 -1.508039 0.308464 0.504351 2.887272
7 8 9 ... 270 271 272 273 \
0 -0.700034 -0.11521 -0.447198 ... -0.566173 0.16131 -0.581333 -0.362
274 275 276 277 278 279
0 -1.433336 0.60287 0.799355 0.628572 0.542312 -0.410359
[1 rows x 280 columns]
Reading exercise
0 1 2 3 4 5 6 \
0 -0.026045 1.102096 0.702563 -1.508039 0.308464 0.504351 2.887272
7 8 9 ... 270 271 272 273 \
0 -0.700034 -0.11521 -0.447198 ... -0.566173 0.16131 -0.581333 -0.362
274 275 276 277 278 279
0 -1.433336 0.60287 0.799355 0.628572 0.542312 -0.410359
[1 rows x 280 columns]
Output of pd.show_versions()
## INSTALLED VERSIONS
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 16.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 18.5
Cython: None
numpy: 1.8.0rc1
scipy: 0.13.0b1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 1.5
pytz: 2013.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
None
Comment From: VelizarVESSELINOV
With Python 3, the same code fails earlier, with an error raised in to_msgpack:
Traceback (most recent call last):
  File "../bug/pandas_bug6.py", line 17, in <module>
    DTF.to_msgpack('foobar_{}.msg'.format(POWER))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py", line 1122, in to_msgpack
    **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/io/packers.py", line 151, in to_msgpack
    writer(fh)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/io/packers.py", line 147, in writer
    fh.write(pack(a, **kwargs))
OSError: [Errno 22] Invalid argument
Comment From: chris-b1
Thanks for the report. This is a duplicate of #12905: the issue is that msgpack doesn't support a bytestring > 2GB. At a minimum, we should catch this and show a better error message.
https://github.com/msgpack/msgpack/blob/master/spec.md#formats-ext
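As a rough illustration of the "catch this and show a better error message" idea, here is a minimal sketch in plain Python. It is not pandas code: LIMIT_BYTES, check_payload_size and row_chunks are hypothetical names, and the limit is an assumption taken from the "> 2GB" figure in the comment above rather than from the msgpack spec. It only does the size arithmetic from the example script and leaves the actual writing to the caller.

```python
# Assumed limit, based on the "> 2GB" figure quoted in the comment above.
LIMIT_BYTES = 2 ** 31 - 1


def check_payload_size(rows, cols, itemsize=8, limit=LIMIT_BYTES):
    """Return the data size in bytes, or raise a descriptive error if too big."""
    nbytes = rows * cols * itemsize
    if nbytes > limit:
        raise ValueError(
            'data block is {:,} bytes, which exceeds the {:,}-byte limit; '
            'write the frame in smaller row chunks'.format(nbytes, limit))
    return nbytes


def row_chunks(rows, cols, itemsize=8, limit=LIMIT_BYTES):
    """Yield (start, stop) row ranges whose float data fits under the limit."""
    max_rows = limit // (cols * itemsize)
    if max_rows < 1:
        raise ValueError('a single row exceeds the limit')
    for start in range(0, rows, max_rows):
        yield start, min(start + max_rows, rows)


if __name__ == '__main__':
    # Same shape as the example script: 10 ** 6 rows, 280 float64 columns.
    for start, stop in row_chunks(10 ** 6, 280):
        print('rows {:,} to {:,}'.format(start, stop))
```

For the frame in this report (1,000,000 rows x 280 columns of 8-byte floats), check_payload_size raises, and row_chunks yields two ranges, (0, 958698) and (958698, 1000000), each of which could be written to its own file.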