Pandas Segfault with array view and "large" dtypes

Code Sample

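import numpy as np
import pandas as pd
import pathlib as pl

# One record: a uint32 field '1' followed by a nested struct '3' -> '4'
# that holds a 1000-element uint32 sub-array '5' (4 + 1000 * 4 = 4004 bytes).
my_type = np.dtype([('1', '<u4'),
                    ('3',
                     [('4',
                       [('5', '<u4', (1000,))])])])

def convertArray(arr):
    # Skip the first 4004 bytes and reinterpret the rest as my_type records
    # (a view, no copy).
    return arr[4004:].view(my_type)

def loadArray(path):
    # Read the whole file as raw bytes; the with-block closes the file.
    with open(path, mode='rb') as f:
        array = np.fromfile(f, dtype='uint8')
    return convertArray(array)

def arrayToSeries(arr):
    # arr['3'] is itself a structured array with the nested field '4'.
    part_of_arr = arr['3']
    return pd.DataFrame(part_of_arr)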

path = pl.Path('test_file')

frame = arrayToSeries(loadArray(path))

print(frame.head())

Problem description

The Python interpreter segfaults.

How to generate the test_file (on Linux):

dd if=/dev/zero of=test_file bs=4004 count=1001
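On systems without dd, an equivalent all-zero file can be written from Python. The sketch below also checks the arithmetic behind the reproducer: the file is 4004 × 1001 bytes, the code skips the first 4004, and each record of my_type is 4004 bytes, which leaves exactly 1000 records. `make_test_file` and its `skip_bytes`/`n_records` parameters are hypothetical names introduced only for this illustration, and the sketch stops before the DataFrame constructor, so it does not touch the crashing path.

```python
import pathlib as pl

import numpy as np

# Same record layout as the reproducer: 4 + 1000 * 4 = 4004 bytes per record.
my_type = np.dtype([('1', '<u4'),
                    ('3', [('4', [('5', '<u4', (1000,))])])])


def make_test_file(path, skip_bytes=4004, n_records=1000):
    """Write an all-zero file like the dd command (hypothetical helper).

    With the defaults this produces 4004 + 1000 * 4004 = 4004 * 1001 bytes,
    matching `dd if=/dev/zero of=test_file bs=4004 count=1001`.
    """
    size = skip_bytes + n_records * my_type.itemsize
    pl.Path(path).write_bytes(bytes(size))
    return size


path = pl.Path('test_file')
make_test_file(path)

# Load it the same way the reproducer does, without building a DataFrame.
raw = np.fromfile(str(path), dtype='uint8')
records = raw[4004:].view(my_type)
print(my_type.itemsize, raw.size, len(records))  # 4004 4008004 1000
```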

Expected Output

The head of the DataFrame created from the file, or at least a useful error message instead of a segfault.
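Until pandas itself rejects such input, a user-side guard can produce the kind of error message asked for here before the array ever reaches the DataFrame constructor. This is a minimal sketch, assuming that refusing any field that is a nested struct or a sub-array is acceptable; `check_flat_dtype` and `to_frame_checked` are hypothetical helper names, not pandas or NumPy API.

```python
import numpy as np
import pandas as pd


def check_flat_dtype(dtype):
    """Raise ValueError if a structured dtype has nested-struct or sub-array fields.

    Hypothetical user-side guard; not part of the pandas or NumPy API.
    """
    dtype = np.dtype(dtype)
    if dtype.names is None:  # plain (non-structured) dtype: nothing to check
        return
    for name in dtype.names:
        field_dtype = dtype.fields[name][0]
        if field_dtype.names is not None:
            raise ValueError(f"field {name!r} is a nested structured dtype")
        if field_dtype.subdtype is not None:
            raise ValueError(f"field {name!r} is a sub-array of shape {field_dtype.shape}")


def to_frame_checked(arr):
    """Build a DataFrame only from flat structured arrays (hypothetical helper)."""
    check_flat_dtype(arr.dtype)
    return pd.DataFrame(arr)


my_type = np.dtype([('1', '<u4'),
                    ('3', [('4', [('5', '<u4', (1000,))])])])
records = np.zeros(3, dtype=my_type)

try:
    to_frame_checked(records['3'])
except ValueError as exc:
    print(exc)  # field '4' is a nested structured dtype
```

Flat structured arrays, such as one with dtype `[('a', '<u4'), ('b', '<f8')]`, pass the check and become a DataFrame with one column per field.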

Output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.12-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_AT.UTF-8
LOCALE: de_AT.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.9
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Edit: I have now tested with the newest master (54f2a5e91e90e35f7cbd15214297169831d6a6a6), with the same result.

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.12-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_AT.UTF-8
LOCALE: de_AT.UTF-8

pandas: 0.22.0.dev0+140.g54f2a5e91
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.9
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

Compound/nested dtypes are not supported in the DataFrame constructor. You can pass a structured array to DataFrame.from_records.
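A different workaround from the from_records route mentioned here (swapped in, not the one jreback describes) is to avoid nested dtypes altogether: index through the nested fields down to the innermost sub-array, which is an ordinary 2-D uint32 array and a first-class DataFrame input. This is a sketch under the assumption that one column per sub-array element is an acceptable layout; it uses an all-zero in-memory buffer in place of test_file, and the column label `field_1` is an arbitrary, hypothetical name.

```python
import numpy as np
import pandas as pd

my_type = np.dtype([('1', '<u4'),
                    ('3', [('4', [('5', '<u4', (1000,))])])])

# In-memory stand-in for test_file: 4004 skipped bytes + 1000 zeroed records.
raw = np.zeros(4004 * 1001, dtype=np.uint8)
records = raw[4004:].view(my_type)

# Index through the nested fields down to the innermost sub-array, which is
# an ordinary 2-D uint32 array of shape (n_records, 1000).
inner = records['3']['4']['5']

# One row per record, one column per sub-array element.
frame = pd.DataFrame(inner)
# Keep the top-level '1' field as well ('field_1' is an arbitrary label).
frame.insert(0, 'field_1', records['1'])

print(frame.shape)  # (1000, 1001)
```

Every column then has a plain numeric dtype (uint32) instead of object.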

Comment From: jreback

This actually worked for me on macOS. But again, you are trying to store data in a way that is not first-class.

0  ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
1  ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
2  ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
3  ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
4  ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...

In [2]: frame.dtypes
Out[2]: 
4    object
dtype: object
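The single column labelled 4 with object dtype in the output above follows from the shape of the nested dtype. The sketch below inspects it in memory, using an all-zero array in place of the file and never calling the DataFrame constructor: the only top-level field of arr['3'] is '4', and each value of that field is itself a record wrapping a 1000-element sub-array rather than a scalar, which is consistent with pandas ending up with an object column.

```python
import numpy as np

my_type = np.dtype([('1', '<u4'),
                    ('3', [('4', [('5', '<u4', (1000,))])])])
records = np.zeros(1000, dtype=my_type)  # stand-in for the loaded file

part_of_arr = records['3']           # what the reproducer hands to pd.DataFrame
print(part_of_arr.dtype.names)       # ('4',) -> the single column label "4"
print(part_of_arr['4'].dtype)        # still a structured dtype, not a scalar type
print(part_of_arr['4']['5'].shape)   # (1000, 1000)
```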

Comment From: Lazarus535

If I simplify the dtype, I get an exception along the lines of "only 1D arrays allowed" or something similar. But in exactly this case I get a segfault instead of an exception. :-( I tested on all my (Linux) machines, and the issue is the same everywhere.