Pandas: elements of array are converted from object to string type upon writing to and reading back from CSV.

Hi there,

I have a pandas array that contains lists of tuples as shown below:

                   APP_chr21             GRIK1_chr21
SC01  [(215, 240), (62, 55)]  [(97, 95), (152, 148)]
SC02  [(116, 123), (25, 17)]    [(48, 39), (73, 71)]
SC03     [(122, 0), (40, 0)]    [(51, 75), (97, 69)]
SC04     [(1, 157), (0, 42)]   [(80, 70), (100, 96)]

I seem to be unable to save it to a file and then reload it in its original state. After running

>>> original_array.to_csv('test.csv')
>>> reloaded_array = pd.read_csv('test.csv', index_col=0, dtype=object)

all elements are converted from lists to strings even though I specify dtype=object.

>>> print original_array
                   APP_chr21             GRIK1_chr21
SC01  [(215, 240), (62, 55)]  [(97, 95), (152, 148)]
SC02  [(116, 123), (25, 17)]    [(48, 39), (73, 71)]
SC03     [(122, 0), (40, 0)]    [(51, 75), (97, 69)]
SC04     [(1, 157), (0, 42)]   [(80, 70), (100, 96)]
>>> for element in original_array['APP_chr21']: print type(element)
<type 'list'>
<type 'list'>
<type 'list'>
<type 'list'>
>>> print reloaded_array
                   APP_chr21             GRIK1_chr21
SC01  [(215, 240), (62, 55)]  [(97, 95), (152, 148)]
SC02  [(116, 123), (25, 17)]    [(48, 39), (73, 71)]
SC03     [(122, 0), (40, 0)]    [(51, 75), (97, 69)]
SC04     [(1, 157), (0, 42)]   [(80, 70), (100, 96)]
>>> for element in reloaded_array['APP_chr21']: print type(element)
<type 'str'>
<type 'str'>
<type 'str'>
<type 'str'>

As a result of this behaviour it is impossible to further process the data.

I'm sure there must be an easy way to recreate the array in its original state from the csv file and was wondering if anyone could help.

Thanks a lot in advance,

mce

Comment From: jreback

CSV is not a high-fidelity format. It cannot represent a schema; the types have to be inferred when the file is read.
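
To illustrate: the cells are written out as their text representation, and dtype=object only tells the reader to leave that text alone. If the list-of-tuples layout really has to be kept, one possible workaround is to parse the text back explicitly, for example with ast.literal_eval passed through the converters argument of read_csv. This is a minimal sketch, assuming every cell is a valid Python literal; the two-row frame and the in-memory buffer are stand-ins for the real data and for test.csv.

```python
import ast
import io

import pandas as pd

# Small stand-in for the frame in the report (two rows only).
original = pd.DataFrame(
    {
        "APP_chr21": [[(215, 240), (62, 55)], [(116, 123), (25, 17)]],
        "GRIK1_chr21": [[(97, 95), (152, 148)], [(48, 39), (73, 71)]],
    },
    index=["SC01", "SC02"],
)

# Write to an in-memory buffer instead of 'test.csv' to keep the sketch self-contained.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# Each cell comes back as text such as "[(215, 240), (62, 55)]";
# ast.literal_eval turns that text back into a list of tuples.
list_columns = ["APP_chr21", "GRIK1_chr21"]
reloaded = pd.read_csv(
    buffer,
    index_col=0,
    converters={name: ast.literal_eval for name in list_columns},
)

for element in reloaded["APP_chr21"]:
    print(type(element))
```

Note that this only works because the column names and their contents are known in advance; nothing in the CSV file itself says that these cells are lists.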

Further, storing list-likes in cells is quite inefficient and non-idiomatic.

You are fighting a losing battle here. You are better off exploding the columns, possibly with a multi-level column index, and/or representing this as n-dimensional data.
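
A rough sketch of what exploding the columns could look like, assuming every cell holds exactly two tuples of two integers as in the example above. The level names "gene", "pair" and "value" are made up for illustration, and the two-row frame is a stand-in for the real data.

```python
import io

import numpy as np
import pandas as pd

# Small stand-in for the frame in the report (two rows only).
original = pd.DataFrame(
    {
        "APP_chr21": [[(215, 240), (62, 55)], [(116, 123), (25, 17)]],
        "GRIK1_chr21": [[(97, 95), (152, 148)], [(48, 39), (73, 71)]],
    },
    index=["SC01", "SC02"],
)

# n-dimensional view: axes are (row, gene, pair, value).
cube = np.array(original.values.tolist())

# Flat view: one integer column per (gene, pair, value) combination.
columns = pd.MultiIndex.from_product(
    [original.columns, [0, 1], [0, 1]],
    names=["gene", "pair", "value"],  # hypothetical level names
)
wide = pd.DataFrame(
    cube.reshape(len(original), -1),
    index=original.index,
    columns=columns,
)

# Plain integers survive a CSV round trip; the three header rows
# are read back as a multi-level column index.
buffer = io.StringIO()
wide.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col=0, header=[0, 1, 2])

print(wide)
```

With this layout every cell is a scalar, so the usual vectorized operations apply, and the original nesting is still available through cube or by selecting on the column levels.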