Hi there,
I have a pandas DataFrame (called original_array here) whose cells contain lists of tuples:
                   APP_chr21             GRIK1_chr21
SC01  [(215, 240), (62, 55)]  [(97, 95), (152, 148)]
SC02  [(116, 123), (25, 17)]    [(48, 39), (73, 71)]
SC03     [(122, 0), (40, 0)]    [(51, 75), (97, 69)]
SC04     [(1, 157), (0, 42)]   [(80, 70), (100, 96)]
I seem to be unable to save it to a file and then reload it in its original state. After running
>>> original_array.to_csv('test.csv')
>>> reloaded_array = pd.read_csv('test.csv', index_col=0, dtype=object)
all elements are converted from lists to strings, even though I specify dtype=object.
>>> print original_array
                   APP_chr21             GRIK1_chr21
SC01  [(215, 240), (62, 55)]  [(97, 95), (152, 148)]
SC02  [(116, 123), (25, 17)]    [(48, 39), (73, 71)]
SC03     [(122, 0), (40, 0)]    [(51, 75), (97, 69)]
SC04     [(1, 157), (0, 42)]   [(80, 70), (100, 96)]
>>> for element in original_array['APP_chr21']: print type(element)
<type 'list'>
<type 'list'>
<type 'list'>
<type 'list'>
>>> print reloaded_array
                   APP_chr21             GRIK1_chr21
SC01  [(215, 240), (62, 55)]  [(97, 95), (152, 148)]
SC02  [(116, 123), (25, 17)]    [(48, 39), (73, 71)]
SC03     [(122, 0), (40, 0)]    [(51, 75), (97, 69)]
SC04     [(1, 157), (0, 42)]   [(80, 70), (100, 96)]
>>> for element in reloaded_array['APP_chr21']: print type(element)
<type 'str'>
<type 'str'>
<type 'str'>
<type 'str'>
As a result of this behaviour it is impossible to further process the data.
I'm sure there must be an easy way to recreate the array in its original state from the CSV file, and I was wondering if anyone could help.
Thanks a lot in advance,
mce
Comment From: jreback
CSV is not a high-fidelity format. It cannot represent a schema; rather, the schema must be inferred when the file is read.
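To illustrate what "inferred" means here: the file holds only text, so any richer type has to be parsed back explicitly on read. Below is a minimal sketch, not a recommendation, assuming every cell is the `str()` of a plain Python literal as in the question. It uses the `converters` argument of `read_csv` with `ast.literal_eval`. The two-row frame and the in-memory buffer are stand-ins for the real data and file.

```python
import ast
import io

import pandas as pd

# Small stand-in for the frame in the question (first two samples only).
original = pd.DataFrame(
    {
        "APP_chr21": [[(215, 240), (62, 55)], [(116, 123), (25, 17)]],
        "GRIK1_chr21": [[(97, 95), (152, 148)], [(48, 39), (73, 71)]],
    },
    index=["SC01", "SC02"],
)

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# Each cell was written as the text of a list, so parse it back per column.
# ast.literal_eval only accepts Python literals; it does not run arbitrary code.
converters = {column: ast.literal_eval for column in original.columns}
reloaded = pd.read_csv(buffer, index_col=0, converters=converters)

print(type(reloaded.loc["SC01", "APP_chr21"]))
```

This only works as long as every cell really is a valid literal, which is part of why it is fragile.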
Further, storing list-likes in cells is quite inefficient and non-idiomatic.
You are fighting a losing battle here. You are better off exploding the columns, possibly with a multi-level column index, and/or representing this as n-dimensional data.
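A rough sketch of what exploding the columns could look like, assuming every cell holds exactly two pairs of two integers as in the data shown in the question. The level names `gene`, `pair` and `position` are made up for this example, and the two-row frame stands in for the real one.

```python
import io

import numpy as np
import pandas as pd

# Small stand-in for the frame in the question (first two samples only).
original = pd.DataFrame(
    {
        "APP_chr21": [[(215, 240), (62, 55)], [(116, 123), (25, 17)]],
        "GRIK1_chr21": [[(97, 95), (152, 148)], [(48, 39), (73, 71)]],
    },
    index=["SC01", "SC02"],
)

# View the nested cells as one integer array of shape (samples, genes, 2, 2).
# Assumption: every cell has exactly 2 pairs of 2 values.
values = np.array(original.to_numpy().tolist())

# One scalar column per (gene, pair, position); level names are hypothetical.
columns = pd.MultiIndex.from_product(
    [original.columns, range(2), range(2)],
    names=["gene", "pair", "position"],
)
exploded = pd.DataFrame(
    values.reshape(len(original), -1),
    index=original.index,
    columns=columns,
)

# Scalar integer cells survive a CSV round trip without custom parsing.
buffer = io.StringIO()
exploded.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, header=[0, 1, 2], index_col=0)

# The n-dimensional form can be rebuilt from the flat table when needed.
cube = reloaded.to_numpy().reshape(len(reloaded), len(original.columns), 2, 2)
```

With this layout, a single value is addressed as `exploded.loc["SC01", ("APP_chr21", 0, 1)]`, and `cube[i, j]` gives the 2x2 block for sample `i` and gene `j`.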