pandas cannot export a matrix to a CSV file and re-import it with default arguments while keeping the data consistent. With its default arguments, to_csv
writes an additional column for the index. Importing with default parameters then treats this column as data, not as the index:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10,10))
df.to_csv('np.csv')
print(pd.read_csv('np.csv'))
Unnamed: 0 0 1 2 3 4 5 \
0 0 0.055663 0.492976 0.936424 0.931585 0.043748 0.931660
1 1 0.946510 0.481707 0.935273 0.987895 0.982537 0.735273
2 2 0.429818 0.090192 0.923747 0.973678 0.432166 0.318196
3 3 0.579657 0.599554 0.794318 0.631867 0.700044 0.834421
4 4 0.438074 0.747774 0.034653 0.113885 0.982059 0.736432
5 5 0.379523 0.094214 0.435573 0.729742 0.778312 0.341792
6 6 0.542644 0.175657 0.913459 0.532352 0.607791 0.369434
7 7 0.132935 0.052179 0.145688 0.549158 0.127237 0.475737
8 8 0.454960 0.872086 0.006616 0.444334 0.435469 0.435362
9 9 0.141345 0.512531 0.900547 0.570482 0.366632 0.992289
6 7 8 9
0 0.385482 0.432543 0.927187 0.408233
1 0.385019 0.905481 0.852093 0.368507
2 0.641478 0.966683 0.706884 0.229032
3 0.592390 0.091528 0.969585 0.177480
4 0.805170 0.585675 0.024259 0.961815
5 0.818240 0.688166 0.175099 0.583955
6 0.697869 0.202709 0.458018 0.546078
7 0.597875 0.625422 0.055143 0.720858
8 0.866318 0.348642 0.855215 0.689258
9 0.723096 0.194654 0.681293 0.941478
Comment From: jreback
The inverse operation of .to_csv is from_csv:
In [10]: df = pd.DataFrame(np.random.rand(10,10))
In [11]: df.to_csv('test.csv',mode='w')
In [12]: !cat test.csv
,0,1,2,3,4,5,6,7,8,9
0,0.410789548933,0.141882962291,0.481424012182,0.253145260533,0.349319258408,0.552969720747,0.457827171398,0.361762326267,0.00569519672086,0.623535751613
1,0.369638666467,0.322324774448,0.400265909069,0.642042275107,0.799972540147,0.359167258874,0.239007981282,0.812969158011,0.559582423368,0.00271466592636
2,0.717172031665,0.179713595564,0.956176942931,0.848912709056,0.91118300087,0.391446338563,0.708771850147,0.885832551406,0.708784751692,0.430181079966
3,0.0225329325896,0.190005393361,0.0194796447118,0.869802283448,0.430925947353,0.136011580077,0.529612719739,0.681007234468,0.115292421255,0.305482908184
4,0.289044376003,0.535503444011,0.212408295498,0.0542784302991,0.664277492374,0.357734952961,0.375739315655,0.831491303632,0.00554139533804,0.59147155945
5,0.317218866368,0.461190823521,0.0580049804076,0.539360261154,0.990320435889,0.430079077782,0.442252192586,0.286467160784,0.67520580223,0.358516637142
6,0.681700666131,0.468662142977,0.178406551592,0.627463561773,0.9228852801,0.956406234721,0.669339262005,0.0653954611576,0.187273735622,0.697836946507
7,0.00022882527549,0.00633811057126,0.147099077394,0.0305195112454,0.395283200237,0.163439056245,0.138368552052,0.999240657646,0.786156284675,0.94207117023
8,0.686420735795,0.634091772292,0.448123675745,0.960918481445,0.341246536191,0.349309821001,0.203070985042,0.520821277184,0.0863019780958,0.850411108284
9,0.403063746431,0.0217493935357,0.706866005935,0.19966875768,0.902210895494,0.360288312432,0.422414808927,0.721770768274,0.650247350901,0.436017563996
In [14]: DataFrame.from_csv('test.csv')
Out[14]:
0 1 2 3 4 5 6 7 8 9
0 0.410790 0.141883 0.481424 0.253145 0.349319 0.552970 0.457827 0.361762 0.005695 0.623536
1 0.369639 0.322325 0.400266 0.642042 0.799973 0.359167 0.239008 0.812969 0.559582 0.002715
2 0.717172 0.179714 0.956177 0.848913 0.911183 0.391446 0.708772 0.885833 0.708785 0.430181
3 0.022533 0.190005 0.019480 0.869802 0.430926 0.136012 0.529613 0.681007 0.115292 0.305483
4 0.289044 0.535503 0.212408 0.054278 0.664277 0.357735 0.375739 0.831491 0.005541 0.591472
5 0.317219 0.461191 0.058005 0.539360 0.990320 0.430079 0.442252 0.286467 0.675206 0.358517
6 0.681701 0.468662 0.178407 0.627464 0.922885 0.956406 0.669339 0.065395 0.187274 0.697837
7 0.000229 0.006338 0.147099 0.030520 0.395283 0.163439 0.138369 0.999241 0.786156 0.942071
8 0.686421 0.634092 0.448124 0.960918 0.341247 0.349310 0.203071 0.520821 0.086302 0.850411
9 0.403064 0.021749 0.706866 0.199669 0.902211 0.360288 0.422415 0.721771 0.650247 0.436018
Comment From: groakat
Sorry, I did not see that, because it does not seem to be exposed at the top level of the pandas library. So you have to do
df = pd.DataFrame.from_csv('text.csv')
rather than
df = pd.from_csv('text.csv')
Comment From: jreback
Round-tripping CSV is not nearly as common as simply reading CSVs.
You might want to simply write with
to_csv(..., index=False)
which can then be read with the default arguments of pd.read_csv,
or you can write as is and specify
pd.read_csv(..., index_col=0)
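A minimal sketch of the two options above, assuming a recent pandas release (where, as far as I know, DataFrame.from_csv is no longer available and pd.read_csv with index_col=0 is the usual way to read the written index back). The column names "a", "b", "c" and the in-memory buffers are illustrative choices, not part of the original report. String column names are used because read_csv returns column labels as strings, so integer labels would not compare equal after the round trip.

```python
import io

import numpy as np
import pandas as pd

# Illustrative frame; string column labels survive a CSV round trip unchanged.
df = pd.DataFrame(np.random.rand(4, 3), columns=["a", "b", "c"])

# Option 1: do not write the index, then read with default arguments.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
no_index = pd.read_csv(buf)

# Option 2: write the index (the default) and tell read_csv that the
# first column holds it.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf, index_col=0)

# Both variants should reproduce the original values.
print(np.allclose(df.to_numpy(), no_index.to_numpy()))
print(np.allclose(df.to_numpy(), with_index.to_numpy()))
```

Neither variant adds an "Unnamed: 0" column, which is what the default to_csv followed by the default read_csv produces.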
Comment From: johne13
FWIW, this does not hold for strings with a value of 'NA'. I imagine there are other exceptions as well. I'm not sure exactly how much consistency is to be expected here, to be honest. It can probably be expected for numbers, but not necessarily for less well-behaved strings, where quoting, non-standard characters, etc. could throw things off.
df=pd.DataFrame({ 'x':['NA','foo'] })
x
0 NA
1 foo
df.to_csv('test.csv')
pd.DataFrame.from_csv('test.csv')
x
0 NaN
1 foo
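For the 'NA' case specifically, one possible workaround is to switch off the default missing-value markers when reading. This is a sketch assuming a recent pandas release and using pd.read_csv rather than from_csv; the in-memory buffer stands in for 'test.csv'. Note that with keep_default_na=False, empty fields also stay as empty strings instead of becoming NaN.

```python
import io

import pandas as pd

df = pd.DataFrame({"x": ["NA", "foo"]})

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf)

# Default behaviour: the string "NA" is read back as a missing value.
buf.seek(0)
default_read = pd.read_csv(buf, index_col=0)

# keep_default_na=False disables the built-in list of NA markers,
# so "NA" is kept as an ordinary string.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0, keep_default_na=False)

print(default_read["x"].isna().tolist())
print(restored["x"].tolist())
```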
Comment From: jreback
@johne13 I don't think that was ever intended, and it is very much an edge case; otherwise you would not have automatic NaN conversion on float dtypes, which is much more common.
You cannot have perfect fidelity in all situations with an inherently imperfect format; there simply is not enough metadata in CSV.
HDF5 and msgpack have very nice fidelity, OTOH (in theory SQL does as well, but in practice it has issues).
Comment From: johne13
@jreback Right, my CSV use is always necessity, not choice. I'm not trying to promote CSV use! But it's a common necessity for me unfortunately.
FWIW, I've actually never come across "NA" or "NaN" in numerical columns in the data I work with (economic and/or survey data). It's always blank or . (period). But I get the point.
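For data where missing numeric values are blank or a period, a sketch like the following may help. It assumes pd.read_csv with the na_values parameter, and the column names and values are made up for illustration. Blank fields are treated as missing by default; the period is added as an extra marker.

```python
import io

import pandas as pd

# Hypothetical survey-style data: row 2 uses "." and row 3 is blank.
raw = "id,income\n1,52000\n2,.\n3,\n4,61000\n"

# na_values adds "." to the default set of missing-value markers.
survey = pd.read_csv(io.StringIO(raw), na_values=["."])

print(survey["income"].isna().tolist())
print(survey["income"].dtype)
```

With both markers recognised, the column is parsed as floating point rather than as strings.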