Problem description
While working on issue https://github.com/twosigma/beakerx/issues/6020 I found some strange behavior. It occurs both when reading CSV files and when the DataFrame is created manually (as in the example). When the DataFrame contains uint values, dtypes returns the proper types. When iterating with iterrows(), the row contains float values, even in columns that hold uint values. When iterating by index, the values have the proper type and formatting. pandas version: 0.20.3
Code Sample
import pandas
pd = pandas.DataFrame({'a': [1, 2], 'b': [18446744073709551615, 18]})
print ("type a: ", pd.dtypes['a'].name)
print ("type b: ", pd.dtypes['b'].name)
returns the expected types
type a: int64
type b: uint64
Iterate by row,
for item in pd.iterrows():
    print("item: ", item)
for item in pd.iterrows():
    for columnName in pd.columns.tolist():
        print(columnName, " value: ", item[1][columnName], " type: ", type(item[1][columnName]))
returns float64 instead of the types of the columns.
a value: 1.0 type: <class 'numpy.float64'>
b value: 1.84467440737e+19 type: <class 'numpy.float64'>
a value: 2.0 type: <class 'numpy.float64'>
b value: 18.0 type: <class 'numpy.float64'>
This code returns the expected types.
print("Iterate by index")
for ix in range(len(pd)):
    for columnName in pd.columns.tolist():
        value = pd[columnName][ix]
        print(columnName, " value: ", value, " type: ", type(value))
a value: 1 type: <class 'numpy.int64'>
b value: 18446744073709551615 type: <class 'numpy.uint64'>
a value: 2 type: <class 'numpy.int64'>
b value: 18 type: <class 'numpy.uint64'>
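The float64 values in the iterrows() output above are more than a display difference, because float64 cannot represent every uint64 value exactly. The following is a minimal sketch using NumPy only (the variable names are illustrative and not part of the original report) that shows the common type NumPy picks for int64 and uint64, and what happens to the largest uint64 value after the conversion:

```python
import numpy as np

big = np.uint64(18446744073709551615)  # 2**64 - 1, the largest uint64 value

# No 64-bit integer type covers both the int64 and uint64 ranges,
# so NumPy's promotion rule for the pair is float64.
common = np.result_type(np.int64, np.uint64)

# float64 carries 53 bits of precision, so 2**64 - 1 is rounded to 2**64.
as_float = np.float64(big)
round_trip = int(as_float)

print(common)
print(round_trip == 2**64)
print(round_trip == int(big))
```

So a value read from a float64 row may differ from the one stored in the uint64 column.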
Comment From: jreback
Iterating by rows forces conversion to a single dtype for that row. Generally you can convert uint64 to float64 (as you can int64), so this is as expected. If you don't want this conversion, then you can .astype(object) a priori; of course this introduces a large performance penalty (but you are getting that anyway by iterating).
In [18]: df.iloc[0]
Out[18]:
a 1.000000e+00
b 1.844674e+19
Name: 0, dtype: float64
In [19]: df.iloc[1]
Out[19]:
a 2.0
b 18.0
Name: 1, dtype: float64
In [20]: df.astype(object).iloc[0]
Out[20]:
a 1
b 18446744073709551615
Name: 0, dtype: object
In [21]: df.astype(object).iloc[1]
Out[21]:
a 2
b 18
Name: 1, dtype: object
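As a rough sketch of the options discussed above, the snippet below rebuilds the example frame and reads the rows in two ways that avoid the float64 row: the .astype(object) approach from the comment, and DataFrame.itertuples(), which is not mentioned in the thread but reads each column separately. The names df, object_rows and tuples are illustrative only; note that the issue's own code calls the DataFrame pd, while here pd is the usual pandas alias.

```python
import numpy as np
import pandas as pd

# Same data as in the report, with the dtypes spelled out explicitly.
df = pd.DataFrame({
    "a": np.array([1, 2], dtype="int64"),
    "b": np.array([18446744073709551615, 18], dtype="uint64"),
})

# Option 1 (from the comment): convert to object first, so each row
# keeps the exact integer values. This is slow on large frames.
object_rows = [row for _, row in df.astype(object).iterrows()]

# Option 2 (an alternative, not from the thread): itertuples() builds
# each row from the individual columns instead of one combined Series.
tuples = list(df.itertuples(index=False))

for t in tuples:
    print(t.a, t.b)
```

Both approaches keep 18446744073709551615 exact. When the values are only needed column by column, working on df["b"] directly avoids row iteration altogether.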