Code sample
We read in a two-column, two-row file test_data.csv that looks like the following:
x,y
1,2
import decimal
import pandas as pd
d = pd.read_csv("test_data.csv", dtype={'y': decimal.Decimal})
Problem description
I have been using pandas.read_csv and passing dtype={'some_column': decimal.Decimal}
as a keyword argument with pandas version 0.19.2. In pandas version 0.20.x, this no longer works and I receive the following error:
TypeError: dtype <class 'decimal.Decimal'> not understood
I can get around this error by instead reading in with the str data type, e.g. dtype={'some_column': str}. The end result is the same, since either way I have to manually cast the numeric data to decimal.Decimal after reading. However, I think the former approach is better because it is more explicit that the particular column's data should be treated as decimal.Decimal.
Is this expected behavior? It seems like a regression. Any additional general thoughts on working with decimal.Decimal data in pandas DataFrame (and numpy array) objects are also welcome.
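For reference, a minimal sketch of the str-dtype workaround described above, using io.StringIO in place of the test_data.csv file from the original post (the in-memory CSV is an assumption for self-containedness):

```python
import decimal
import io

import pandas as pd

# Stand-in for test_data.csv from the original post.
csv_data = io.StringIO("x,y\n1,2\n")

# Read the column as str, then cast to decimal.Decimal manually.
d = pd.read_csv(csv_data, dtype={'y': str})
d['y'] = d['y'].map(decimal.Decimal)

print(d['y'].iloc[0])   # a decimal.Decimal value
print(d['y'].dtype)     # object -- Decimal values are stored as Python objects
```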
Comment From: ghost
Update
I have come across the converters parameter, which may actually be a better fit for what I'm doing here:
d = pd.read_csv("test_data.csv", converters={'y': decimal.Decimal})
This does not explain the change in behavior mentioned in my original post, but it may be a more appropriate approach. If anyone can confirm what the data is converted from, that would be helpful: can I assume that the above line of code first reads the value in as a str and then casts it to decimal.Decimal? I would like to avoid the situation where the data is, e.g., read in as a float automatically and then cast to decimal.Decimal, since that could introduce rounding errors.
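One way to check this empirically is to record the type that pandas hands to the converter. In my testing the converter receives the raw field string (not a pre-parsed float), so no precision is lost; the instrumented converter below is a hypothetical helper for this check:

```python
import decimal
import io

import pandas as pd

seen_types = []

def to_decimal(raw):
    # Record the type pandas passes to the converter before converting.
    seen_types.append(type(raw).__name__)
    return decimal.Decimal(raw)

csv_data = io.StringIO("x,y\n1,2.1\n")
d = pd.read_csv(csv_data, converters={'y': to_decimal})

print(seen_types)       # ['str'] -- the raw field string, not a float
print(d['y'].iloc[0])   # 2.1, exact, with no float round-trip
```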
Comment From: jreback
decimal is not a first-class type in pandas, so there are no guarantees on anything here; this is as expected.
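Concretely, since Decimal is not a first-class dtype, pandas stores such values as generic Python objects (object dtype). Arithmetic still works, but it dispatches to Python-level Decimal methods rather than vectorized numpy operations:

```python
import decimal

import pandas as pd

# Decimal values land in an object-dtype Series, not a numeric dtype.
s = pd.Series([decimal.Decimal('1.1'), decimal.Decimal('2.2')])

print(s.dtype)   # object
print(s.sum())   # 3.3 -- exact Decimal arithmetic, computed object-by-object
```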