Code sample

We read in a two-column, two-row file test_data.csv that looks like the following:

x, y
1, 2

import decimal

import pandas as pd

d = pd.read_csv("test_data.csv", dtype={'y': decimal.Decimal})

Problem description

I have been using pandas.read_csv and passing dtype={'some_column': decimal.Decimal} as a keyword argument with pandas version 0.19.2. In pandas version 0.20.x, this no longer works and I receive the following error:

TypeError: dtype <class 'decimal.Decimal'> not understood

I can get around this error by instead reading the column in with the str data type, e.g. dtype={'some_column': str}. The end result is the same, as I then have to manually cast the numeric data to decimal.Decimal after reading. However, I think the former approach is better, as it makes it more explicit that the particular column's data should be treated as decimal.Decimal.
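The str-then-cast workaround can be sketched as follows. This is a minimal, self-contained example that uses an in-memory io.StringIO in place of the test_data.csv file from the code sample above:

```python
import decimal
import io

import pandas as pd

# In-memory stand-in for the two-column test_data.csv above
csv_data = io.StringIO("x,y\n1,2\n")

# Read the column as str so pandas never parses it as a float,
# then manually cast each value to decimal.Decimal
d = pd.read_csv(csv_data, dtype={'y': str})
d['y'] = d['y'].apply(decimal.Decimal)

value = d['y'].iloc[0]
# value is decimal.Decimal('2'), held in an object-dtype column
```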

Is this expected behavior? It seems like a regression. Any additional general thoughts on working with decimal.Decimal data in pandas DataFrame (and numpy array) objects are also welcome.

Comment From: ghost

Update

I have come across the converters parameter which may actually be a better fit for what I'm doing here:

d = pd.read_csv("test_data.csv", converters={'y': decimal.Decimal})

This does not explain the change in behavior mentioned in my original post, but it may be a more appropriate approach. If anyone can confirm what the data is converted from, that would be helpful: can I assume that the above line of code first reads in the value as a str and then casts it to decimal.Decimal? I would like to avoid a situation where the data is automatically read in as a float and then cast to decimal.Decimal, since that could introduce precision errors.
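One way to check this directly is to pass a converter that records the type of the value pandas hands it. The sketch below assumes only pandas and the standard library; the recording list and helper function names are illustrative:

```python
import decimal
import io

import pandas as pd

seen_types = []

def to_decimal(raw):
    # Record the type of the raw value pandas passes to the converter
    seen_types.append(type(raw))
    return decimal.Decimal(raw)

csv_data = io.StringIO("x,y\n1,2\n")
d = pd.read_csv(csv_data, converters={'y': to_decimal})

# seen_types should contain str, indicating the converter receives the
# raw text of the field rather than an already-parsed float
```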

Comment From: jreback

decimal is not a first-class type in pandas, so there are no guarantees on anything here; this is as expected.