Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the [main branch](https://pandas.pydata.org/docs/dev/getting_started/install.html#installing-the-development-version-of-pandas) of pandas.
Reproducible Example
import pandas as pd
import json
def json_conversion(df, orient_type="values"):
    # Convert the dataframe to a JSON string
    json_str = df.to_json(orient=orient_type)
    # Write the JSON string to a file
    with open('data.json', 'w') as f:
        json.dump(json_str, f)
    # Read the JSON string back from the file
    with open('data.json', 'r') as f:
        json_str = json.load(f)
    # Convert the JSON string back to a dataframe
    df2 = pd.read_json(json_str, orient=orient_type)
    return df2
# Create a dataframe with imaginary numbers
df = pd.DataFrame({'a': [1 + 2j, 3 + 4j], 'b': [5 + 6j, 7 + 8j]})
print(df)
#           a         b
# 0  1.0+2.0j  5.0+6.0j
# 1  3.0+4.0j  7.0+8.0j
# Check with `values`
df_values_json = json_conversion(df, "values")
print(df_values_json)
#                             0                           1
# 0  {'imag': 2.0, 'real': 1.0}  {'imag': 6.0, 'real': 5.0}
# 1  {'imag': 4.0, 'real': 3.0}  {'imag': 8.0, 'real': 7.0}
# Check with `table`
df_table_json = json_conversion(df, "table")
# TypeError: float() argument must be a string or a number, not 'dict'
Issue Description
When trying to re-create a dataframe with complex numbers through a JSON round trip, pd.read_json() has trouble with different orientations, e.g. orient="values" and orient="table". With orient="values", each number in the reconstructed data frame comes back as a dictionary with "imag" and "real" entries rather than as a complex value. With orient="table", the data frame cannot be re-created at all because a TypeError is raised.
| | a | b |
|---|---|---|
| 0 | 1+2j | 5+6j |
| 1 | 3+4j | 7+8j |
JSON Output under `orient='values'`
[
  [
    {
      "imag":2.0,
      "real":1.0
    },
    {
      "imag":6.0,
      "real":5.0
    }
  ],
  [
    {
      "imag":4.0,
      "real":3.0
    },
    {
      "imag":8.0,
      "real":7.0
    }
  ]
]
This leads to the reconstructed data frame looking like so:
| | 0 | 1 |
|---|---|---|
| 0 | {'imag': 2.0, 'real': 1.0} | {'imag': 6.0, 'real': 5.0} |
| 1 | {'imag': 4.0, 'real': 3.0} | {'imag': 8.0, 'real': 7.0} |
In the case of orient='table', we have:
JSON Output under `orient='table'`
{
  "schema":{
    "fields":[
      {
        "name":"index",
        "type":"integer"
      },
      {
        "name":"a",
        "type":"number"
      },
      {
        "name":"b",
        "type":"number"
      }
    ],
    "primaryKey":[
      "index"
    ],
    "pandas_version":"0.20.0"
  },
  "data":[
    {
      "index":0,
      "a":{
        "imag":2.0
      },
      "b":{
        "imag":6.0
      }
    },
    {
      "index":1,
      "a":{
        "imag":4.0
      },
      "b":{
        "imag":8.0
      }
    }
  ]
}
The end result is a TypeError:
TypeError: float() argument must be a string or a number, not 'dict'
Expected Behavior
Ideally, the values case should reconstruct the original data frame except for its column names (which orient="values" does not store), whereas the table case should produce a data frame identical to the original.
Installed Versions
Comment From: dicristina
The documentation says that the output of df.to_json(orient="values") will be "just the values array", so it is not possible to recover the column names. To recover the actual complex values you can do something like:
df_values_json.applymap(lambda c: complex(**c))
Unlike the previous case, with orient="table" we have the data type of each column, so in theory we should be able to reconstruct the values without doing any work at all. The problem here is that there is no special handling for complex numbers: when the values are read, they are passed to the float function. The relevant code is in pandas/io/json/_table_schema.py.
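As an alternative sketch of the same complex(**c) idea, the conversion can also happen while the JSON is parsed, before the data reaches pandas. This swaps the applymap step for the standard library's json.loads with an object_hook; it is not how pd.read_json works internally. The helper name as_complex is hypothetical, and the JSON text is the orient="values" output shown in the report.

```python
import json

# JSON text as produced by df.to_json(orient="values") in the report above.
values_json = (
    '[[{"imag":2.0,"real":1.0},{"imag":6.0,"real":5.0}],'
    '[{"imag":4.0,"real":3.0},{"imag":8.0,"real":7.0}]]'
)


def as_complex(obj):
    """Hypothetical helper: turn {"real": ..., "imag": ...} into a complex.

    Any other JSON object is returned unchanged.
    """
    if set(obj) == {"real", "imag"}:
        return complex(**obj)
    return obj


# object_hook is called for every JSON object (dict) that gets decoded.
rows = json.loads(values_json, object_hook=as_complex)
print(rows)  # [[(1+2j), (5+6j)], [(3+4j), (7+8j)]]
```

The resulting list of rows can then be handed to pd.DataFrame(rows, columns=["a", "b"]); the column names have to be supplied by hand because orient="values" does not store them.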
Comment From: topper-123
Table Schema doesn't seem to have schema fields for complex numbers, so this isn't possible to fix in pandas under the constraint that we follow Table Schema. I'm not an expert on Table Schema at all, so if I'm wrong there, I appreciate feedback on that, of course.
So, I agree that the solution proposed by @dicristina using apply/applymap is the best possible right now, and I don't think this is fixable while following Table Schema.
Comment From: dicristina
There is a mechanism already in place to add an extDtype key to the field descriptor for extension types. When the table is read, the data type indicated by this key is used instead of the one derived from the type key. Maybe this can be used for complex numbers even though they are not a pandas extension type.
Even when the correct data type is contained in the field descriptor, the representation of the complex numbers presents a small problem. The parse_table_schema function builds a mapping of dtypes and then calls df.astype(dtypes). This does not work when the complex numbers are represented as dictionaries.
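The underlying failure can be illustrated without pandas. The sketch below assumes, as described above, that the cast ends up calling a scalar constructor on each cell; it only shows that neither float nor complex accepts a dictionary directly, while unpacking the dictionary as keyword arguments works. The names cell and failures are illustrative.

```python
# One cell as it appears after reading the JSON back: a dict, not a number.
cell = {"imag": 2.0, "real": 1.0}

failures = []
for cast in (float, complex):
    try:
        cast(cell)  # a dict is not a valid argument for either constructor
    except TypeError as exc:
        failures.append(cast.__name__)
        print(f"{cast.__name__}() failed: {exc}")

# Unpacking the dict into the real/imag keyword arguments succeeds.
print(complex(**cell))  # (1+2j)
```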
Comment From: topper-123
Yes I agree.
Looking at the Table Schema number definition, it doesn't look like a dict is a legal value for a "number" field, so the current behavior is a bit strange.
Maybe complex numbers should have type "object" instead (i.e. allowing the dict) and an extDtype field with value "complex". That is, type "object" would by default be read in as a JSON-like object (i.e. the result of json.loads in Python), except that if the field has an "extDtype" with value "complex", it would be converted to a complex type using complex(**val)?
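A minimal sketch of how that proposal could behave on the reading side, using only the standard library. Everything here is an assumption made for illustration: the payload with type "object" and extDtype "complex" is the proposed layout, not something pandas currently writes, and read_rows is a hypothetical function, not part of pandas/io/json/_table_schema.py.

```python
import json

# Hypothetical payload following the proposal: complex columns are declared
# as type "object" with an extra "extDtype": "complex" key.
payload = json.loads("""
{
  "schema": {
    "fields": [
      {"name": "index", "type": "integer"},
      {"name": "a", "type": "object", "extDtype": "complex"},
      {"name": "b", "type": "object", "extDtype": "complex"}
    ],
    "primaryKey": ["index"]
  },
  "data": [
    {"index": 0, "a": {"imag": 2.0, "real": 1.0}, "b": {"imag": 6.0, "real": 5.0}},
    {"index": 1, "a": {"imag": 4.0, "real": 3.0}, "b": {"imag": 8.0, "real": 7.0}}
  ]
}
""")


def read_rows(table):
    """Hypothetical reader: convert cells of fields marked extDtype=complex."""
    complex_fields = {
        field["name"]
        for field in table["schema"]["fields"]
        if field.get("extDtype") == "complex"
    }
    return [
        {
            name: complex(**value) if name in complex_fields else value
            for name, value in record.items()
        }
        for record in table["data"]
    ]


rows = read_rows(payload)
print(rows[0])  # {'index': 0, 'a': (1+2j), 'b': (5+6j)}
```

Fields without the extDtype key are passed through untouched, which matches the idea that type "object" is otherwise read as a plain JSON-like object.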