Feature Type
-
[x] Adding new functionality to pandas
-
[x] Changing existing functionality in pandas
-
[ ] Removing existing functionality in pandas
Problem Description
I wish pd.json_normalize
accepted JSON (as str or bytes), and not just dict
.
Or, as a joke, there could be a pd.dict_normalize
that only accepts JSON ;)
Feature Description
Given a Series with JSON as str or bytes:
>>> df["data"]
0 {"value":0.0}
1 {"value":0.005787037}
2 {"value":0.0115740741}
3 {"value":0.0173611111}
It should be possible to parse the JSON with pd.json_normalize
, e.g.
>>> pd.json_normalize(df["data"])
value
0 0.000000
1 0.005787
2 0.011574
3 0.017361
Pandas already has good JSON integration, so don't see why it can't be done.
Alternative Solutions
From what I understand, right now it must be first parsed with some other library, e.g. with apply
, before using pd.json_normalize
.
>>> import json
>>> pd.json_normalize(df["data"].apply(json.loads))
value
0 0.000000
1 0.005787
2 0.011574
3 0.017361
Additional Context
With better JSON/JSONB support in databases like postgres and sqlite, encountering this sort of data is becoming more common, and the intermediate apply
step is a performance and usability issue:
>>> import json
>>> df = pd.read_sql(sql=query, con=conn)
>>> pd.json_normalize(df["data"].apply(json.loads))
value
0 0.000000
1 0.005787
2 0.011574
3 0.017361