Feature Type

  • [x] Adding new functionality to pandas

  • [x] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I wish pd.json_normalize accepted JSON (as str or bytes), and not just dict.

Or, as a joke, there could be a pd.dict_normalize that only accepts JSON ;)

Feature Description

Given a Series with JSON as str or bytes:

>>> df["data"]
0                  {"value":0.0}
1          {"value":0.005787037}
2         {"value":0.0115740741}
3         {"value":0.0173611111}

It should be possible to parse the JSON with pd.json_normalize, e.g.

>>> pd.json_normalize(df["data"])
            value
0        0.000000
1        0.005787
2        0.011574
3        0.017361

Pandas already has good JSON integration, so don't see why it can't be done.

Alternative Solutions

From what I understand, right now it must be first parsed with some other library, e.g. with apply, before using pd.json_normalize.

>>> import json
>>> pd.json_normalize(df["data"].apply(json.loads))
            value
0        0.000000
1        0.005787
2        0.011574
3        0.017361

Additional Context

With better JSON/JSONB support in databases like postgres and sqlite, encountering this sort of data is becoming more common, and the intermediate apply step is a performance and usability issue:

>>> import json
>>> df = pd.read_sql(sql=query, con=conn)
>>> pd.json_normalize(df["data"].apply(json.loads))
            value
0        0.000000
1        0.005787
2        0.011574
3        0.017361