Code Sample, a copy-pastable example
from pandas import json_normalize

test_data = [
    {"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}
]
df = json_normalize(
    test_data,
    record_path=["values"],
    meta=[["metadata", "listdata"]],
)
print(df)
Problem description
The call raises `ValueError: Length of values (6) does not match length of index (3)`.
Changing the `listdata` field from `[1, 2]` to `{1, 2}` avoids the error, but I can't control the JSON I receive.
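The numbers in the error message can be reproduced with NumPy alone. The sketch below assumes the meta values are converted with `np.array(v, dtype=object).repeat(lengths)`, as the line removed in the diff further down the thread suggests. The names `meta_values` and `lengths` are illustrative only.

```python
import numpy as np

# One record whose meta value is a two-element list, as in the report.
meta_values = [[1, 2]]
lengths = [3]  # that record contributes three rows

arr = np.array(meta_values, dtype=object)
# The inner list is unpacked into a second dimension instead of
# being kept as a single object.
print(arr.shape)

# repeat() without an axis flattens first, so each inner element is
# repeated three times: six values for a three-row index.
repeated = arr.repeat(lengths)
print(len(repeated))
```

A set such as `{1, 2}` is not a sequence, so NumPy stores it as a single object, which is presumably why that change sidesteps the error.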
Expected Output
0 | metadata.listdata |
---|---|
1 | [1, 2] |
2 | [1, 2] |
3 | [1, 2] |
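Until a fix lands, the expected frame can be built by hand. This is only a workaround sketch for the exact structure in the report (a `values` list plus a nested `metadata.listdata` list). It does not use `json_normalize` at all.

```python
import pandas as pd

test_data = [
    {"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}
]

# Emit one row per entry in "values" and attach the record's list
# unchanged, so it is never unpacked into separate elements.
rows = [
    {0: value, "metadata.listdata": record["metadata"]["listdata"]}
    for record in test_data
    for value in record["values"]
]
df = pd.DataFrame(rows)
print(df)
```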
Output of pd.show_versions()
Comment From: simonjayhawkins
Thanks @sann05 for the report.
a possible fix ...
```diff
diff --git a/pandas/io/json/_normalize.py b/pandas/io/json/_normalize.py
index e77d60d2d4..d04d321388 100644
--- a/pandas/io/json/_normalize.py
+++ b/pandas/io/json/_normalize.py
@@ -531,7 +531,16 @@ def _json_normalize(
             raise ValueError(
                 f"Conflicting metadata name {k}, need distinguishing prefix "
             )
-        result[k] = np.array(v, dtype=object).repeat(lengths)
+
+        values = np.array(v, dtype=object)
+
+        if values.ndim > 1:
+            # GH#37782
+            values = np.empty((len(v),), dtype=object)
+            for i, v in enumerate(v):
+                values[i] = v
+
+        result[k] = values.repeat(lengths)
     return result
```
There may be a more elegant solution, or similar code elsewhere in the codebase that could be reused.
PRs to fix welcome.
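The idea behind the proposed change can be tried outside pandas. The sketch below uses a hypothetical helper, `repeat_meta`, which is not part of pandas. It fills a preallocated 1-D object array so that each list stays a single element before repeating.

```python
import numpy as np

def repeat_meta(values, lengths):
    """Hypothetical helper: repeat each meta value once per record row."""
    out = np.empty(len(values), dtype=object)
    for i, item in enumerate(values):
        out[i] = item  # stored as one object, even when item is a list
    return out.repeat(lengths)

# Two records: the first yields three rows, the second yields two.
result = repeat_meta([[1, 2], [3, 4]], [3, 2])
print(result.tolist())
```

Unlike the diff, this sketch always takes the element-by-element path rather than checking `ndim` first, which keeps it short at the cost of a Python-level loop.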
Comment From: sanjay9977
I have a similar problem: https://stackoverflow.com/questions/72801399/2d-arrays-repeat-throwing-valueerror-operands-could-not-be-broadcast-together
Is there any plan to fix this issue?
Comment From: Julian-J-S
I have the same problem as described here. What happened to #47708? Is anything still happening there? :D
Besides lists with zero or more than one element raising a ValueError, I think it is important to mention that lists with exactly one element must never be converted to scalars; the list must remain a list. When I get a list from an API, I don't know how many elements it contains. Converting one-element lists to scalars would be very inconsistent and questionable :D
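The one-element case can be illustrated with NumPy directly. This sketch assumes the same `np.array(..., dtype=object).repeat(...)` conversion shown in the diff earlier in the thread. The variable names are illustrative.

```python
import numpy as np

meta_values = [[7]]  # a single record whose meta value is a one-element list
lengths = [2]

# Naive conversion: the list is unpacked, so the repeated values
# are the scalar 7 rather than the list [7].
naive = np.array(meta_values, dtype=object).repeat(lengths)
print(naive.tolist())

# Filling a 1-D object array keeps the list intact.
kept = np.empty(len(meta_values), dtype=object)
for i, item in enumerate(meta_values):
    kept[i] = item
kept = kept.repeat(lengths)
print(kept.tolist())
```

The naive path raises no error here, because the lengths happen to match, which makes the silent change of type easy to miss.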
Comment From: felipemaion
The fix proposed above by @simonjayhawkins works for me. Is there any plan to merge it into the official repo? It would be really helpful to deploy by simply installing the newest version, without having to patch the code manually.