Pandas BUG: json_normalize cannot parse metadata fields list type

Code Sample, a copy-pastable example

from pandas import json_normalize

test_data = [
    {"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}]

df = json_normalize(test_data,
                    record_path=["values"],
                    meta=[["metadata", "listdata"]])
print(df)

Problem description

The call raises `ValueError: Length of values (6) does not match length of index (3)`.

Changing the `listdata` field from the list `[1, 2]` to the set `{1, 2}` avoids the error, but I can't control the JSON I receive.
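
Until this is fixed in pandas, one possible workaround is to skip `meta` for the list-valued field and attach it by hand. The sketch below is only an illustration under that assumption: it normalizes each record separately and adds the `metadata.listdata` column itself, so the column name and the per-record loop are choices made here, not pandas behavior.

```python
import pandas as pd

test_data = [{"values": [1, 2, 3], "metadata": {"listdata": [1, 2]}}]

frames = []
for record in test_data:
    # Normalize only the records; no meta is requested, so the failing
    # meta code path is never reached.
    part = pd.json_normalize(record, record_path=["values"])
    listdata = record["metadata"]["listdata"]
    # One entry per row; each entry stays a Python list (object dtype).
    part["metadata.listdata"] = pd.Series([listdata] * len(part), index=part.index)
    frames.append(part)

df = pd.concat(frames, ignore_index=True)
print(df)
```

Every row refers to the same list object, so copy the list per row if the values will be mutated later.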

Expected Output

0	metadata.listdata
1	[1, 2]
2	[1, 2]
3	[1, 2]

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python : 3.6.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-52-generic
Version : #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1

Comment From: simonjayhawkins

Thanks @sann05 for the report.

a possible fix ...

```diff
diff --git a/pandas/io/json/_normalize.py b/pandas/io/json/_normalize.py
index e77d60d2d4..d04d321388 100644
--- a/pandas/io/json/_normalize.py
+++ b/pandas/io/json/_normalize.py
@@ -531,7 +531,16 @@ def _json_normalize(
                 raise ValueError(
                     f"Conflicting metadata name {k}, need distinguishing prefix "
                 )
-            result[k] = np.array(v, dtype=object).repeat(lengths)
+
+            values = np.array(v, dtype=object)
+
+            if values.ndim > 1:
+                # GH#37782
+                values = np.empty((len(v),), dtype=object)
+                for i, v in enumerate(v):
+                    values[i] = v
+
+            result[k] = values.repeat(lengths)
     return result
```

There may be a more elegant solution, or similar code elsewhere in the codebase that could be reused.

PRs to fix welcome.
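
To see why the original line fails and what the diff changes, the idea can be reproduced with NumPy alone. This is a minimal sketch, not the pandas implementation: `repeat_meta_values` is a hypothetical helper name, and unlike the diff it always builds the 1-D object array instead of checking `ndim` first.

```python
import numpy as np


def repeat_meta_values(v, lengths):
    """Repeat each metadata value once per record without unpacking lists.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Allocate a 1-D object array and fill it element by element so that
    # list values are stored as single objects, not expanded into a new axis.
    values = np.empty(len(v), dtype=object)
    for i, item in enumerate(v):
        values[i] = item
    return values.repeat(lengths)


v = [[1, 2]]   # one metadata value, which happens to be a list
lengths = [3]  # the record it belongs to produced three rows

# np.array turns the nested list into a (1, 2) array; repeat() without an
# axis flattens it first, so each element is repeated three times.
naive = np.array(v, dtype=object).repeat(lengths)
fixed = repeat_meta_values(v, lengths)

print(naive.shape)  # (6,)
print(fixed.shape)  # (3,)
```

The six-element result of the naive call is the "Length of values (6)" in the error message, while the index has only three rows.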

Comment From: sanjay9977

I have a similar problem: https://stackoverflow.com/questions/72801399/2d-arrays-repeat-throwing-valueerror-operands-could-not-be-broadcast-together

Any plan to fix this issue?

Comment From: Julian-J-S

I have the same problem as described here. What happened to #47708? Is it still being worked on? :D

Besides the fact that lists with zero or more than one element raise a ValueError, I think it is important to mention that a list with exactly one element must never be converted to a scalar; it must remain a list. When I get a list from an API, I don't know how many elements it contains. Converting one-element lists to scalars would be inconsistent and surprising :D
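
The three cases can be illustrated with NumPy alone, assuming the metadata column is built with `np.array(v, dtype=object).repeat(lengths)` as in the line removed by the diff earlier in this thread. This is a sketch of the array behavior only, not of `json_normalize` itself.

```python
import numpy as np

lengths = [3]  # one record that expands to three rows

# Size of the repeated array for list values with 0, 1 and 2 elements.
# Only a size of 3 fits a three-row index.
sizes = []
for listdata in ([], [5], [1, 2]):
    repeated = np.array([listdata], dtype=object).repeat(lengths)
    sizes.append(repeated.size)
print(sizes)

# The one-element case has the right length, but the list is silently
# unpacked into scalars.
collapsed = np.array([[5]], dtype=object).repeat(lengths)
print(collapsed.tolist())

# Storing the list as a single object keeps it intact.
kept = np.empty(1, dtype=object)
kept[0] = [5]
kept = kept.repeat(lengths)
print(kept.tolist())
```

The one-element case is the only one whose length matches the index, which is why it does not raise and instead loses the list.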

Comment From: felipemaion

The possible fix posted by @simonjayhawkins above works for me. Is there any plan to merge it into the official repo? It would be really helpful to deploy by just pinning the newest version, without having to patch the code manually.
