Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.array([list('test')], dtype='string')
# ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

pd.array([list('test'), list('word')], dtype='string')
# ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

pd.array([list('test'), list('words')], dtype='string')
# <StringArray>
# ["['t', 'e', 's', 't']", "['w', 'o', 'r', 'd', 's']"]
# Length: 2, dtype: string

pd.array([list('test')])
# <NumpyExtensionArray>
# [['t', 'e', 's', 't']]
# Length: 1, dtype: object

Issue Description

I'm trying to convert a list of lists of strings into a StringArray, but pd.array with dtype='string' raises a ValueError when the inner lists all have the same length (see the example above). When the inner lists have different lengths, no error is raised and the output looks correct.

In an older version of pandas (1.5.3), no error was raised. Instead, the result had the same length as the input but contained repeated string casts of the inner array:

<StringArray>
[
["['t' 'e' 's' 't']", "['t' 'e' 's' 't']", "['t' 'e' 's' 't']",
 "['t' 'e' 's' 't']"]
]
Shape: (1, 4), dtype: string

Expected Behavior

Same as pd.array([list('test')]) but with StringArray type.

Installed Versions

INSTALLED VERSIONS
------------------
commit                 : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                 : 3.12.3.final.0
python-bits            : 64
OS                     : Windows
OS-release             : 11
Version                : 10.0.22631
machine                : AMD64
processor              : Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
byteorder              : little
LC_ALL                 : None
LANG                   : en
LOCALE                 : English_Europe.1252

pandas                 : 2.2.2
numpy                  : 1.26.4
pytz                   : 2024.1
dateutil               : 2.9.0.post0
setuptools             : 75.1.0
pip                    : 24.2
Cython                 : None
pytest                 : 7.4.4
hypothesis             : None
sphinx                 : 7.3.7
blosc                  : None
feather                : None
xlsxwriter             : None
lxml.etree             : 5.2.1
html5lib               : None
pymysql                : None
psycopg2               : None
jinja2                 : 3.1.4
IPython                : 8.27.0
pandas_datareader      : None
adbc-driver-postgresql : None
adbc-driver-sqlite     : None
bs4                    : 4.12.3
bottleneck             : 1.3.7
dataframe-api-compat   : None
fastparquet            : None
fsspec                 : 2024.6.1
gcsfs                  : None
matplotlib             : 3.9.2
numba                  : 0.60.0
numexpr                : 2.8.7
odfpy                  : None
openpyxl               : 3.1.5
pandas_gbq             : None
pyarrow                : 16.1.0
pyreadstat             : None
python-calamine        : None
pyxlsb                 : None
s3fs                   : 2024.6.1
scipy                  : 1.13.1
sqlalchemy             : 2.0.34
tables                 : 3.10.1
tabulate               : 0.9.0
xarray                 : 2023.6.0
xlrd                   : None
zstandard              : 0.23.0
tzdata                 : 2023.3
qtpy                   : 2.4.1
pyqt5                  : None

Comment From: rhshadrach

Thanks for the report! For expected behavior you wrote:

Same as pd.array([list('test')]) but with StringArray type.

The result of pd.array([list('test')]) is an array whose elements are lists. For a StringArray, the elements must be strings. Therefore you cannot accomplish this.
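Since each element of a StringArray has to be a single string, one possible way forward is to decide up front how each inner list should map to strings. The sketch below is only an illustration and not something suggested in this thread: the variable names (nested, joined, flat) are made up, and whether joining or flattening is appropriate depends on what the data represents.

```python
import pandas as pd

# Hypothetical input: inner lists of equal length, as in the report.
nested = [list("test"), list("word")]

# Option 1 (assumption: each inner list represents one word):
# join the characters so every element is a single string.
joined = pd.array(["".join(chars) for chars in nested], dtype="string")

# Option 2 (assumption: the nesting itself is not needed):
# flatten to one character per element.
flat = pd.array([ch for chars in nested for ch in chars], dtype="string")

print(joined)
print(flat)
```

Both calls pass a flat, one-dimensional list of str objects to pd.array, so the 2-D buffer error from the report does not come into play.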