Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.array([list('test')], dtype='string')
# ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
pd.array([list('test'), list('word')], dtype='string')
# ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
pd.array([list('test'), list('words')], dtype='string')
# <StringArray>
# ["['t', 'e', 's', 't']", "['w', 'o', 'r', 'd', 's']"]
# Length: 2, dtype: string
pd.array([list('test')])
# <NumpyExtensionArray>
# [['t', 'e', 's', 't']]
# Length: 1, dtype: object
Issue Description
I'm trying to convert a list of lists of strings into a StringArray, but pd.array with dtype='string' fails when the inner lists all have the same length, raising a ValueError (see the example above). If the inner lists have different lengths, no error is raised and the output is correct.
In an older version of pandas (1.5.3), the same call did not raise but produced a result with the input's shape, containing repeated string casts of the inner list:
<StringArray>
[
["['t' 'e' 's' 't']", "['t' 'e' 's' 't']", "['t' 'e' 's' 't']",
"['t' 'e' 's' 't']"]
]
Shape: (1, 4), dtype: string
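One plausible reading of the equal-length versus different-length behavior, not confirmed anywhere in this thread, is that the nested lists get converted to a NumPy array along the way. The sketch below uses NumPy directly (not pandas internals) to show that equal-length inner lists become a 2-D array, while ragged inner lists stay a 1-D array of list objects. The variable names are illustrative only.

```python
import numpy as np

equal = [list("test"), list("word")]    # inner lists of the same length
ragged = [list("test"), list("words")]  # inner lists of different lengths

# dtype=object is required for the ragged case on recent NumPy versions.
equal_arr = np.array(equal, dtype=object)
ragged_arr = np.array(ragged, dtype=object)

print(equal_arr.ndim)   # 2: one row per inner list, one column per character
print(ragged_arr.ndim)  # 1: each element is still a Python list
```

If something similar happens inside pd.array, a 2-D input reaching code that expects a 1-D buffer would be consistent with the "expected 1, got 2" message, but that is an assumption.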
Expected Behavior
Same as pd.array([list('test')]) but with StringArray type.
Installed Versions
Comment From: rhshadrach
Thanks for the report! For expected behavior you wrote:
Same as pd.array([list('test')]) but with StringArray type.
The result of pd.array([list('test')]) is an array whose elements are lists. For a StringArray, the elements must be strings. Therefore you cannot accomplish this.
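Since every element of a string array has to be a single string, the nested lists need to be turned into strings before calling pd.array. A minimal sketch of two ways to do that, assuming the goal is either one string per inner list or one string per character (the thread does not say which the reporter wants, and the variable names are illustrative):

```python
import pandas as pd

data = [list("test"), list("word")]

# Option 1: join each inner list into one string, giving one element per list.
joined = pd.array(["".join(chars) for chars in data], dtype="string")

# Option 2: flatten the nesting, giving one element per character.
flat = pd.array([ch for chars in data for ch in chars], dtype="string")

print(list(joined))  # ['test', 'word']
print(len(flat))     # 8
```

Both calls pass a flat list of Python strings, so the input is one-dimensional regardless of whether the inner lists have equal lengths.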