Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from io import StringIO
import pandas as pd
input = StringIO(
""",ID,Team,Puzzler
0,ID_1,10,1
1,ID_2,11,0
2,ID_3,10,0
3,ID_4,11,1""")
df = pd.read_csv(input, dtype_backend='pyarrow', index_col=0)
df = df.astype({'Team': 'category'})
df.to_parquet('example.parquet')
Issue Description
I get the error message
pyarrow.lib.ArrowInvalid: ('Could not convert <pyarrow.Int64Scalar: 10> with type pyarrow.lib.Int64Scalar: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column Team with type category')
when calling .to_parquet on the dataframe. The error only occurs when the dtype is set with .astype(). If I instead load the dataframe with
df = pd.read_csv(input, dtype_backend='pyarrow', index_col=0, dtype={'Team': 'category'})
it works fine.
Expected Behavior
To create a parquet file named 'example.parquet' without error.
Installed Versions
Comment From: mroeschke
Here's a simpler reproducer; this looks to be an issue with pyarrow not being able to convert Categoricals whose categories have an ArrowDtype:
In [9]: pa.Table.from_pandas(pd.DataFrame(pd.Categorical(np.array([1], dtype=object))))
Out[9]:
pyarrow.Table
0: dictionary<values=int64, indices=int8, ordered=0>
----
0: [ -- dictionary:
[1] -- indices:
[0]]
In [10]: pa.Table.from_pandas(pd.DataFrame(pd.Categorical(pd.Series([1], dtype="int64[pyarrow]"))))
ArrowInvalid: ('Could not convert <pyarrow.Int64Scalar: 1> with type pyarrow.lib.Int64Scalar: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type category')
Comment From: mroeschke
I think this will need to be fully addressed in pyarrow, so closing. Feel free to report this in Arrow.
Comment From: phofl
This should be fixed in the next Arrow release; we already have an issue about this.