Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

reference/source for this code example: https://github.com/pandas-dev/pandas/issues/37577

import pandas as pd
from dataclasses import dataclass

@dataclass
class Record:
    id: int
    name: str
    constant: float

df = pd.DataFrame([
    Record(0, 'Landau', 3.1415926),
    Record(1, 'Kapitsa', 2.718281828459045),
    Record(2, 'Bogolyubov', 6.62607015),
])

print(df.dtypes)

Issue Description

I would expect that the dtype from the dataclass is inherited / used in pandas as well.

Actual output of print(df.dtypes):

id            int64
name         object
constant    float64
dtype: object

Expected Behavior

id            int64
name         string
constant    float64
dtype: object

Installed Versions

INSTALLED VERSIONS
------------------
commit : 8dab54d6573f7186ff0c3b6364d5e4dd635ff3e7
python : 3.10.6.final.0
python-bits : 64
OS : Darwin
OS-release : 22.2.0
Version : Darwin Kernel Version 22.2.0: Fri Nov 11 02:03:51 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.5.2
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.3.0
pip : 22.3.1
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : 0.4.2
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Comment From: phofl

Hi, thanks for your report. We don't use those dtypes, and the same goes for NamedTuple. Even if we did, str maps to object; the pandas string dtype is selected with "string", not str.

Edit: Sorry initial response was only half-correct
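
To illustrate the distinction drawn above, here is a minimal sketch (not part of the original comment) that reuses the Record dataclass from the report and opts in to the string dtype explicitly through the "string" alias. On pandas 1.5.2, the version in the report, the name column is object before the cast; the inferred dtype may differ on other pandas versions.

```python
import pandas as pd
from dataclasses import dataclass


@dataclass
class Record:
    id: int
    name: str
    constant: float


df = pd.DataFrame([
    Record(0, "Landau", 3.1415926),
    Record(1, "Kapitsa", 2.718281828459045),
])

# The dataclass annotations are not consulted when the frame is built,
# so the string dtype has to be requested explicitly via the "string" alias.
df["name"] = df["name"].astype("string")

print(df.dtypes)
```

Calling df.convert_dtypes() is another way to move columns to the nullable extension dtypes, though it converts every column it can rather than just the one named here.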

Comment From: butterbach

I thought as much, but I don't understand why. Or is this basically a feature request, because pandas has only supported the string dtype since 1.0?

For other, more complex data types I could understand that behaviour, but converting the data back out of pandas then causes problems, such as ending up with a void type.

Comment From: butterbach

@phofl: Sorry to ping you personally. I can see that str maps to object in pandas, but I can only assume this was implemented before pandas had the "string" dtype, which is exactly the same as str in Python, isn't it?

What is the reason not to change the mapping? I just want to understand the reason. I get it that object is the default for all other datatypes that are not supported.

When I read the first paragraph of the pandas documentation about the dtype "string", I would assume it should be the opposite:

[Screenshot of the first paragraph of the pandas text data user guide] source: https://pandas.pydata.org/docs/user_guide/text.html

And specifying str in a dataclass instead of Any (?) is close to the pandas recommendation to actively set dtype="string" (see the Series examples in the pandas docs). In that case it should be taken into account by the DataFrame constructor.
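
Until something like this exists in the constructor, the requested behaviour can be approximated in user code. The following is only a sketch under stated assumptions: frame_from_dataclasses and ANNOTATION_TO_DTYPE are hypothetical names invented for this example, not pandas APIs, and the mapping covers just the three annotation types used in the report.

```python
import dataclasses
from dataclasses import dataclass

import pandas as pd

# Hypothetical mapping from a dataclass annotation to a pandas dtype alias.
# This is an assumption made for the example, not something pandas defines.
ANNOTATION_TO_DTYPE = {int: "int64", float: "float64", str: "string"}


def frame_from_dataclasses(records):
    """Build a DataFrame, then cast columns according to the field annotations."""
    records = list(records)
    df = pd.DataFrame(records)
    if not records:
        return df
    # Fields whose annotation is not in the mapping (including annotations
    # stored as strings) are skipped and keep whatever dtype pandas inferred.
    dtypes = {
        field.name: ANNOTATION_TO_DTYPE[field.type]
        for field in dataclasses.fields(records[0])
        if field.type in ANNOTATION_TO_DTYPE
    }
    return df.astype(dtypes)


@dataclass
class Record:
    id: int
    name: str
    constant: float


df = frame_from_dataclasses([
    Record(0, "Landau", 3.1415926),
    Record(1, "Kapitsa", 2.718281828459045),
    Record(2, "Bogolyubov", 6.62607015),
])
print(df.dtypes)
```

The cast happens after construction, so the helper only changes the resulting dtypes; it does not alter how pandas reads the dataclass instances in the first place.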