Code Sample

import pandas as pd


print(pd.__version__)

# df created from a list: None is cast to str (observed with pandas 1.0.1)
df = pd.DataFrame(["1", "2", None], columns=["a"], dtype="str")
print(type(df.loc[2].values[0]))


# Equivalent df created from a dict: None remains NoneType
df = pd.DataFrame({"a": ["1", "2", None]}, dtype="str")
print(type(df.loc[2].values[0]))

Problem description

None is not cast consistently when a DataFrame containing None values is created with dtype set to str.

If the DataFrame is created from a list, None is cast to str (None -> "None"). If it is created from a dict, None remains NoneType (None -> None).

IMO, the latter is the preferred behavior. I hope you consider this when continuing your work on string types and NA types in future versions.
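A simplified, pandas-free model of the two outcomes described above, for illustration only: the function names are hypothetical and this is not how pandas implements either construction path. The list path behaves as if every element were passed through str(), while the dict path behaves as if missing values were skipped.

```python
def cast_every_element(values):
    """Hypothetical model of the list path in pandas 1.0.1: str() is applied to every element."""
    return [str(v) for v in values]


def cast_preserving_missing(values):
    """Hypothetical model of the dict path: None is left as-is, everything else goes through str()."""
    return [v if v is None else str(v) for v in values]


data = ["1", "2", None]
print(cast_every_element(data))       # ['1', '2', 'None']
print(cast_preserving_missing(data))  # ['1', '2', None]
```

The second function is the behavior the reporter prefers: a missing value stays missing instead of becoming the four-character string "None".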

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 41.2.0
Cython           : None
pytest           : 5.3.5
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.12.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

Comment From: prakhar987

None is converted to 'None' when the NumPy array (built from the passed list ['1', '2', None]) is cast:

https://github.com/pandas-dev/pandas/blob/54b400196c4663def2f51720d52fc3408a65e32a/pandas/core/internals/construction.py#L173

Should NumPy handle this, or can we add a workaround? I can work on a PR.
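The root cause described in this comment can be reproduced with NumPy alone. A minimal sketch, assuming only that numpy is installed: building the array directly with dtype=str stringifies None, whereas an object-dtype array keeps it. Keeping the data as object and converting only non-missing values is one possible workaround, not necessarily what pandas ended up doing.

```python
import numpy as np

data = ["1", "2", None]

# Casting straight to a fixed-width unicode dtype applies str() to every
# element, so None becomes the literal string "None".
as_unicode = np.array(data, dtype=str)
print(as_unicode.tolist())  # ['1', '2', 'None']

# An object array stores the original Python objects, including None.
as_object = np.array(data, dtype=object)
print(as_object.tolist())   # ['1', '2', None]
```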

Comment From: ali-cetin-4ss

But isn't the "bigger" issue here that the behavior differs between two seemingly equivalent ways of instantiating a DataFrame?

Comment From: prakhar987

I am assuming one of them is wrong: the case where None becomes "None".

Comment From: cgarciae

I tested against master (895f0b4) and the issue is now gone; I think this can be closed.

Comment From: jreback

we would take a PR with a validation test
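A validation test along these lines might look like the sketch below. The test name is hypothetical, and it assumes a pandas version in which the fix is present. It only checks that the two constructions agree and that the missing value is not turned into the string "None", since how that value is represented (None or NaN) depends on the pandas version and string dtype in use.

```python
import pandas as pd


def test_str_dtype_list_and_dict_constructors_agree():
    # Hypothetical test name; mirrors the two constructions from the report.
    from_list = pd.DataFrame(["1", "2", None], columns=["a"], dtype="str")
    from_dict = pd.DataFrame({"a": ["1", "2", None]}, dtype="str")

    # Both ways of building the frame should produce the same result...
    pd.testing.assert_frame_equal(from_list, from_dict)

    # ...and the missing value must stay missing rather than become "None".
    assert pd.isna(from_list.loc[2, "a"])
    assert from_list.loc[2, "a"] != "None"
```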

Comment From: cgarciae

take

Comment From: mpark1994

Hey, I'm looking for first issues and I'm wondering what needs to be done here.

Comment From: harsimran44

Hey, can I work on this? Could you guide me on how to do it, please?