Code Sample
import pandas as pd
pd.__version__
# df created from list: None is cast to the string "None"
df = pd.DataFrame(["1", "2", None], columns=["a"], dtype="str")
type(df.loc[2].values[0])
# Equivalent df created from dict, None remains NoneType
df = pd.DataFrame({"a": ["1", "2", None]}, dtype="str")
type(df.loc[2].values[0])
Problem description
None is not cast consistently when a DataFrame containing None values is created with dtype set to str.
If the DataFrame is created from a list, None is cast to str (None -> "None"). If it is created from a dict, None remains NoneType (None -> None).
IMO, the latter is the preferred behavior. I hope you consider this when continuing your work on string types and na types in future versions.
Expected Output
Output of pd.show_versions()
Comment From: prakhar987
None is converted to 'None' when the numpy array (made from the passed list ['1', '2', None]) is cast:
https://github.com/pandas-dev/pandas/blob/54b400196c4663def2f51720d52fc3408a65e32a/pandas/core/internals/construction.py#L173
Should numpy handle this, or should we add a workaround? I can work on a PR.
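A minimal sketch of the cast described above, assuming the passed list is first held in an object-dtype array. It only illustrates NumPy's behaviour and is not the pandas code at the linked line; the variable names are made up for illustration.

```python
import numpy as np

# Assumption: the list is stored as an object array before the string cast.
values = np.array(["1", "2", None], dtype=object)
assert values[2] is None  # still a real None at this point

# astype(str) stringifies every element, so None turns into the text "None".
as_str = values.astype(str)
assert as_str[2] == "None"
```

Because the cast stringifies every element without distinction, any workaround would probably have to handle missing values before or after this step.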
Comment From: ali-cetin-4ss
But isn't the "bigger" issue here that the behavior differs between two seemingly equivalent ways of instantiating a DataFrame?
Comment From: prakhar987
I am assuming one of them is wrong: the case where None becomes "None".
Comment From: cgarciae
I tested against master (895f0b4) and the issue is gone. I think this can be closed now.
Comment From: jreback
we would take a PR with a validation test
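A rough sketch of what such a validation test might check, assuming a pandas version where the issue is already fixed. The function name `check_none_handling` is hypothetical, and the test that ends up in the pandas test suite may look different.

```python
import pandas as pd


def check_none_handling():
    # The two constructions from the report that should be equivalent.
    from_list = pd.DataFrame(["1", "2", None], columns=["a"], dtype="str")
    from_dict = pd.DataFrame({"a": ["1", "2", None]}, dtype="str")

    # None should stay a missing value rather than become the text "None".
    assert pd.isna(from_list.loc[2, "a"])
    assert pd.isna(from_dict.loc[2, "a"])

    # Both construction paths should produce the same frame.
    pd.testing.assert_frame_equal(from_list, from_dict)
    return from_list, from_dict


check_none_handling()
```

Checking with `pd.isna` instead of `is None` keeps the sketch independent of how a given pandas version represents the missing value.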
Comment From: cgarciae
take
Comment From: mpark1994
hey I'm looking for first-issues and I'm wondering what needs to be done?
Comment From: harsimran44
Hey, can I do this? Can you guide me on how to do it, please?