Is your feature request related to a problem?
convert_dtypes should support converting strings with NaN to an integer dtype.
Describe the solution you'd like
Currently, pandas does not treat a Series of numeric strings containing NaN as numeric.
pd.Series(['1', '2', '3', np.nan]).convert_dtypes().dtype
got
string[python]
Describe alternatives you've considered
pd.Series(['1', '2', '3', np.nan]).convert_dtypes().dtype
should behave like pd.Series(['1', '2', '3', np.nan]).astype('Int16').dtype, which gives
Int16Dtype()
Comment From: isaacangyu
take
Comment From: isaacangyu
This is my first comment on an issue, but after some research on convert_dtypes(), I got the impression that convert_dtypes() is only meant to convert object data types (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.convert_dtypes.html). Since your list ['1', '2', '3', np.nan] contained strings, not objects, convert_dtypes() could not perform the desired conversion. However, you may have been suggesting a new feature, in which case I support it. It might help if you clarify. I hope my comment was helpful!
Comment From: eromoe
Yeah, this is a feature request.
Comment From: isaacangyu
Thanks for clarifying! I support this feature request and will unassign myself.
Comment From: mroeschke
IMO I think pd.Series(['1', '2', '3', np.nan]).convert_dtypes()
returning string[python]
is appropriate here. The object type contains strings and np.nan,
which is the missing value indicator, so string is the appropriate conversion since pandas has a string dtype now. I would be -1 on inferring integer here.
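The behavior described here can be checked with a short sketch. It assumes a recent pandas release; the exact string dtype reported (for example string[python]) may vary with the pandas version and installed backends, so the check only looks at the dtype family.

```python
import numpy as np
import pandas as pd

# Object Series holding numeric-looking strings plus one missing value.
s = pd.Series(['1', '2', '3', np.nan])

# convert_dtypes() infers a string dtype; it does not parse the strings.
result = s.convert_dtypes()

# np.nan is carried over as the missing value, not as a string.
missing = result.isna().tolist()

print(result.dtype)
print(missing)
```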
Comment From: phofl
Agreed, -1 on converting to int as well. Those are strings, so they should be kept as is.
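For readers who still want a nullable integer result, one possible workaround (a sketch, not something proposed in this thread) is to parse the strings explicitly with pd.to_numeric and only then call convert_dtypes(). The variable names below are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['1', '2', '3', np.nan])

# Step 1: parse the strings explicitly; the missing value stays missing.
numeric = pd.to_numeric(s)

# Step 2: let convert_dtypes() pick a nullable dtype for the numeric data.
# Whole-number values with a missing entry become a nullable integer dtype.
converted = numeric.convert_dtypes()

print(converted.dtype)
print(converted.isna().sum())
```

This keeps the string-to-number parsing as an explicit, opt-in step, which is consistent with the maintainers' view that convert_dtypes() itself should not infer integers from strings.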