Pandas BUG: assignment with enlargement gives object dtype with ExtensionArrays

From https://github.com/pandas-dev/pandas/issues/32271: when setting with enlargement, the dtype gets converted to object dtype (for the regular NumPy integer or boolean dtypes this is not the case):

In [10]: s = pd.Series([1, 2, 3], dtype="Int64")  

In [11]: s[3] = 4 

In [12]: s   
Out[12]: 
0    1
1    2
2    3
3    4
dtype: object

In [13]: s = pd.Series([1, 2, 3], dtype="int64") 

In [14]: s[3] = 4   

In [15]: s  
Out[15]: 
0    1
1    2
2    3
3    4
dtype: int64

It also happens with e.g. DecimalArray from our tests, so I suppose it is a general issue with ExtensionArrays.

Comment From: phofl

This works now. Returns:

0    1
1    2
2    3
3    4
dtype: Int64

Comment From: kasim95

take

Comment From: kasim95

@phofl I found the same issue still occurring for other dtypes apart from Int64. I used the following code:

In [1]: dtype = "Int32"
   ...: result = pd.Series([1, 2, 3], dtype=dtype)
   ...: previous_dtype = result.dtype
   ...: result.loc[3] = 4  # label 3 does not exist yet, so this enlarges the Series
   ...: print(f"{previous_dtype} -> {result.dtype}")
Int32 -> Int64

My observations for some of the inconsistent dtype conversions are as follows:

Before assignment    After assignment
Int32                Int64
Int16                Int64
Int8                 Int64
UInt64               Float64
UInt32               Int64
UInt16               Int64
UInt8                Int64
Float64              object
Float32              object
Float16 (float16)    float64
string               object

Also, these inconsistent conversions only occur for assignments to index labels that do not exist in the original Series. For example, in the above code, label 3 does not exist in result before the assignment. Is this behavior deliberate or is it a bug?
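For anyone who wants to re-check the table on their own pandas version, here is a minimal sketch of a loop over the nullable numeric dtypes. The helper names dtype_after_enlargement and enlargement_table are made up for this comment, "string" and float16 are left out to keep the inputs valid for every dtype, and the printed results will depend on the installed pandas version.

```python
import pandas as pd

# Nullable numeric dtypes taken from the table above (assumption: "string"
# and float16 are skipped so that the integer inputs are valid everywhere).
NULLABLE_DTYPES = [
    "Int64", "Int32", "Int16", "Int8",
    "UInt64", "UInt32", "UInt16", "UInt8",
    "Float64", "Float32",
]


def dtype_after_enlargement(dtype, new_label=3, new_value=4):
    """Return (dtype before, dtype after) for one setitem-with-enlargement."""
    ser = pd.Series([1, 2, 3], dtype=dtype)
    before = str(ser.dtype)
    ser.loc[new_label] = new_value  # new_label is not in the index, so ser grows
    return before, str(ser.dtype)


def enlargement_table(dtypes=NULLABLE_DTYPES):
    """Map each starting dtype to the dtype observed after enlargement."""
    table = {}
    for dtype in dtypes:
        before, after = dtype_after_enlargement(dtype)
        table[before] = after
    return table


if __name__ == "__main__":
    for before, after in enlargement_table().items():
        marker = "" if before == after else "  <-- dtype changed"
        print(f"{before:>8} -> {after}{marker}")
```

Running the script prints one line per dtype and flags every case where the dtype after the assignment differs from the one before it.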

Comment From: phofl

This seems odd, but not sure if this is deliberate. This happens in https://github.com/pandas-dev/pandas/blob/9415d10f2e876c4bb2d3feb89ee6e1ef43c003cd/pandas/core/indexing.py#L1890

Maybe we should try to preserve the dtype? @jbrockmendel thoughts?

Comment From: jbrockmendel

> Maybe we should try to preserve the dtype?

Agreed.

Comment From: phofl

I looked into this. This seems deliberate for now.

# for now only handle other floating types
if not all(isinstance(t, FloatingDtype) for t in dtypes):
    return None

This results in object dtype for the nullable Float dtypes. The integer case behaves the same as for the non-nullable dtypes, where int32 is also changed to int64.

Comment From: jbrockmendel

I'd try to avoid object wherever possible. Maybe instead of requiring all-FloatingDtype, we could require all losslessly-castable, which would include numpy floating dtypes and some numpy integer dtypes.
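To make the "all losslessly-castable" idea concrete, here is a rough sketch built on NumPy's "safe" casting rule. This is not the pandas implementation: all_safely_castable is a hypothetical helper, and it assumes the masked dtypes can be reduced to their backing NumPy dtype through the numpy_dtype attribute. Note that NumPy treats int64 -> float64 as "safe" even though very large integers lose precision, so a stricter rule may be needed for the wide integer dtypes.

```python
import numpy as np


def all_safely_castable(dtypes, target):
    """True if every dtype can be cast to `target` under NumPy's "safe" rule."""
    target = np.dtype(target)
    for dt in dtypes:
        # pandas masked dtypes (Int64, Float32, ...) expose their backing NumPy
        # dtype as `.numpy_dtype`; plain NumPy dtypes and dtype strings are
        # passed through unchanged.
        np_dt = np.dtype(getattr(dt, "numpy_dtype", dt))
        if not np.can_cast(np_dt, target, casting="safe"):
            return False
    return True


if __name__ == "__main__":
    # float32 and int16 both fit into float64 without loss.
    print(all_safely_castable(["float32", "int16"], "float64"))
    # float64 cannot be narrowed to float32 safely.
    print(all_safely_castable(["float64"], "float32"))
```

A check like this could replace the all-FloatingDtype condition, so that a common Float dtype is returned whenever every input passes and object is only used as a last resort.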

Comment From: simonjayhawkins

> This works now. Returns:
>
> 0    1
> 1    2
> 2    3
> 3    4
> dtype: Int64

but not for the pd.NA case

# gets upcast to object
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="Int64")
df.loc[4] = pd.NA

xref https://github.com/pandas-dev/pandas/issues/47214#issuecomment-1149844764

Comment From: jorisvandenbossche

That last one is actually a regression (I ran into this a few days ago and was planning to open a new issue).

This was working in 1.3 (and already in 1.1 as well):

In [1]: df = pd.DataFrame({"a": [1, 2, 3]}, dtype="Int64")
   ...: df.loc[4] = pd.NA

In [2]: df
Out[2]: 
      a
0     1
1     2
2     3
4  <NA>

In [3]: df.dtypes
Out[3]: 
a    Int64
dtype: object

In [4]: pd.__version__
Out[4]: '1.3.5'

while on 1.4 / main, this results in object dtype

Comment From: simonjayhawkins

Thanks @jorisvandenbossche, I will open a dedicated issue for the regression.

Comment From: simonjayhawkins

> will open a dedicated issue for the regression.

opened #47284

Comment From: phofl

Works now

Comment From: jorisvandenbossche

There was a specific PR for this that added tests (https://github.com/pandas-dev/pandas/pull/47342), so just closing.

Comment From: jorisvandenbossche

Actually, we still have the issue for EAs in general (for external EAs: how can they preserve the dtype / recognize scalars?). As I mentioned in the top post, this also happens for our test DecimalArray, and this still doesn't work:

In [10]: from pandas.tests.extension.decimal.array import DecimalArray, make_data

In [11]: s = pd.Series(DecimalArray(make_data()))[:3]

In [12]: s
Out[12]: 
0    Decimal: 0.47855406981399450927483485429547727...
1    Decimal: 0.35287592203064421791935956207453273...
2    Decimal: 0.09288530130514716098844019143143668...
dtype: decimal

In [13]: s[3] = s[0]

In [14]: s
Out[14]: 
0    0.47855406981399450927483485429547727108001708...
1    0.35287592203064421791935956207453273236751556...
2    0.09288530130514716098844019143143668770790100...
3    0.47855406981399450927483485429547727108001708...
dtype: object

So we can keep this open for ExtensionArrays in general.
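As long as enlargement does not preserve the dtype for arbitrary ExtensionArrays, one possible workaround is to build the new row with the existing dtype and concatenate, instead of assigning to a missing label. This is only a sketch: append_preserving_dtype is a hypothetical helper, not a pandas API, and it assumes the ExtensionArray can construct itself from a one-element list of scalars and can concatenate arrays of its own type.

```python
import pandas as pd


def append_preserving_dtype(ser, label, value):
    """Return a new Series with (label, value) appended, keeping ser's dtype."""
    if label in ser.index:
        raise ValueError(f"label {label!r} already exists; use ser.loc[label] = value")
    # Build the single new element with the same dtype, so the concat below
    # combines two arrays of one dtype instead of falling back to object.
    extra = pd.Series([value], index=[label], dtype=ser.dtype)
    return pd.concat([ser, extra])


if __name__ == "__main__":
    s = pd.Series([1, 2, 3], dtype="Int32")
    s = append_preserving_dtype(s, 3, 4)
    print(s.dtype)
```

Unlike s[3] = value, this returns a new Series rather than modifying s in place, so the result has to be assigned back.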