From https://github.com/pandas-dev/pandas/issues/32271: when setting with enlargement, the dtype gets converted to object dtype (while for the normal integer or boolean dtypes this is not the case):
In [10]: s = pd.Series([1, 2, 3], dtype="Int64")
In [11]: s[3] = 4
In [12]: s
Out[12]:
0 1
1 2
2 3
3 4
dtype: object
In [13]: s = pd.Series([1, 2, 3], dtype="int64")
In [14]: s[3] = 4
In [15]: s
Out[15]:
0 1
1 2
2 3
3 4
dtype: int64
It also happens with e.g. DecimalArray from our tests, so I suppose it is a general issue with ExtensionArrays.
Comment From: phofl
This works now. Returns:
0 1
1 2
2 3
3 4
dtype: Int64
Comment From: kasim95
take
Comment From: kasim95
@phofl I found the same issue still occurring for other dtypes apart from Int64. I used the following code:
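In [1]: from pandas import Series
   ...: dtype = "Int32"
   ...: result = Series([1, 2, 3], dtype=dtype)
   ...: previous_dtype = result.dtype
   ...: result.loc[3] = 4  # label 3 is not in the index, so this enlarges the Series
   ...: print(f"{previous_dtype} -> {result.dtype}")
Out [1]: Int32 -> Int64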
My observations for some of the inconsistent dtype conversions are as follows:
Before Assignment | After Assignment |
---|---|
Int32 | Int64 |
Int16 | Int64 |
Int8 | Int64 |
UInt64 | Float64 |
UInt32 | Int64 |
UInt16 | Int64 |
UInt8 | Int64 |
Float64 | object |
Float32 | object |
Float16 (float16) | float64 |
string | object |
Also, these inconsistent conversions only occur for assignments to indexes that do not exist in the original array. For example, in the above code, index 3 does not exist in the result array.
Is this behavior deliberate or is it a bug?
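The table above can be regenerated on whatever pandas version is installed with a loop like the one below. This is only a sketch: the helper name `dtype_after_enlargement` is made up for illustration, the list is restricted to the nullable numeric dtypes, and no expected output is shown because the result depends on the pandas version.

```python
import pandas as pd

# Nullable numeric dtypes taken from the table above.
NULLABLE_DTYPES = [
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64",
]


def dtype_after_enlargement(dtype, value=4):
    """Return (dtype before, dtype after) for one setting-with-enlargement.

    Hypothetical helper, not part of pandas.
    """
    s = pd.Series([1, 2, 3], dtype=dtype)
    before = s.dtype
    s.loc[3] = value  # label 3 is not in the index, so the Series is enlarged
    return str(before), str(s.dtype)


if __name__ == "__main__":
    print(f"pandas {pd.__version__}")
    for name in NULLABLE_DTYPES:
        before, after = dtype_after_enlargement(name)
        print(f"{before} -> {after}")
```

Printing the pandas version next to the results makes it easier to compare runs across releases.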
Comment From: phofl
This seems odd, but not sure if this is deliberate. This happens in https://github.com/pandas-dev/pandas/blob/9415d10f2e876c4bb2d3feb89ee6e1ef43c003cd/pandas/core/indexing.py#L1890
Maybe we should try to preserve the dtype? @jbrockmendel thoughts?
Comment From: jbrockmendel
> Maybe we should try to preserve the dtype?
Agreed.
Comment From: phofl
I looked into this. This seems deliberate for now.
# for now only handle other floating types
if not all(isinstance(t, FloatingDtype) for t in dtypes):
    return None
This results in object for Float. The int case is the same as for non-nullable dtypes: int32 is also changed to int64.
Comment From: jbrockmendel
I'd try to avoid object wherever possible. Maybe instead of requiring all-FloatingDtype, we could require all losslessly-castable dtypes, which would include numpy floating dtypes and some numpy integer dtypes.
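A rough illustration of what such a check could look like, using NumPy's `np.can_cast` with the `"safe"` casting rule as a stand-in for "losslessly castable". The function name `all_safely_castable` is hypothetical, it only handles NumPy dtypes (nullable dtypes would first have to be mapped to their NumPy equivalents), and NumPy's `"safe"` rule is not a perfect match for "lossless" (for instance, it accepts int64 to float64).

```python
import numpy as np


def all_safely_castable(dtypes, target):
    """True if every dtype can be cast to `target` under NumPy's "safe" rule.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    target = np.dtype(target)
    return all(
        np.can_cast(np.dtype(dt), target, casting="safe") for dt in dtypes
    )


if __name__ == "__main__":
    # float32 and int16 both fit into float64
    print(all_safely_castable(["float32", "int16"], "float64"))
    # float64 does not fit into float32
    print(all_safely_castable(["float64"], "float32"))
```

With a check of this shape, a mix of a floating dtype and a small integer dtype could resolve to the floating dtype instead of falling back to object.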
Comment From: simonjayhawkins
This works now. Returns:
0 1
1 2
2 3
3 4
dtype: Int64
but not for the pd.NA case
# gets upcast to object
df = pd.DataFrame({"a": [1, 2, 3]}, dtype="Int64")
df.loc[4] = pd.NA
xref https://github.com/pandas-dev/pandas/issues/47214#issuecomment-1149844764
Comment From: jorisvandenbossche
That last one is actually a regression (I ran into this a few days ago and was planning to open a new issue).
This was working in 1.3 (and already in 1.1 as well):
In [1]: df = pd.DataFrame({"a": [1, 2, 3]}, dtype="Int64")
...: df.loc[4] = pd.NA
In [2]: df
Out[2]:
a
0 1
1 2
2 3
4 <NA>
In [3]: df.dtypes
Out[3]:
a Int64
dtype: object
In [4]: pd.__version__
Out[4]: '1.3.5'
while on 1.4 / main, this results in object dtype
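To compare versions, the snippet can be wrapped in a small script that prints the pandas version together with the resulting dtype. The function name `enlarge_with_na` is made up for this sketch, and no expected dtype is shown since that is exactly what differs between releases.

```python
import pandas as pd


def enlarge_with_na(dtype="Int64"):
    """Build the frame from the report and append a row of pd.NA.

    Hypothetical helper for illustration.
    """
    df = pd.DataFrame({"a": [1, 2, 3]}, dtype=dtype)
    df.loc[4] = pd.NA  # label 4 does not exist yet, so the frame is enlarged
    return df


if __name__ == "__main__":
    df = enlarge_with_na()
    # Int64 means the dtype was preserved; object means it was upcast.
    print(pd.__version__, df["a"].dtype)
```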
Comment From: simonjayhawkins
Thanks @jorisvandenbossche will open a dedicated issue for the regression.
Comment From: simonjayhawkins
> will open a dedicated issue for the regression.
opened #47284
Comment From: phofl
Works now
Comment From: jorisvandenbossche
There was a specific PR for this that added tests (https://github.com/pandas-dev/pandas/pull/47342), so just closing.
Comment From: jorisvandenbossche
Actually, we still have the issue for EAs in general (for external EAs: how can they preserve the dtype / recognize scalars?). As I mentioned in the top post, this also happens for our test DecimalArray, and this still doesn't work:
In [10]: from pandas.tests.extension.decimal.array import DecimalArray, make_data
In [11]: s = pd.Series(DecimalArray(make_data()))[:3]
In [12]: s
Out[12]:
0 Decimal: 0.47855406981399450927483485429547727...
1 Decimal: 0.35287592203064421791935956207453273...
2 Decimal: 0.09288530130514716098844019143143668...
dtype: decimal
In [13]: s[3] = s[0]
In [14]: s
Out[14]:
0 0.47855406981399450927483485429547727108001708...
1 0.35287592203064421791935956207453273236751556...
2 0.09288530130514716098844019143143668770790100...
3 0.47855406981399450927483485429547727108001708...
dtype: object
So we can keep this open for ExtensionArrays in general.