Pandas Assigning column with DataFrame(dtype='category') doesn't preserve categorical dtype, but Series does

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame([1,2,3])
df['col1'] = pd.DataFrame([1,2,3], dtype='category')
df['col2'] = pd.Series([1,2,3], dtype='category')
df.dtypes

Problem description

This returns

0          int64
col1       int64
col2    category
dtype: object

Expected Output

0          int64
col1    category
col2    category
dtype: object
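
A possible workaround in the meantime, sketched under the assumption that the single-column frame shares the target's index: pull the column out as a Series before assigning, which goes through the same path as `col2` above. The name `value` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3])
value = pd.DataFrame([1, 2, 3], dtype="category")

# Select the only column as a Series so the assignment takes the
# Series path, which keeps the categorical dtype.
df["col1"] = value.iloc[:, 0]

print(df.dtypes)
```

This avoids the DataFrame-valued assignment entirely, so it does not depend on how that case is eventually handled.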

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : d9f028dba97e9a3f21077ceeb96ca6552909c3b3
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-74-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 0.26.0.dev0+2031.gd9f028dba.dirty
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 45.0.0.post20200113
Cython : 0.29.14
pytest : 5.3.3
hypothesis : 5.1.5
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.3
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.47.0

Comment From: DanielBustillos

Hi, I have checked the error, can I work on it?

Comment From: MarcoGorelli

Hi @DanielBustillos - sure, pull requests are welcome! See the contributing guide for how to get started.

Comment From: jbrockmendel

Is it obvious that this should be allowed? I would almost expect setting a DataFrame into a single column to raise. cc @phofl

If this is something we want to allow, someone can salvage this from an old branch I'm trimming:

+++ b/pandas/core/frame.py
@@ -4076,7 +4076,10 @@ class DataFrame(NDFrame, OpsMixin):
                 f"column {key}"
             )

-        self[key] = value[value.columns[0]]
+        # now align rows
+        # TODO: could lose dtypes if multi-column xref #31581?
+        arraylike = _reindex_for_setitem(value, self.index)
+        self._set_item_mgr(key, arraylike)

     def _iset_item_mgr(
         self, loc: int | slice | np.ndarray, value, inplace: bool = False

@@ -11576,6 +11579,10 @@ def _from_nested_dict(data) -> collections.defaultdict:
 def _reindex_for_setitem(value: DataFrame | Series, index: Index) -> ArrayLike:
     # reindex if necessary

+    if isinstance(value, DataFrame) and value.shape[1] == 1:
+        # GH#31581 avoid losing dtype for EAs
+        value = value._ixs(0, axis=1)
+
     if value.index.equals(index) or not len(index):
         return value._values.copy()


+++ b/pandas/tests/frame/indexing/test_setitem.py
@@ -44,6 +44,17 @@ from pandas.tseries.offsets import BDay

 class TestDataFrameSetItem:
+    def test_setitem_categorical_frame(self):
+        # GH#31581 don't lose categorical dtype when setting column
+        df = DataFrame([1, 2, 3])
+
+        value = pd.DataFrame([1, 2, 3], dtype="category")
+
+        df["col1"] = value
+
+        expected = DataFrame({0: [1, 2, 3], "col1": value[0]})
+        tm.assert_frame_equal(df, expected)
+
     def test_setitem_str_subclass(self):

Alternatively, we could make DataFrame._values return a 2D categorical.

Comment From: phofl

I think this could work (from a user POV).

It actually does work on main: the returned dtype is category.
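
Since the result appears to depend on the pandas version, a small check along these lines can report what an installed version does. It is only a sketch: it prints the outcome rather than asserting one, and the name `preserved` is illustrative.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3])
df["col1"] = pd.DataFrame([1, 2, 3], dtype="category")

# Report rather than assert: whether the categorical dtype survives
# depends on the pandas version in use.
preserved = isinstance(df["col1"].dtype, pd.CategoricalDtype)
print(pd.__version__, "categorical preserved:", preserved)
```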