Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import numpy as np
>>> import pandas as pd
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4], 'X': pd.Series([np.nan, np.nan], dtype=float)})
>>> 
>>> df1 = df1.set_index(['A', 'X'], drop=True)
>>> 
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'X': pd.Series([np.nan, np.nan], dtype=float)})
>>> df2 = df2.set_index(['A', 'X'], drop=True)
>>> 
>>> df1.combine_first(df2).index.dtypes
A     int64
X    object
dtype: object

Issue Description

df1.combine_first(df2) produces a new index with dtype object when both df1 and df2 use a MultiIndex where one of the dimensions contains all-nan values (in the example column "X").

It's expected that after combine_first() the original dtype of the index dimensions remains the same (in particular when both dataframes involved had the same dtype to begin with).

Note: this doesn't happen for a single-dimension index:

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4], 'X': pd.Series([np.nan, np.nan], dtype=float)})
>>> df1 = df1.set_index(['X'], drop=True)
>>> 
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'X': pd.Series([np.nan, np.nan], dtype=float)})
>>> df2 = df2.set_index(['X'], drop=True)
>>> 
>>> df1.combine_first(df2).index.dtype
dtype('float64')

or if one of the (all-nan) index columns contains a value that is different to nan:

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4], 'X': pd.Series([np.nan, 1.0], dtype=float)})
>>> 
>>> df1 = df1.set_index(['A', 'X'], drop=True)
>>> 
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'X': pd.Series([np.nan, np.nan], dtype=float)})
>>> df2 = df2.set_index(['A', 'X'], drop=True)
>>> 
>>> df1.combine_first(df2).index.dtypes
A      int64
X    float64   <---- expected result even for df1 = pd.DataFrame({... 'X': pd.Series([np.nan, np.nan], dtype=float)})
dtype: object

This issue persists for Pandas 1.5.0, 1.5.1 and 2.0.0.dev0+724.ga6d3e13b3.

Expected Behavior

No change in dtype ("X" is of type float, not object)

Installed Versions

INSTALLED VERSIONS ------------------ commit : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763 python : 3.8.15.final.0 python-bits : 64 OS : Linux OS-release : 6.0.8-300.fc37.x86_64 Version : #1 SMP PREEMPT_DYNAMIC Fri Nov 11 15:09:04 UTC 2022 machine : x86_64 processor : x86_64 byteorder : little LC_ALL : None LANG : en_AU.UTF-8 LOCALE : en_AU.UTF-8 pandas : 1.5.0 numpy : 1.23.2 pytz : 2020.4 dateutil : 2.8.1 setuptools : 59.6.0 pip : 22.3.1 Cython : 0.29.32 pytest : 7.1.2 hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : 0.9.6 lxml.etree : None html5lib : None pymysql : None psycopg2 : 2.8.6 jinja2 : 2.11.2 IPython : None pandas_datareader: None bs4 : None bottleneck : 1.3.5 brotli : None fastparquet : None fsspec : None gcsfs : None matplotlib : None numba : None numexpr : 2.8.1 odfpy : None openpyxl : 3.0.9 pandas_gbq : None pyarrow : 1.0.1 pyreadstat : None pyxlsb : None s3fs : None scipy : 1.4.1 snappy : None sqlalchemy : 1.3.23 tables : 3.7.0 tabulate : None xarray : None xlrd : 2.0.1 xlwt : None zstandard : None tzdata : None

Comment From: phofl

Hi thanks for your report. This comes from MultiIndex.join

Comment From: ssche

Oh thank you so much for the PR - I really do appreciate it!