Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd 
import os

df = pd.DataFrame({("A", 1): [0], ("", 1): [1]})

df.to_excel(os.path.expanduser("merge_empty.xlsx"), merge_cells=True)

Issue Description

When an empty string is present in a MultiIndex header, to_excel will treat it as the merge it incorrectly with the previous column entry on that level. In the example above, the first and second columns should not merge because their first level entries are different.

See the screenshot below: Pandas BUG: Excel merge cell does not handle empty strings correctly

The issue arises from the use of an empty string as the sentinel in MultiIndex.format

Expected Behavior

The column header should not merge past empty strings.

See the screenshot below: Pandas BUG: Excel merge cell does not handle empty strings correctly

Installed Versions

C:\Users\\envs\pandas14_cf\lib\site-packages\_distutils_hack\__init__.py:36: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") INSTALLED VERSIONS ------------------ commit : d023ba755322e09b95fd954bbdc43f5be224688e python : 3.10.1.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19043 machine : AMD64 processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252 pandas : 1.4.0rc0 numpy : 1.22.0 pytz : 2021.3 dateutil : 2.8.2 pip : 21.2.4 setuptools : 60.2.0 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : 3.0.2 lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : 3.0.3 IPython : 8.0.0 pandas_datareader: None bs4 : None bottleneck : None fsspec : None fastparquet : None gcsfs : None matplotlib : 3.5.1 numexpr : 2.8.0 odfpy : None openpyxl : 3.0.9 pandas_gbq : None pyarrow : None pyxlsb : None s3fs : None scipy : 1.7.3 sqlalchemy : None tables : 3.6.1 tabulate : None xarray : None xlrd : 2.0.1 xlwt : None numba : None zstandard : None

Comment From: tehunter

The following code snippet should be able to replace level_lengths so that to_excel handles empty strings correctly

(In pandas.io.formats.excel)

class ExcelFormatter:
    # ...
    def _format_header_mi(...):
        # ...
        level_lengths = pandas.io.formats.format.get_level_lengths([[False] + [lev_codes[i] == lev_codes[i+1] for i in range(len(lev_codes)-1)] for lev,lev_codes in zip(columns.levels, columns.codes)], sentinel=True)

Rather than getting level lengths based on the output of columns.format, it gets the level lengths from the column codes directly.

I haven't tested multi-index rows, but it's probably the same problem and solution.

Comment From: tehunter

take

Comment From: tehunter

Looking into this more, there's more ramifications than expected. The problem I discussed above references the behavior when writing. However, similar behavior exists when reading.

For the following Excel sheet, the reader calls a fill_mi_header function which will replace empty header cells with the preceding non-empty value. In other words, it will be read in as DataFrame({("A", "1"): [0], ("A", "1"): [0]}) Pandas BUG: Excel merge cell does not handle empty strings correctly

If I fix the problem addressed in the original issue, then reading the exported sheet will not return the same dataframe as the one that was written. Should the read and write functions have options on how to handle empty string header cells?

Comment From: ariabtsev

Hello, any update on this bug?