Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

wb = load_workbook(self._get_report_filename_template())
wb.create_sheet(self.report_range_sheet)
with pd.ExcelWriter(file, engine='openpyxl') as writer:
    writer.book = wb  # This assignment causes AttributeError: can't set attribute

Issue Description

Assignment to writer.book causes an exception:

Traceback (most recent call last): File "/app/py_code/Bussiness/Reports/PrehledovyReport.py", line 108, in _export_data writer.book = wb AttributeError: can't set attribute

downgrading to 1.4.4 resolves the issue

Expected Behavior

The assignment should work - injecting externally created workbook to the writer

Installed Versions

pd.show_versions() returns None ... presenting the requirements.txt instead

amqp==5.1.1 asgiref==3.5.2 astroid==2.12.10 asttokens==2.0.8 async-timeout==4.0.2 backports.zoneinfo==0.2.1 beautifulsoup4==4.11.1 billiard==3.6.4.0 bs4==0.0.1 celery==5.2.7 certifi==2022.9.14 chardet==5.0.0 charset-normalizer==2.1.1 click==8.1.3 click-didyoumean==0.3.0 click-plugins==1.1.1 click-repl==0.2.0 colorama==0.4.5 convertdate==2.4.0 DateTime==4.7 decorator==5.1.1 Deprecated==1.2.13 dill==0.3.5.1 Django==4.1.1 django-admin-rangefilter==0.8.8 django-appconf==1.0.5 django-celery-results==2.4.0 django-compat==1.0.15 django-debug-toolbar==3.6.0 django-imagekit==4.1.0 django-pandas==0.6.6 django-request-logging==0.7.5 django-sendgrid-v5==1.2.1 et-xmlfile==1.1.0 future==0.18.2 geocoder==1.38.1 greenlet==1.1.3 icontract==2.6.2 idna==3.4 isort==5.10.1 jdcal==1.4.1 Jinja2==3.1.2 jplephem==2.17 kombu==5.2.4 lazy-object-proxy==1.7.1 lunardate==0.2.0 MarkupSafe==2.1.1 mccabe==0.7.0 numpy==1.23.3 openpyxl==3.0.10 packaging==21.3 pandas==1.5.0 pilkit==2.0 Pillow==9.2.0 platformdirs==2.5.2 prompt-toolkit==3.0.31 psycopg2==2.9.3 pyCalverter==1.6.1 pylint==2.15.3 pylint-django==2.5.3 pylint-plugin-utils==0.7 pylint-unittest==0.1.3 pyluach==2.0.1 PyMeeus==0.5.11 pyparsing==3.0.9 pypng==0.20220715.0 PyQRCode==1.2.1 python-dateutil==2.8.2 python-http-client==3.3.7 pytz==2022.2.1 PyYAML==6.0 ratelim==0.1.6 redis==4.3.4 requests==2.28.1 sendgrid==6.9.7 sgp4==2.21 six==1.16.0 skyfield==1.45 skyfield-data==4.0.0 soupsieve==2.3.2.post1 SQLAlchemy==1.4.40 sqlparse==0.4.2 starkbank-ecdsa==2.0.3 toml==0.10.2 tomli==2.0.1 tomlkit==0.11.4 typing_extensions==4.3.0 tzdata==2022.2 urllib3==1.26.12 vine==5.0.0 wcwidth==0.2.5 workalendar==16.4.0 wrapt==1.14.1 zope.interface==5.4.0

Comment From: phofl

Hi, thanks for your report.

All ExcelWriter attributes were considered private prior to 1.5, but this was not enforced.

We made some attributes public in 1.5, but put some restrictions on what you can do with them. For example, you can not set a book.

cc @rhshadrach to double check

Comment From: vranoch

Hi, thanks for your report.

All ExcelWriter attributes were considered private prior to 1.5, but this was not enforced.

We made some attributes public in 1.5, but put some restrictions on what you can do with them. For example, you can not set a book.

cc @rhshadrach to double check

Hi, Thanks for Your reply, but in that case is there any legal way how to "extend" existing workbook? Without anything else, .to_excel() just overwrites given file with completely new content. The used way was to load existing workbook, create the excel writer and inject previosly loaded workbook in it. In this case all existing content of the file was preserved and new content was added.

How to achieve this functionality?

Comment From: phofl

Could you make your example copy and passable?

You get a Workbook object from the respective engine, e.g. openpyxl. This object has methods again that you can use

Comment From: vranoch

The following code should be reproducible (with any existing Excel file):

` import pandas as pd from openpyxl import load_workbook

wb = load_workbook("any_existing_nonempty_workbook.xlsx") wb.create_sheet("range") with pd.ExcelWriter(file, engine='openpyxl') as writer: writer.book = wb `

Comment From: tehunter

I use this feature too to extend existing workbooks:

        # pandas 1.4.x; haven't tried it in 1.5 yet
        from openpyxl import load_workbook
        if (template_path and os.path.normpath(template_path) != os.path.normpath(path)):
            self.book = load_workbook(template_path)
            self.handles.handle.seek(0)
            self.sheets = {name: self.book[name] for name in self.book.sheetnames}

It's similar to using the r+ mode, but it allows you to save to a different file than the template you are loading from.

Comment From: rhshadrach

Thanks for the report!

We made some attributes public in 1.5, but put some restrictions on what you can do with them. For example, you can not set a book.

In implementing this, we deprecated certain attributes because while they were documented as not being public, it was known that many attributes were being used by users. I think that should be the approach here - revert this in 1.5.1 allowing book to be a settable property.

However the issue with assigning a new book is that the book is stateful. Currently openpyxl doesn't use this state when writing as far as I can tell, but xlsxwriter does.

from xlsxwriter import Workbook

df = pd.DataFrame({'a': [1, 1, 2]})
df2 = pd.DataFrame({'a': [3, 3, 4]})
df.to_excel('test.xlsx', engine='xlsxwriter')

wb = Workbook('test.xlsx')

with pd.ExcelWriter('test2.xlsx', engine='xlsxwriter') as writer:
    # This is the equivalent of `writer.book = ...` in pandas 1.4.4
    writer._book = wb
    df2.to_excel(writer)

This produces test2.xlsx, but it is an empty file. It also overwrites Sheet 1 rather than adding a new sheet to test.xlsx (is this the correct behavior?). This is because unlike openpyxl, xlsxwriter uses the filename within the book when writing. It seems likely to me (but I'm not certain) that we can remove all reliance on state between method calls - i.e. any needed state within a method would be taken directly from the book or the arguments passed to ExcelWriter. I believe this would enable the setting of book safely, but maybe there are still cases of bad behavior.

Comment From: vranoch

Hi Richard,

generally, the goal is to use existing Excel file as a template and typically add new sheets to it or write new data to existing sheets.

I understand that assigning the whole workbook anytime may be dangerous. I believe that either exposing specific "setter" method that woud correctly handle setting the object with all relevant relations - or even allowing to specify the "template" workbook only within the initialization of the writer (instead of internally setting a new workbook) would do the job.

Thanks Jan

Comment From: phofl

Imo this is something that should be handled on the engine level, not on our side. For example:

book = writer.book
book.add_sheets(...)

This would move the functionality to openpyxl or another engine

Comment From: vranoch

Yes, why not, but is there any way how to initialize the engine (writer.book) with an EXISTING workbook with some content, not a newly created empty one?

Comment From: phofl

What you can do is specifying mode="a" when creating the ExcelWriter and use an existing Workbook as source, this should enable you to append to an existing workbook.

Comment From: vranoch

So I should first make a copy of the original "template" file and then put this file to the writer with mode="a"?

Comment From: rhshadrach

@phofl - you agree with allowing book to be settable for the time being? My thinking is that this was an unintended API change that should have gone through deprecation.

@vranoch - can you confirm if the workaround of using mode="a" satisfies your use case?

Comment From: phofl

I would lean to not rolling this back, since they were private and hence we can change them if we want.

If you feel strongly about it, I would be ok with rolling back and adding an immediate deprecation warning, that we can remove in 2.0

Comment From: vranoch

Hi Richard,

Yes, the mode='a' seems to work:

`def _export_data(self, data, file, date_from, date_to):

with open(self._get_report_filename_template(), "rb") as report_template_file:
    self._copy_filelike_to_filelike(report_template_file, file)

with pd.ExcelWriter(file, mode='a') as writer:
    wb = writer.book

    try:
        del wb[self.report_data_sheet]
    except Exception:
        pass

    try:
        del wb[self.report_range_sheet]
    except Exception:
        pass

    wb.create_sheet(self.report_range_sheet)
    data.to_excel(writer, sheet_name=self.report_data_sheet, startrow=data_row_offset, startcol=data_col_offset, index=False)`

Thanks

Comment From: twoertwein

If we add a setter for book, it would be good to emit a warning. I think we had multiple issues where people overwrote book and then were surprised that it corrupted their excel file (because pandas uses its internal file handle that it passed to the excel engine in the first place).

Comment From: rhshadrach

I would lean to not rolling this back, since they were private and hence we can change them if we want.

If you feel strongly about it, I would be ok with rolling back and adding an immediate deprecation warning, that we can remove in 2.0

Thanks @phofl. In #45795 I deprecated other attributes because we were aware of their usage; but I wasn't aware users were setting this attribute. Had I been, this would have been deprecated as well. As such, we broken user code, so it seems best to me to revert and deprecate. I've opened #48943.

Comment From: MarcoGorelli

Hey @vranoch

Thanks for having reported this - there's now a release candidate for pandas 2.0.0, which you can install with

mamba install -c conda-forge/label/pandas_rc pandas==2.0.0rc0

or

python -m pip install --upgrade --pre pandas==2.0.0rc0

If you try it out and report bugs, then we can fix them before the final 2.0.0 release