I noticed xlsx files I was exporting from Pandas with to_excel weren't opening in Numbers and Preview on my Mac. They did, however, open in Excel and Google Drive.
I created a new virtualenv and installed Pandas and openpyxl with pip:
numpy==1.8.1
openpyxl==1.8.6
pandas==0.13.1
python-dateutil==2.2
pytz==2014.2
six==1.6.1
wsgiref==0.1.2
Running this code creates a xlsx file that seems to be ok (opens in Numbers anyway):
>>> import pandas as pd
>>> df = pd.DataFrame.from_records([[1,2],[3,4]], columns=["a","b"])
>>> df.to_excel("without_lxml.xlsx")
However, this same code was creating ill-formed xlsx files in my project. I narrowed the culprit down to lxml. I installed lxml in this fresh venv so pip freeze
gives:
lxml==3.3.5
numpy==1.8.1
openpyxl==1.8.6
pandas==0.13.1
python-dateutil==2.2
pytz==2014.2
six==1.6.1
wsgiref==0.1.2
At that point, I ran the same code and the new file (with_lxml.xlsx) doesn't open in Numbers or Preview.
>>> import pandas as pd
>>> df = pd.DataFrame.from_records([[1,2],[3,4]], columns=["a","b"])
>>> df.to_excel("with_lxml.xlsx")
You can see the Excel files created: - http://stiglerimages.s3.amazonaws.com/dropzone/without_lxml.xlsx - http://stiglerimages.s3.amazonaws.com/dropzone/with_lxml.xlsx
These files can be unzipped and inspected for possible explanations.
Comment From: jreback
hmm...can you try with lxml 3.2.1 (its what we test with)
Comment From: tdhopper
Seems to work with 3.2.1, however to_excel generates this warning:
/Users/tdhopper/.virtualenvs/temp/lib/python2.7/site-packages/openpyxl/__init__.py:31: UserWarning: The installed version of lxml is too old to be used with openpyxl
warnings.warn("The installed version of lxml is too old to be used with openpyxl")
Comment From: cpcloud
@jreback how are the wheels working these days? is there a server that i can log into and build things? i'm happy to set this up if you can't do it.
Comment From: cpcloud
let me back up a bit....should we build with a newer version of lxml
Comment From: jreback
easy enough to do
easy enough....you can certainly create them...i'll have to send you login info to upload separately
see pandas/ci/speedpack it works out pretty well....
Comment From: cpcloud
ok so same as before ...
Comment From: jreback
yep...then they live here: http://pandas.pydata.org/pandas-build/dev/wheels/ (which is served by the pandas.pydata.org server)
Comment From: jreback
@cpcloud I emailed you the machine/pw...
Comment From: jreback
@tdhopper ok...seems that something changed in newer version of lxml....want to debug this and do a pull-request?
Comment From: tdhopper
Don't have the time or understanding of the internals to work on it at the moment. I'll see if I can take a look over the next few weeks if it doesn't get fix before that.
Comment From: jreback
@tdhopper that's fine; we are going to be release 0.14.0 shortly anyhow. marking for 0.15.
Comment From: jmcnamara
I don't know the full details here but from my reading of the OpenPyXL mailing list I think what is happening is:
- OpenPyXL recently added support for lxml
as an additional backend for xml handling.
- The OpenPyXL files produced with lxml
installed have a different namespace (or a different arrangement of namespaces). In the files you attached the only real difference is that the relationship
namespace is attached to the sheet
element rather than the workbook
element.
- The result of this is that the files produced aren't compatible with third-party applications such as Numbers.
There was a similar issue reported on the OpenPyXL mailing list where the reply was that the file produced was a valid Excel OpenXML file according to the specification.
So, this isn't an issue in Pandas. You could raise it as an issue with the OpenPyXL devs but it looks like for now it is the accepted behaviour.
As a workaround you could use xlsxwriter
as the Excel output engine:
df.to_excel("with_lxml.xlsx", engine='xlsxwriter')
Comment From: jreback
@cpcloud
will leave this open as a marker for building the wheel for newer version of lxml
Comment From: cpcloud
cool
Comment From: cpcloud
@jreback i'm building the local wheels now .... had to reinstall os awhile back and now i'm paying the pandaculous price
Comment From: tdhopper
Thanks @jmcnamara