Pandas read_excel: fails for Python 3.5.1/Pandas 0.18.0

I'm running Anaconda 64-bit, Python 3.51 (pandas: 0.18.0) and 2.7 (pandas: 0.17.1.)

When I'm in the 2.7 environment, read_excel works properly

import pandas as pd
d=pd.read_excel(XLS1)
*** No CODEPAGE record, no encoding_override: will use 'ascii'

In the 3.5 environment, I get the following error:

import pandas as pd
d=pd.read_excel(XLS1)
*** No CODEPAGE record, no encoding_override: will use 'ascii'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-79b86582347b> in <module>()
----> 1 d=pd.read_excel(XLS1)

/opt/continuum/anaconda3/lib/python3.5/site-packages/pandas/io/excel.py in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, squeeze, **kwds)
    167     """
    168     if not isinstance(io, ExcelFile):
--> 169         io = ExcelFile(io, engine=engine)
    170 
    171     return io._parse_excel(

/opt/continuum/anaconda3/lib/python3.5/site-packages/pandas/io/excel.py in __init__(self, io, **kwds)
    216                 self.book = xlrd.open_workbook(file_contents=data)
    217             else:
--> 218                 self.book = xlrd.open_workbook(io)
    219         elif engine == 'xlrd' and isinstance(io, xlrd.Book):
    220             self.book = io

/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    439         formatting_info=formatting_info,
    440         on_demand=on_demand,
--> 441         ragged_rows=ragged_rows,
    442         )
    443     return bk

/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/book.py in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    105                     "*** Setting on_demand to False.\n")
    106                 bk.on_demand = on_demand = False
--> 107             bk.fake_globals_get_sheet()
    108         elif biff_version == 45:
    109             # worksheet(s) embedded in global stream

/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/book.py in fake_globals_get_sheet(self)
    685         self._sheet_visibility = [0] # one sheet, visible
    686         self._sheet_list.append(None) # get_sheet updates _sheet_list but needs a None beforehand
--> 687         self.get_sheets()
    688 
    689     def handle_boundsheet(self, data):

/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/book.py in get_sheets(self)
    676         for sheetno in xrange(len(self._sheet_names)):
    677             if DEBUG: print("GET_SHEETS: sheetno =", sheetno, self._sheet_names, self._sh_abs_posn, file=self.logfile)
--> 678             self.get_sheet(sheetno)
    679 
    680     def fake_globals_get_sheet(self): # for BIFF 4.0 and earlier

/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/book.py in get_sheet(self, sh_number, update_pos)
    667                 sh_number,
    668                 )
--> 669         sh.read(self)
    670         self._sheet_list[sh_number] = sh
    671         return sh

/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/sheet.py in read(self, bk)
   1473                 % (self.number, self.name))
   1474         self.tidy_dimensions()
-> 1475         self.update_cooked_mag_factors()
   1476         bk._position = oldpos
   1477         return 1

/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/sheet.py in update_cooked_mag_factors(self)
   1541                 # VALID, defaults to 60
   1542                 zoom = 60
-> 1543             elif not (10 <= zoom <= 400):
   1544                 if blah:
   1545                     print((

TypeError: unorderable type: int() <= NoneType()

INSTALLED VERSIONS

commit: None python: 3.5.1.final.0 python-bits: 64 OS: Linux OS-release: 3.16.0-38-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8

pandas: 0.18.0 nose: 1.3.7 pip: 8.1.1 setuptools: 20.3 Cython: 0.23.4 numpy: 1.10.4 scipy: 0.17.0 statsmodels: 0.6.1 xarray: None IPython: 4.1.2 sphinx: 1.3.5 patsy: 0.4.0 dateutil: 2.5.1 pytz: 2016.2 blosc: None bottleneck: 1.0.0 tables: 3.2.2 numexpr: 2.5 matplotlib: 1.5.1 openpyxl: 2.3.2 xlrd: 0.9.4 xlwt: 1.0.0 xlsxwriter: 0.8.4 lxml: 3.6.0 bs4: 4.4.1 html5lib: None httplib2: None apiclient: None sqlalchemy: 1.0.12 pymysql: None psycopg2: None jinja2: 2.8 boto: 2.39.0

INSTALLED VERSIONS

commit: None python: 2.7.11.final.0 python-bits: 64 OS: Linux OS-release: 3.16.0-38-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8

pandas: 0.17.1 nose: 1.3.7 pip: 8.0.2 setuptools: 19.6.2 Cython: 0.23.4 numpy: 1.10.4 scipy: 0.17.0 statsmodels: 0.6.1 IPython: 4.0.3 sphinx: 1.3.5 patsy: 0.4.0 dateutil: 2.4.2 pytz: 2015.7 blosc: None bottleneck: 1.0.0 tables: 3.2.2 numexpr: 2.4.6 matplotlib: 1.5.1 openpyxl: 2.3.2 xlrd: 0.9.4 xlwt: 1.0.0 xlsxwriter: 0.8.4 lxml: 3.5.0 bs4: 4.4.1 html5lib: None httplib2: None apiclient: None sqlalchemy: 1.0.11 pymysql: None psycopg2: None Jinja2: None

Comment From: jorisvandenbossche

This looks probably like a xlrd issue, but I see that you already reported it there as well: https://github.com/python-excel/xlrd/issues/172. There is also a proposed fix: https://github.com/python-excel/xlrd/pull/131. Maybe you can try that out and report back there?

Anyway, to have a look, you will need to provide a reproducible example, eg by providing an excel file that reproduces the issue or better some code that writes an excel file that reproduces the issue.

Comment From: jreback

closing as not a pandas issue.

Comment From: cstoudt

(Zipped) sample file here:

mlr02.xls.zip

It is the older XP/2003 format.

read_excel works with Python 2.7.11 from Anaconda 2.5.0 (pandas: 0.17.1, xlrd: 0.9.4, xlwt: 1.0.0)

d=pd.read_excel('mlr02.xls')
*** No CODEPAGE record, no encoding_override: will use 'ascii'

d.describe()

               X1         X2          X3
count   11.000000  11.000000   11.000000
mean   150.090909  62.454545  195.000000
std     13.626845   9.114424   17.314734
min    128.000000  46.000000  167.000000
25%    140.000000  56.500000  186.000000
50%    153.000000  64.000000  194.000000
75%    160.500000  69.500000  209.000000
max    168.000000  74.000000  220.000000

It fails with Python 3.5.1 from Anaconda 2.5.0 (pandas: 0.18.0, xlrd: 0.9.4, xlwt: 1.0.0). The difference seems to be the version of pandas. xlrd and xlwt are the same version in both environments. If there is a problem with xlrd, then perhaps it is related to Python 3.

The sample file can be opened with LibreOffice Calc. If I save the file in the xlsx format (Excel 2007/2010/2013 XML) then read_excel works in both Python 2 and Python 3 environments.

I see no fix in the link you posted.

Comment From: jreback

@cstoudt you answered your own question, this is an older format. Further it looks like a py2/py3 issue.

Comment From: jorisvandenbossche

@cstoudt As I said above, there is a proposed fix: https://github.com/python-excel/xlrd/pull/131 It's only a one line change, so you could easily test this by changing that line in the xlrd source code