I'm running Anaconda 64-bit, Python 3.51 (pandas: 0.18.0) and 2.7 (pandas: 0.17.1.)
When I'm in the 2.7 environment, read_excel works properly
import pandas as pd
d=pd.read_excel(XLS1)
*** No CODEPAGE record, no encoding_override: will use 'ascii'
In the 3.5 environment, I get the following error:
import pandas as pd
d=pd.read_excel(XLS1)
*** No CODEPAGE record, no encoding_override: will use 'ascii'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-79b86582347b> in <module>()
----> 1 d=pd.read_excel(XLS1)
/opt/continuum/anaconda3/lib/python3.5/site-packages/pandas/io/excel.py in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, squeeze, **kwds)
167 """
168 if not isinstance(io, ExcelFile):
--> 169 io = ExcelFile(io, engine=engine)
170
171 return io._parse_excel(
/opt/continuum/anaconda3/lib/python3.5/site-packages/pandas/io/excel.py in __init__(self, io, **kwds)
216 self.book = xlrd.open_workbook(file_contents=data)
217 else:
--> 218 self.book = xlrd.open_workbook(io)
219 elif engine == 'xlrd' and isinstance(io, xlrd.Book):
220 self.book = io
/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
439 formatting_info=formatting_info,
440 on_demand=on_demand,
--> 441 ragged_rows=ragged_rows,
442 )
443 return bk
/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/book.py in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
105 "*** Setting on_demand to False.\n")
106 bk.on_demand = on_demand = False
--> 107 bk.fake_globals_get_sheet()
108 elif biff_version == 45:
109 # worksheet(s) embedded in global stream
/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/book.py in fake_globals_get_sheet(self)
685 self._sheet_visibility = [0] # one sheet, visible
686 self._sheet_list.append(None) # get_sheet updates _sheet_list but needs a None beforehand
--> 687 self.get_sheets()
688
689 def handle_boundsheet(self, data):
/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/book.py in get_sheets(self)
676 for sheetno in xrange(len(self._sheet_names)):
677 if DEBUG: print("GET_SHEETS: sheetno =", sheetno, self._sheet_names, self._sh_abs_posn, file=self.logfile)
--> 678 self.get_sheet(sheetno)
679
680 def fake_globals_get_sheet(self): # for BIFF 4.0 and earlier
/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/book.py in get_sheet(self, sh_number, update_pos)
667 sh_number,
668 )
--> 669 sh.read(self)
670 self._sheet_list[sh_number] = sh
671 return sh
/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/sheet.py in read(self, bk)
1473 % (self.number, self.name))
1474 self.tidy_dimensions()
-> 1475 self.update_cooked_mag_factors()
1476 bk._position = oldpos
1477 return 1
/opt/continuum/anaconda3/lib/python3.5/site-packages/xlrd/sheet.py in update_cooked_mag_factors(self)
1541 # VALID, defaults to 60
1542 zoom = 60
-> 1543 elif not (10 <= zoom <= 400):
1544 if blah:
1545 print((
TypeError: unorderable type: int() <= NoneType()
INSTALLED VERSIONS
commit: None python: 3.5.1.final.0 python-bits: 64 OS: Linux OS-release: 3.16.0-38-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8
pandas: 0.18.0 nose: 1.3.7 pip: 8.1.1 setuptools: 20.3 Cython: 0.23.4 numpy: 1.10.4 scipy: 0.17.0 statsmodels: 0.6.1 xarray: None IPython: 4.1.2 sphinx: 1.3.5 patsy: 0.4.0 dateutil: 2.5.1 pytz: 2016.2 blosc: None bottleneck: 1.0.0 tables: 3.2.2 numexpr: 2.5 matplotlib: 1.5.1 openpyxl: 2.3.2 xlrd: 0.9.4 xlwt: 1.0.0 xlsxwriter: 0.8.4 lxml: 3.6.0 bs4: 4.4.1 html5lib: None httplib2: None apiclient: None sqlalchemy: 1.0.12 pymysql: None psycopg2: None jinja2: 2.8 boto: 2.39.0
INSTALLED VERSIONS
commit: None python: 2.7.11.final.0 python-bits: 64 OS: Linux OS-release: 3.16.0-38-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8
pandas: 0.17.1 nose: 1.3.7 pip: 8.0.2 setuptools: 19.6.2 Cython: 0.23.4 numpy: 1.10.4 scipy: 0.17.0 statsmodels: 0.6.1 IPython: 4.0.3 sphinx: 1.3.5 patsy: 0.4.0 dateutil: 2.4.2 pytz: 2015.7 blosc: None bottleneck: 1.0.0 tables: 3.2.2 numexpr: 2.4.6 matplotlib: 1.5.1 openpyxl: 2.3.2 xlrd: 0.9.4 xlwt: 1.0.0 xlsxwriter: 0.8.4 lxml: 3.5.0 bs4: 4.4.1 html5lib: None httplib2: None apiclient: None sqlalchemy: 1.0.11 pymysql: None psycopg2: None Jinja2: None
Comment From: jorisvandenbossche
This looks probably like a xlrd issue, but I see that you already reported it there as well: https://github.com/python-excel/xlrd/issues/172. There is also a proposed fix: https://github.com/python-excel/xlrd/pull/131. Maybe you can try that out and report back there?
Anyway, to have a look, you will need to provide a reproducible example, eg by providing an excel file that reproduces the issue or better some code that writes an excel file that reproduces the issue.
Comment From: jreback
closing as not a pandas issue.
Comment From: cstoudt
(Zipped) sample file here:
It is the older XP/2003 format.
read_excel works with Python 2.7.11 from Anaconda 2.5.0 (pandas: 0.17.1, xlrd: 0.9.4, xlwt: 1.0.0)
d=pd.read_excel('mlr02.xls')
*** No CODEPAGE record, no encoding_override: will use 'ascii'
d.describe()
X1 X2 X3
count 11.000000 11.000000 11.000000
mean 150.090909 62.454545 195.000000
std 13.626845 9.114424 17.314734
min 128.000000 46.000000 167.000000
25% 140.000000 56.500000 186.000000
50% 153.000000 64.000000 194.000000
75% 160.500000 69.500000 209.000000
max 168.000000 74.000000 220.000000
It fails with Python 3.5.1 from Anaconda 2.5.0 (pandas: 0.18.0, xlrd: 0.9.4, xlwt: 1.0.0). The difference seems to be the version of pandas. xlrd and xlwt are the same version in both environments. If there is a problem with xlrd, then perhaps it is related to Python 3.
The sample file can be opened with LibreOffice Calc. If I save the file in the xlsx format (Excel 2007/2010/2013 XML) then read_excel works in both Python 2 and Python 3 environments.
I see no fix in the link you posted.
Comment From: jreback
@cstoudt you answered your own question, this is an older format. Further it looks like a py2/py3 issue.
Comment From: jorisvandenbossche
@cstoudt As I said above, there is a proposed fix: https://github.com/python-excel/xlrd/pull/131 It's only a one line change, so you could easily test this by changing that line in the xlrd source code