As far as I can tell, both methods of reading an Excel sheet (through ExcelFile and read_excel) do not have the option to avoid mangling duplicate columns, which exists for read_csv (https://github.com/pydata/pandas/issues/3468).
I have an Excel file that looks like this:
foo foo bar bar baz baz
A B A B A B
123 456 789 12 345 678
901 234 567 890 123 456
789 12 345 678 901 234
I initially encountered this problem while trying to find a workaround for not being able to specify a multirow-header (#4679). Reading in this Excel file with header = 0 (default) mangles the column names:
>>> df = pd.read_excel("dupe_cols.xlsx", header = 0)
>>> print df
foo foo.1 bar bar.1 baz baz.1
0 A B A B A B
1 123 456 789 12 345 678
2 901 234 567 890 123 456
3 789 12 345 678 901 234
Unlike read_csv
, read_excel
does not have the option to avoid mangling duplicate columns (using ExcelFile.parse works the same as far as I can see). Specifying header = None in read_excel
and then assigning the column names to the first row will effectively allow you to avoid mangling the column names. You could also read in the Excel sheet with header = None, save a .csv with header = False and index = False, then read that csv and specify mangle_dupe_cols = False to get the dataframe you want. (Incidentally, this also allows you to specify multiple rows as the header, which is the behavior I was originally trying to emulate.)
Comment From: jreback
can you post a small example that can be used as a tests in the top section (e.g. create a frame, write to excel), show the arguments used to read back
Comment From: gfyoung
Because we pass in **kwds
from read_excel
to TextFileReader
, we actually do accept mangle_dupe_cols
now. 🎉 We just need to surface it explicitly.
Otherwise, this issue is superseded by #13262.
Comment From: geminizb
I'm afraid ExcelFile.parse is not changed yet
Comment From: dhimmel
With pandas 1.0.3 I got Setting mangle_dupe_cols=False is not supported yet
.
~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
332 convert_float=convert_float,
333 mangle_dupe_cols=mangle_dupe_cols,
--> 334 **kwds,
335 )
336
~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
886 convert_float=convert_float,
887 mangle_dupe_cols=mangle_dupe_cols,
--> 888 **kwds,
889 )
890
~/anaconda3/lib/python3.6/site-packages/pandas/io/excel/_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
510 usecols=usecols,
511 mangle_dupe_cols=mangle_dupe_cols,
--> 512 **kwds,
513 )
514
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in TextParser(*args, **kwds)
2199 """
2200 kwds["engine"] = "python"
-> 2201 return TextFileReader(*args, **kwds)
2202
2203
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
865 self._currow = 0
866
--> 867 options = self._get_options_with_defaults(engine)
868
869 self.chunksize = options.pop("chunksize", None)
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _get_options_with_defaults(self, engine)
893 # see gh-12935
894 if argname == "mangle_dupe_cols" and not value:
--> 895 raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
896 else:
897 options[argname] = value
ValueError: Setting mangle_dupe_cols=False is not supported yet
Comment From: talatccan
I got ValueError: Setting mangle_dupe_cols=False is not supported yet
for pandas version 1.0.4
Code:
df = pd.read_excel(DATA_PATH, mangle_dupe_cols=False)
Error Details:
File "site-packages\pandas\io\excel\_base.py", line 334, in read_excel
**kwds,
File "site-packages\pandas\io\excel\_base.py", line 888, in parse
**kwds,
File "site-packages\pandas\io\excel\_base.py", line 512, in parse
**kwds,
File "site-packages\pandas\io\parsers.py", line 2201, in TextParser
return TextFileReader(*args, **kwds)
File "site-packages\pandas\io\parsers.py", line 867, in __init__
options = self._get_options_with_defaults(engine)
File "site-packages\pandas\io\parsers.py", line 895, in _get_options_with_defaults
raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
ValueError: Setting mangle_dupe_cols=False is not supported yet
Comment From: ronald8192
Using pandas-1.2.0, got an error: ValueError: Setting mangle_dupe_cols=False is not supported yet
>>> pd.read_excel('test.xlsx',mangle_dupe_cols=False)
<stdin>:1: FutureWarning: Your version of xlrd is 1.2.0. In xlrd >= 2.0, only the xls format is supported. As a result, the openpyxl engine will be used if it is installed and the engine argument is not specified. Install openpyxl instead.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/util/_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 344, in read_excel
data = io.parse(
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 1153, in parse
return self._reader.parse(
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 532, in parse
parser = TextParser(
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 2224, in TextParser
return TextFileReader(*args, **kwds)
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 801, in __init__
options = self._get_options_with_defaults(engine)
File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 829, in _get_options_with_defaults
raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
ValueError: Setting mangle_dupe_cols=False is not supported yet