A small, complete example of the issue
a ,a ,a
1,2,3
4,5,6
7,8,9
Note the trailing whitespace after the column names in the header.
>>> import pandas
>>> df = pandas.read_table('test.csv', sep=',')
>>> df.columns
Index([u'a ', u'a .1', u'a .2'], dtype='object')
Again, note the whitespace between the column name and the numeric suffix.
Previously, a workaround was to set mangle_dupe_cols to False, manually strip the columns and then clean up the column names. With pandas 0.19.0, mangle_dupe_cols=False raises a ValueError.
This issue is more of a question about what the spec of mangle_dupe_cols is, and what the full implementation mentioned in https://github.com/pydata/pandas/issues/13262 is intended to be.
Expected Output
Index([u'a', u'a.1', u'a.2'], dtype='object')
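One way to get the expected column names today, without relying on mangle_dupe_cols=False, is to read the header line separately, strip it, and de-duplicate the names before handing them to pandas. This is only a sketch: the helper stripped_unique_names is hypothetical (not part of pandas), the sample data assumes every header cell has a trailing space as the output above suggests, and the helper does not guard against a header that already contains a name like a.1.

```python
import csv
from collections import Counter
from io import StringIO

import pandas as pd

# Assumed sample data: each header cell carries a trailing space.
raw = "a ,a ,a \n1,2,3\n4,5,6\n7,8,9\n"


def stripped_unique_names(header_line):
    """Hypothetical helper: strip each header cell, then add .1, .2, ...
    suffixes to repeated names (mimicking the style of pandas' mangling)."""
    names = [name.strip() for name in next(csv.reader([header_line]))]
    seen = Counter()
    unique = []
    for name in names:
        count = seen[name]
        unique.append(name if count == 0 else f"{name}.{count}")
        seen[name] += 1
    return unique


names = stripped_unique_names(raw.splitlines()[0])

# header=0 tells pandas to discard the original header row and use `names`.
df = pd.read_csv(StringIO(raw), header=0, names=names)
print(df.columns.tolist())
```

Because the names are cleaned before parsing, no post-hoc fix-up of 'a .1'-style labels is needed.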
Output of pd.show_versions()
Comment From: jorisvandenbossche
@rahulporuri That whitespace is not specific to duplicate column names:
In [6]: s = """a ,b ,c
...: 1,2,3
...: 4,5,6
...: 7,8,9"""
In [8]: pd.read_csv(StringIO(s)).columns
Out[8]: Index(['a ', 'b ', 'c '], dtype='object')
Of course, in this case you can simply strip the whitespace from the column names, whereas with the mangled column names you cannot.
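For the non-duplicate case, stripping the names after reading is a one-liner. A minimal sketch, assuming a header in which every name has a trailing space:

```python
from io import StringIO

import pandas as pd

# Assumed sample data: unique column names, each with a trailing space.
s = "a ,b ,c \n1,2,3\n4,5,6\n7,8,9"

df = pd.read_csv(StringIO(s))
before = df.columns.tolist()

# Index.str.strip removes leading and trailing whitespace from each label.
df.columns = df.columns.str.strip()
print(before, "->", df.columns.tolist())
```

With mangled duplicates such as 'a .1' the space sits in the middle of the label, so str.strip alone leaves it in place.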
However, note that previously the values that were read were simply wrong with duplicate column names. Therefore the mangle_dupe_cols=False feature is disabled for now.
Comment From: jorisvandenbossche
The idea of mangle_dupe_cols=False would be to read the values correctly, but without the .1, .2, ... suffixes added to the column names (and once this works, you can then strip the whitespace).
If somebody implements it, we would certainly accept a PR to make this work again. But the issue to discuss that is https://github.com/pydata/pandas/issues/13262
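Until such a PR exists, the behaviour described above (correct values, duplicate names kept, whitespace stripped) can be approximated by hand. The following is a sketch under stated assumptions, not how pandas would implement mangle_dupe_cols=False: it skips the header row while parsing, then assigns the stripped header as column labels, which pandas allows to contain duplicates.

```python
import csv
from io import StringIO

import pandas as pd

# Assumed sample data: duplicate header names with trailing spaces.
raw = "a ,a ,a \n1,2,3\n4,5,6\n7,8,9\n"

# Parse the header line ourselves and strip each name.
header = [name.strip() for name in next(csv.reader(StringIO(raw)))]

# Read only the data rows; header=None gives positional integer labels.
df = pd.read_csv(StringIO(raw), header=None, skiprows=1)

# Assigning duplicate labels to df.columns is permitted.
df.columns = header
print(df.columns.tolist())
```

Note that with duplicate labels, df['a'] returns a DataFrame containing all matching columns, so positional access such as df.iloc[:, 2] is the safer way to pick a single one.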
Comment From: jreback
closing as this is covered by the referenced issues.