Pandas BUG/ENH : Column name mangling doesn't strip white space

A small, complete example of the issue

a ,a ,a 
1,2,3
4,5,6
7,8,9

Note the whitespace after each character.

>>> import pandas
>>> df = pandas.read_table('test.csv', sep=',')
>>> df.columns
Index([u'a ', u'a .1', u'a .2'], dtype='object')

Again, note the whitespace between the character and the number.

Previously, a workaround was to set mangle_dupe_cols=False, manually strip the whitespace from the column names, and then clean up the names. With pandas 0.19.0, mangle_dupe_cols=False raises a ValueError.
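A minimal sketch of a workaround that does not rely on mangle_dupe_cols, assuming Python 3 and that the whole CSV is available as a string: read the header row on its own, strip each name, append the .1, .2, ... suffixes ourselves, then read the data rows with those names. `dedupe_names` and `read_csv_stripped` are hypothetical helpers, and `dedupe_names` is simpler than pandas' own mangling (it does not guard against a name such as `a.1` already being present in the header).

```python
import io

import pandas as pd


def dedupe_names(names):
    """Append .1, .2, ... to repeated names (simplified, hypothetical helper)."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        result.append(name if count == 0 else "{}.{}".format(name, count))
        seen[name] = count + 1
    return result


def read_csv_stripped(text):
    """Read CSV text with whitespace-stripped, de-duplicated column names."""
    # Read only the header row as ordinary data, so pandas does no name mangling.
    header = pd.read_csv(io.StringIO(text), header=None, nrows=1).iloc[0]
    names = dedupe_names([str(h).strip() for h in header])
    # Skip the original header line and supply the cleaned names instead.
    return pd.read_csv(io.StringIO(text), header=None, skiprows=1, names=names)


df = read_csv_stripped("a ,a ,a \n1,2,3\n4,5,6\n7,8,9\n")
print(df.columns.tolist())  # ['a', 'a.1', 'a.2']
```

Because the names are cleaned before they reach pandas, the whitespace never ends up between the base name and the suffix.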

This issue is more of a question about what the spec of mangle_dupe_cols is, and what the full implementation mentioned in https://github.com/pydata/pandas/issues/13262 is intended to be.

Expected Output

Index([u'a', u'a.1', u'a.2'], dtype='object')

Output of `pd.show_versions()`

## INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 23.1.0
Cython: 0.24
numpy: 1.10.4
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.1
patsy: None
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

Comment From: jorisvandenbossche

@rahulporuri That whitespace is not specific to duplicate column names:

In [6]: s = """a ,b ,c 
    ...: 1,2,3
    ...: 4,5,6
    ...: 7,8,9"""

In [8]: pd.read_csv(StringIO(s)).columns
Out[8]: Index(['a ', 'b ', 'c '], dtype='object')

Of course, it is true that in this case you can simply strip the whitespace, whereas with the mangled names you cannot, because the whitespace sits between the name and the suffix (`'a .1'`). Note, however, that previously the values read with duplicate column names were just plain wrong. That is why the mangle_dupe_cols=False option is disabled for now.
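If the columns have already been read with mangled names, they can be post-processed. A minimal sketch, assuming the mangling suffix is always a `.` followed by digits at the end of the name: remove any whitespace directly before that suffix, then strip the rest. `clean_mangled` is a hypothetical helper, and its regex would also alter a genuine column name such as `x .5`.

```python
import re

# Whitespace that sits directly before a trailing ".N" mangling suffix.
_SPACE_BEFORE_SUFFIX = re.compile(r"\s+(?=\.\d+$)")


def clean_mangled(names):
    """Strip whitespace from column names, including just before a '.N' suffix."""
    return [_SPACE_BEFORE_SUFFIX.sub("", str(name)).strip() for name in names]


print(clean_mangled(["a ", "a .1", "a .2"]))  # ['a', 'a.1', 'a.2']
```

Applied to a DataFrame this would be `df.columns = clean_mangled(df.columns)`. For names without duplicates it reduces to a plain strip.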

Comment From: jorisvandenbossche

The idea of mangle_dupe_cols=False would be to read the values correctly, but without the .1, .2, ... suffixes added to the column names (and once this works, you can then strip the whitespace). If somebody implements it, we would certainly accept a PR to make this work again. But the place to discuss that is https://github.com/pydata/pandas/issues/13262
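The result that mangle_dupe_cols=False is meant to produce can be approximated by hand. A minimal sketch, assuming Python 3 and CSV text held in memory: parse the data with header=None so no names are involved in parsing, then assign the stripped header names afterwards. A pandas DataFrame allows duplicate column labels, so all three columns end up named `a`, each with its own values. `read_csv_keep_dupes` is a hypothetical helper, not a pandas API.

```python
import io

import pandas as pd


def read_csv_keep_dupes(text):
    """Read CSV text, keeping duplicate (but whitespace-stripped) column names."""
    # Header row only, parsed as an ordinary data row.
    header = pd.read_csv(io.StringIO(text), header=None, nrows=1).iloc[0]
    # Data rows only; columns get the default integer labels 0, 1, 2, ...
    df = pd.read_csv(io.StringIO(text), header=None, skiprows=1)
    # Assigning labels after parsing cannot mix up the values.
    df.columns = [str(h).strip() for h in header]
    return df


df = read_csv_keep_dupes("a ,a ,a \n1,2,3\n4,5,6\n7,8,9\n")
print(df.columns.tolist())  # ['a', 'a', 'a']
```

With duplicate labels, `df['a']` returns all three columns, so positional access such as `df.iloc[:, 2]` is needed to pick out a single one.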

Comment From: jreback

Closing, as this is covered by the referenced issues.
