Pandas wide_to_long ignores columns which start with something other than a digit

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame({"A X1970" : {0 : "a", 1 : "b", 2 : "c"},
                    "A X1980" : {0 : "d", 1 : "e", 2 : "f"},
                    "B X1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
                    "B X1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
                    "X"     : dict(zip(range(3), np.random.randn(3)))
                   })
df["id"] = df.index
pd.wide_to_long(df, ["A", "B"], i="id", j="xyear", sep=' ')

Problem description

wide_to_long gets confused when the suffix that follows the stub name and separator is not a string representation of a number. If you delete the X's from the column names in the example above, everything works fine. With the X's, however, it stops working and the returned dataframe has zero rows.

Expected Output

dataframe with more than zero rows

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-64-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.2
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

Using sep=' X' seems to work.

In [7]: pd.wide_to_long(df, ["A", "B"], i="id", j="xyear", sep=' X')
Out[7]:
                 X  A    B
id xyear
0  1970   1.433113  a  2.5
1  1970  -0.808899  b  1.2
2  1970   0.188952  c  0.7
0  1980   1.433113  d  3.2
1  1980  -0.808899  e  1.3
2  1980   0.188952  f  0.1

You're welcome to dig into wide_to_long to determine whether it should work on your example with sep=' '. I suspect it should, but I'm not sure.

Comment From: tdpetrou

It seems like you might not know about regular expressions. The suffix parameter defaults to the regular expression \d+, which matches one or more digits. If you call it like the following, it will work (but keep the X). Use Tom's solution to remove the X.

>>> pd.wide_to_long(df, ["A", "B"], i="id", j="xyear", sep=' ', suffix=r'X\d+')
                 X  A    B
id xyear                  
0  X1970  0.397588  a  2.5
1  X1970 -0.563002  b  1.2
2  X1970 -0.260190  c  0.7
0  X1980  0.397588  d  3.2
1  X1980 -0.563002  e  1.3
2  X1980 -0.260190  f  0.1
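
For anyone landing here later, this is a rough sketch of why the original call matched nothing and why both workarounds do. It assumes wide_to_long selects columns of the form stub + sep + suffix; the `matching_columns` helper is hypothetical and only approximates that matching, it is not the pandas implementation. The data is the example from this issue with fixed values in the X column instead of random ones.

```python
import re

import pandas as pd


def matching_columns(columns, stub, sep, suffix=r"\d+"):
    """Hypothetical helper: list the columns shaped like <stub><sep><suffix>."""
    pattern = re.compile(rf"^{re.escape(stub)}{re.escape(sep)}{suffix}$")
    return [c for c in columns if pattern.match(c)]


columns = ["A X1970", "A X1980", "B X1970", "B X1980", "X", "id"]

# Default suffix \d+ with sep=" ": "X1970" is not all digits, so nothing matches.
no_match = matching_columns(columns, "A", " ")
# Tom's workaround: fold the X into the separator.
sep_match = matching_columns(columns, "A", " X")
# Alternative: keep sep=" " and let the suffix regex accept the leading X.
suffix_match = matching_columns(columns, "A", " ", r"X\d+")

df = pd.DataFrame({
    "A X1970": ["a", "b", "c"],
    "A X1980": ["d", "e", "f"],
    "B X1970": [2.5, 1.2, 0.7],
    "B X1980": [3.2, 1.3, 0.1],
    "X": [0.1, 0.2, 0.3],  # fixed values so the example is reproducible
})
df["id"] = df.index

# Both calls return six rows; they differ only in whether xyear keeps the X.
long_sep = pd.wide_to_long(df, ["A", "B"], i="id", j="xyear", sep=" X")
long_suffix = pd.wide_to_long(df, ["A", "B"], i="id", j="xyear",
                              sep=" ", suffix=r"X\d+")
```

With sep=" X" the xyear level holds the bare years, while with suffix=r"X\d+" it holds the strings with the leading X, so pick whichever form you want in the index.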

Comment From: jreback

closing as user issue
