Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A X1970": {0: "a", 1: "b", 2: "c"},
    "A X1980": {0: "d", 1: "e", 2: "f"},
    "B X1970": {0: 2.5, 1: 1.2, 2: .7},
    "B X1980": {0: 3.2, 1: 1.3, 2: .1},
    "X": dict(zip(range(3), np.random.randn(3))),
})
df["id"] = df.index
pd.wide_to_long(df, ["A", "B"], i="id", j="xyear", sep=' ')
Problem description
wide_to_long gets confused when the suffix (the part of the column name after the separator) is not a string representation of a number. If you delete the X's from the column names in the example above, everything works fine. With the X's, however, it stops working and the returned DataFrame has zero rows.
Expected Output
A DataFrame with more than zero rows.
Output of pd.show_versions()
Comment From: TomAugspurger
Using sep=' X' seems to work.
In [7]: pd.wide_to_long(df, ["A", "B"], i="id", j="xyear", sep=' X')
Out[7]:
                 X  A    B
id xyear
0  1970   1.433113  a  2.5
1  1970  -0.808899  b  1.2
2  1970   0.188952  c  0.7
0  1980   1.433113  d  3.2
1  1980  -0.808899  e  1.3
2  1980   0.188952  f  0.1
You're welcome to dig into wide_to_long to determine whether it should work on your example with sep=' '. I suspect it should, but I'm not sure.
Comment From: tdpetrou
It seems like you might not know about regular expressions. The suffix parameter defaults to the regular expression \d+, which matches one or more digits, so a suffix such as X1970 is never matched. If you call it like the following, it will work, but the X is kept in the xyear values. Use Tom's solution to remove the X.
>>> pd.wide_to_long(df, ["A", "B"], i="id", j="xyear", sep=' ', suffix=r'X\d+')
                 X  A    B
id xyear
0  X1970  0.397588  a  2.5
1  X1970 -0.563002  b  1.2
2  X1970 -0.260190  c  0.7
0  X1980  0.397588  d  3.2
1  X1980 -0.563002  e  1.3
2  X1980 -0.260190  f  0.1
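To see why the original call returns nothing, it may help to check which column names match each combination of sep and suffix. The sketch below uses only the standard re module. The helper matching_columns is hypothetical, and the pattern it builds (stub, then sep, then suffix, anchored at both ends) is an assumption based on the documented behaviour, not a copy of the pandas internals.

```python
import re


def matching_columns(columns, stub, sep, suffix=r"\d+"):
    """Hypothetical helper: return the columns shaped like stub + sep + suffix.

    The stub and separator are treated literally; the suffix is a regex.
    This approximates the documented matching rule and is not pandas code.
    """
    pattern = re.compile(rf"^{re.escape(stub)}{re.escape(sep)}{suffix}$")
    return [col for col in columns if pattern.match(col)]


columns = ["A X1970", "A X1980", "B X1970", "B X1980", "X", "id"]

# Original call: sep=' ' with the default suffix. "X1970" is not all digits.
default_matches = matching_columns(columns, "A", " ")

# Tom's suggestion: fold the X into the separator.
sep_matches = matching_columns(columns, "A", " X")

# tdpetrou's suggestion: keep sep=' ' and widen the suffix regex.
suffix_matches = matching_columns(columns, "A", " ", suffix=r"X\d+")

print(default_matches)  # []
print(sep_matches)      # ['A X1970', 'A X1980']
print(suffix_matches)   # ['A X1970', 'A X1980']
```

Both workarounds select the same columns. They differ only in what ends up in xyear: with sep=' X' the X is consumed by the separator, while with suffix=r'X\d+' it stays part of the value.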
Comment From: jreback
closing as user issue