I use the get_dummies function during feature engineering for scikit-learn classifiers. I noticed that if one of the values of a dummy-encoded column is absent from a DataFrame, the resulting matrix has fewer columns, so its shape differs from a matrix built from data that contains every value.

import pandas as pd

df_1 = pd.DataFrame([
    {'size' : 's', 'backorder' : True},
    {'size' : 'm', 'backorder' : True},
    {'size' : 'l', 'backorder' : True},
])

df_2 = pd.DataFrame([
    {'size' : 's', 'backorder' : True},
    {'size' : 's', 'backorder' : True},
    {'size' : 'l', 'backorder' : True},
])

pd.get_dummies(df_1, 'size').shape # returns (3,4)
pd.get_dummies(df_2, 'size').shape # returns (3,3)

It's not elegant, but would it be reasonable to add a parameter listing the unique values of a dummy column, so that the output shape is preserved?

I'm happy to give this a shot if it sounds like a useful feature.

Comment From: jreback

The typical way to do this is to convert the column to a categorical that lists all of your categories. In fact, get_dummies does exactly this:

In [17]: pd.get_dummies(df_2['size'].astype('category', categories=list('sml')))
Out[17]: 
   s  m  l
0  1  0  0
1  1  0  0
2  0  0  1
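
For anyone reading this on a newer pandas: as far as I know, the `categories=` keyword to `astype` shown above is no longer accepted in recent releases. Below is a minimal sketch of the same idea using `pd.CategoricalDtype`, applied to both frames from the original post. The names `size_dtype`, `dummies_1` and `dummies_2` are just illustrative, and the dummy values may print as booleans rather than 0/1 depending on the pandas version.

```python
import pandas as pd

df_1 = pd.DataFrame({'size': ['s', 'm', 'l'], 'backorder': [True, True, True]})
df_2 = pd.DataFrame({'size': ['s', 's', 'l'], 'backorder': [True, True, True]})

# Declare every possible size up front, including ones a given frame may lack.
size_dtype = pd.CategoricalDtype(categories=list('sml'))

# get_dummies emits one column per declared category, observed or not.
dummies_1 = pd.get_dummies(df_1['size'].astype(size_dtype))
dummies_2 = pd.get_dummies(df_2['size'].astype(size_dtype))

print(dummies_1.shape, dummies_2.shape)
print(list(dummies_2.columns))
```

Because both columns share the same dtype, the two results should have the same columns in the same order, with the `m` column in `dummies_2` all zeros (or all False).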