Would it be possible to improve the drop_duplicates() method so that it handles columns whose elements are lists? Currently, if a column contains list elements, drop_duplicates() throws the following exception:

TypeError: type object argument after * must be a sequence, not map

Example:

import pandas as pd

df = pd.DataFrame({'a': [10, 10, 10, 11, 12], 'b': [['a', 'b'], ['a', 'b'], ['x', 'y'], ['a', 'b'], ['x', 'y']]})
df.drop_duplicates()

As a workaround, I currently have to create a new column that holds the original list elements as strings, and then pass that column to drop_duplicates().
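
A minimal sketch of that string-column workaround, for reference. The helper column name `b_str` is hypothetical and is dropped again after deduplication; using `str()` on each list is just one way to get a hashable key.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [10, 10, 10, 11, 12],
    'b': [['a', 'b'], ['a', 'b'], ['x', 'y'], ['a', 'b'], ['x', 'y']],
})

# Hypothetical helper column: the string form of each list, which is hashable.
df['b_str'] = df['b'].apply(str)

# Deduplicate on the hashable columns only, then discard the helper column.
deduped = df.drop_duplicates(subset=['a', 'b_str']).drop(columns='b_str')
```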

Thank you!

Comment From: jreback

lists are not hashable (nor is list a real datatype in pandas anyhow).

but you can use tuples
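
To illustrate the hashability point in plain Python (a small sketch, independent of pandas; the variable names are only for illustration):

```python
# Lists are mutable, so Python refuses to hash them.
try:
    hash(['a', 'b'])
    list_error = None
except TypeError as exc:
    list_error = str(exc)

# Tuples of hashable elements are hashable, and equal tuples hash equally.
tuple_hashes_match = hash(('a', 'b')) == hash(('a', 'b'))
```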

In [7]: df['c'] = df['b'].apply(tuple)

In [8]: df
Out[8]: 
    a       b       c
0  10  [a, b]  (a, b)
1  10  [a, b]  (a, b)
2  10  [x, y]  (x, y)
3  11  [a, b]  (a, b)
4  12  [x, y]  (x, y)

In [9]: df.drop_duplicates(subset=['c'])
Out[9]: 
    a       b       c
0  10  [a, b]  (a, b)
2  10  [x, y]  (x, y)

In [12]: df.drop_duplicates(subset=['a', 'c'])
Out[12]: 
    a       b       c
0  10  [a, b]  (a, b)
2  10  [x, y]  (x, y)
3  11  [a, b]  (a, b)
4  12  [x, y]  (x, y)
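
If you would rather not keep an extra column around, one possible variation (a sketch, not something drop_duplicates() does for you) is to build the tuple version in a temporary frame, compute a duplicate mask from it, and filter the original frame with that mask. The names `key`, `mask`, and `result` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [10, 10, 10, 11, 12],
    'b': [['a', 'b'], ['a', 'b'], ['x', 'y'], ['a', 'b'], ['x', 'y']],
})

# Temporary copy in which the list column is replaced by hashable tuples;
# assign() returns a new frame and leaves df untouched.
key = df.assign(b=df['b'].apply(tuple))

# True for every row that repeats an earlier (a, b) combination.
mask = key.duplicated(subset=['a', 'b'])

# Filter the original frame, so column 'b' still holds lists.
result = df[~mask]
```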