First, I know the strings in column `b` can be used directly for sorting. My function `aFucntion` looks silly because it is a simplified version written only to present the problem. In my real situation, it produces a list of English characters, which is then used to sort words in my native language in a custom way.
Then, I know a `list` is not hashable.
# coding: utf-8
import pandas as pd
def aFucntion(var):
    return list(var)
a = {'a': [1, 2],
     'b': ['Python', 'Java']
     }
df = pd.DataFrame(a)
print(df)
print('\n')
df['fun'] = df['b'].map(aFucntion)
df1 = df.sort_values(['b']) # this works
print(df1)
print('\n')
df2 = df.sort_values(['fun']) # this works
print(df2)
print('\n')
# this yields `TypeError: unhashable type: 'list'`
# but why does df.sort_values(['fun']) work?
df3 = df.sort_values(['b', 'fun'])
print(df3)
print('\n')
df['fun'] = df['b'].map(lambda e: tuple(aFucntion(e)))
df4 = df.sort_values(['b', 'fun']) # this works
print(df4)
print('\n')
Comment From: TomAugspurger
You can see here that `sort_values` treats the case where `length(by)` is greater than one differently than when it's equal to one. When you pass a list of columns to sort by, it has to go through our factorization routines, which require hashable values.
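A minimal sketch of why hashability matters, using plain Python rather than pandas internals. Factorization-style code assigns each distinct value an integer code through a hash table, which works for tuples but not for lists. The `factorize` helper below is hypothetical and only illustrates the idea; it is not pandas' own routine.

```python
def factorize(values):
    """Toy factorization: map each distinct value to an integer code.

    Hypothetical helper for illustration only, not pandas' implementation.
    """
    codes = []
    seen = {}  # dict keys must be hashable
    for value in values:
        if value not in seen:
            seen[value] = len(seen)
        codes.append(seen[value])
    return codes, list(seen)


# Tuples are hashable, so they can be used as dict keys.
tuple_codes, tuple_uniques = factorize([("J", "a"), ("P", "y"), ("J", "a")])
print(tuple_codes)

# Lists are not hashable, so the dict lookup raises TypeError.
try:
    factorize([["J", "a"], ["P", "y"]])
    list_error = None
except TypeError as exc:
    list_error = str(exc)
print(list_error)
```

Sorting by a single column never needs this lookup step, which is consistent with `df.sort_values(['fun'])` working while the two-column call fails.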
Comment From: michaelwebb7
@TomAugspurger I just came up against this bug – being able to sort by two columns seems a very basic feature that ought to work. Is it possible to fix this? I created a temporary workaround for myself by making a new column that stringified and concatenated the 2 columns I cared about, but it would be great if this just worked out of the box.
Comment From: TomAugspurger
@michaelwebb7 you're able to sort by multiple columns. You aren't able to sort by something that isn't hashable.
Comment From: TomAugspurger
`sort_values` needs the values of a column to be hashable when sorting by multiple columns. The original example could be fixed by converting the column of (unhashable) lists to a column of tuples.
In [10]: df['fun2'] = df['fun'].map(tuple)

In [11]: df
Out[11]:
   a       b                 fun                fun2
0  1  Python  [P, y, t, h, o, n]  (P, y, t, h, o, n)
1  2    Java        [J, a, v, a]        (J, a, v, a)

In [12]: df.sort_values(['a', 'fun2'])
Out[12]:
   a       b                 fun                fun2
0  1  Python  [P, y, t, h, o, n]  (P, y, t, h, o, n)
1  2    Java        [J, a, v, a]        (J, a, v, a)
Comment From: m-martin-j
A workaround that helped me to sort by more than one column is
df2 = df.applymap(tuple)
df2.sort_values(['b', 'fun'])
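Calling `tuple` on every cell only works when every cell is iterable; on the frame from the original example it would raise on the integers in column `a`. A narrower sketch, assuming only some columns hold lists, converts just those columns before sorting. `lists_to_tuples` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Rebuild the frame from the original example.
df = pd.DataFrame({
    "a": [1, 2],
    "b": ["Python", "Java"],
})
df["fun"] = df["b"].map(list)  # column of (unhashable) lists


def lists_to_tuples(frame, columns):
    """Return a copy of `frame` with the named columns converted to tuples.

    Hypothetical helper: only the listed columns are touched, so
    non-iterable columns such as integers are left alone.
    """
    out = frame.copy()
    for col in columns:
        out[col] = out[col].map(tuple)
    return out


df2 = lists_to_tuples(df, ["fun"])
result = df2.sort_values(["b", "fun"])
print(result)
```

The original `df` is left unchanged, so the list column remains available if later code needs mutable values.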