Code Sample, a copy-pastable example if possible
compare = pd.MultiIndex.from_product([df_agr['TEXT1'],df_mat['TEXT1']]).to_series()
Problem description
I'm creating a series from a multi-index so that I can compare the strings pairwise and get the fuzzy score.
ValueError                                Traceback (most recent call last)

C:\Anaconda3\lib\site-packages\pandas\indexes\multi.py in from_product(cls, iterables, sortorder, names)
    934         categoricals = [Categorical.from_array(it, ordered=True)
    935                         for it in iterables]
--> 936         labels = cartesian_product([c.codes for c in categoricals])
    937
    938         return MultiIndex(levels=[c.categories for c in categoricals],

C:\Anaconda3\lib\site-packages\pandas\tools\util.py in cartesian_product(X)
     37     return [np.tile(np.repeat(np.asarray(com._values_from_object(x)), b[i]),
     38                     np.product(a[i]))
---> 39             for i, x in enumerate(X)]
     40
     41

C:\Anaconda3\lib\site-packages\pandas\tools\util.py in

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in repeat(a, repeats, axis)
    395     except AttributeError:
    396         return _wrapit(a, 'repeat', repeats, axis)
--> 397     return repeat(repeats, axis)
    398
    399

ValueError: negative dimensions are not allowed
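One possible reading of this error, offered as an assumption because the sizes of df_agr and df_mat are not given here: the number of pairs in the cartesian product may be larger than a 32-bit integer can hold, so a repeat count wraps around to a negative value. The sketch below only does the arithmetic; product_size and fits_in_int32 are hypothetical helper names, not pandas or NumPy functions, and the lengths are made up.

```python
# Hypothetical helpers (not part of pandas or NumPy) that estimate how many
# rows a cartesian product would have before trying to build it.
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def product_size(*lengths):
    """Return the number of rows in the cartesian product of the inputs."""
    size = 1
    for n in lengths:
        size *= n
    return size


def fits_in_int32(*lengths):
    """True if the number of pairs can be stored in a signed 32-bit integer."""
    return product_size(*lengths) <= INT32_MAX


# Made-up lengths: two columns of 50,000 rows each.
print(product_size(50_000, 50_000))   # 2500000000
print(fits_in_int32(50_000, 50_000))  # False
```

If the check fails for the real data, comparing the columns in smaller chunks is one way to keep each product within range.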
Expected Output
   Agreement                          Material                              ratio  token_sort_ratio  partial_ratio  token_set_ratio
0  OP-001BR1(4) MA Offshore Day Rate  TIE,CABLE,NYLON,780 X 8.9 MM,50/PACK  14     15                41             41
1  OP-001BR1(4) MA Offshore Day Rate  CHAIN TAIL 76mm                       12     20                21             21
2  OP-001BR1(4) MA Offshore Day Rate  5-LINK ADAPTOR NRV4-84mm              21     21                36             36
3  OP-001BR1(4) MA Offshore Day Rate  5-LINK ADAPTOR NRV4-76mm              21     21                32             32
4  OP-001BR1(4) MA Offshore Day Rate  ANCHOR-12 TONS STEVPRIS MK5           23     22                37             37
Output of pd.show_versions()
Comment From: TomAugspurger
Could you simplify your example to make it reproducible?
Comment From: sp3234
Every row in df_agr['TEXT1'] should be compared with all the rows in the df_mat['TEXT1'] column. As you can see in the expected output, OP-001BR1(4) MA Offshore Day Rate is compared with all the rows in the df_mat['TEXT1'] column. ratio, token_sort_ratio, partial_ratio and token_set_ratio are calculated columns.
It works fine if the two dataframes have the same shape.
Simple ratio
===========
fuzz.ratio("this is a test", "this is a test!")
97

Partial Ratio
===========
fuzz.partial_ratio("this is a test", "this is a test!")
100

Token Sort Ratio
===========
fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100

Token Set Ratio
===========
fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100
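The pairwise comparison described above could be sketched as follows. This is a minimal illustration with made-up data, not the original dataframes: the two frames deliberately have different lengths, and similarity is a hypothetical stand-in built on the standard library's difflib rather than the fuzz functions, so the scores will not match fuzzywuzzy's.

```python
from difflib import SequenceMatcher

import pandas as pd

# Made-up data; the real df_agr and df_mat are not shown in this thread.
df_agr = pd.DataFrame({"TEXT1": ["OP-001BR1(4) MA Offshore Day Rate", "CHAIN HIRE"]})
df_mat = pd.DataFrame(
    {"TEXT1": ["CHAIN TAIL 76mm", "5-LINK ADAPTOR NRV4-84mm", "ANCHOR-12 TONS STEVPRIS MK5"]}
)

# Every (agreement, material) pair: len(df_agr) * len(df_mat) rows.
pairs = pd.MultiIndex.from_product(
    [df_agr["TEXT1"], df_mat["TEXT1"]], names=["Agreement", "Material"]
).to_frame(index=False)


def similarity(a, b):
    """Stand-in score from 0 to 100 based on difflib, not fuzzywuzzy."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


# One calculated column; further score columns could be added the same way.
pairs["ratio"] = [similarity(a, b) for a, b in zip(pairs["Agreement"], pairs["Material"])]
print(pairs)
```

With 2 agreements and 3 materials this produces 6 rows, one per pair, regardless of whether the two frames have the same shape.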
Comment From: jreback
@sp3234 we need a copy-pastable example that one can simply run. Please follow the instructions and populate pd.show_versions() as well.
Comment From: jreback
Please comment if you can provide a copy-pastable example.