Suppose I have two DataFrames:

left = pd.DataFrame({'key': list("abcd"), 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': list("aceb"), 'value': [5, 6, 7, 8]})

Columns with with the same name are given suffixes in a merge():

In [8]: pd.merge(left, right, on='key')
Out[8]:
  key  value_x  value_y
0   a        1        5
1   b        2        8
2   c        3        6

I would like to invoke a function instead:

In [9]: pd.merge(left, right, on='key', combine=lambda x, y: x + y)
Out[9]:
  key  value
0   a      6
1   b     10
2   c      9

In a left/right/outer join, missing keys imply no function call (just take the single value):

In [10]: pd.merge(left, right, on='key', how='outer', combine=lambda x, y: x + y)
Out[10]:
  key  value
0   a      6
1   b     10
2   c      9
3   d      4
4   e      7

Only one of combine or suffixes parameter can be specified. Of course, the user can simply request an override by taking only the right's values if both keys are present:

In [11]: pd.merge(left, right, on='key', how='left', combine=lambda x, y: y)
Out[11]:
  key  value
0   a      5
1   b      8
2   c      6
3   d      4

Comment From: TomAugspurger

How would this work with multiple shared columns?, e.g.

python left = pd.DataFrame({'key': list("abcd"), 'value': [1, 2, 3, 4], 'value2': list('aaaa')}) right = pd.DataFrame({'key': list("aceb"), 'value': [5, 6, 7, 8], 'value2': list('aaaa')})

I could see two options:

  1. The x, y passed are either 1-d arrays or Series (what's the index? key?), and combine can be a single function called on each pair, or a dictionary -> function API like {'value': lambda x, y: x + y, 'value2': lambda x, y: ...}
  2. The x and y passed is a DataFrame of all the cols from left and right that have overlap

Comment From: chrisaycock

@TomAugspurger I like the first option. If the user just gives a single function, then that function is applied to all columns; a dict means applying the specified function to the given column.

Comment From: mroeschke

Thanks for the request but it seems there hasn't been much development or interest from the community or core dev team in a while so closing for now. Can reopen when there's more interest