Location of the documentation
https://dev.pandas.io/docs/reference/api/pandas.DataFrame.merge.html
Documentation problem
There is no documentation on this case
Suggested fix for documentation
I'm a bit confused about the behavior here, because I don't know what to expect. When I perform the two merges shown below, I would expect both operations to return the same output.
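import pandas as pd

left = pd.DataFrame({'a': [1, 2], 'b': [1, 1], "l": [22, 23]}).set_index(['a', 'b'])
right = pd.DataFrame({'b': [1], "r": [12]}).set_index(['b'])
# First merge: left index level 'b' against the index of right
print(pd.merge(left, right, left_on=['b'], right_index=True, how="left"))
# Second merge: index level 'b' referenced by name on both sides
print(pd.merge(left, right, left_on=['b'], right_on=["b"], how="left"))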
But the index of the two results differs. The first merge has the same index as the left DataFrame, while the second merge has only b as its index. As a result, the index of the second merge is no longer unique.
First Merge:
      l   r
a b
1 1  22  12
2 1  23  12
Second Merge:
    l   r
b
1  22  12
1  23  12
If this is the desired behavior, we should adjust the documentation to show it. If it is not, I would file a bug report. Either way, the documentation should be adjusted. I would add an example with an index join to show the correct behavior.
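Until the intended behavior is clarified, one possible way to get the same index from either form is to move the index levels to ordinary columns before merging and restore them afterwards. The following is only a minimal sketch under that assumption, not a statement of what pd.merge is supposed to do; the names left_flat, right_flat and merged are introduced here purely for illustration.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [1, 1], "l": [22, 23]}).set_index(["a", "b"])
right = pd.DataFrame({"b": [1], "r": [12]}).set_index(["b"])

# Move the index levels to ordinary columns so the merge key "b"
# is a plain column on both sides (illustrative helper names).
left_flat = left.reset_index()
right_flat = right.reset_index()

# Merge on the shared column, then restore the left frame's index levels.
merged = left_flat.merge(right_flat, on="b", how="left").set_index(["a", "b"])

print(merged.index.names)
print(merged)
```

Because both levels are explicitly put back with set_index, the result keeps the (a, b) index of the left DataFrame and stays unique, independent of how merge treats index levels that are passed as keys.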
Comment From: TomAugspurger
This is probably buggy, but I'm not entirely sure either. I think they should have the same index (I think the second one).
Comment From: bjornasm
Is there any progress here? It would be nice to get clarification on whether merging requires the DataFrames to have the same index or not. I don't see why it isn't straightforward to merge on any chosen columns of the DataFrames, since you would expect that it's the content of the columns that has to match.