Is your feature request related to a problem?

I wish I could use the Pandas DataFrame.join function to perform left semi joins: selecting each row from the left DataFrame that has at least one matching row in the right DataFrame.

Describe the solution you'd like

I would like to be able to pass leftsemi to the how parameter in the DataDrame.join function to perform this type of join.

Additional context

For example: in SQL this might look like

SELECT *
FROM table1
WHERE EXISTS
(SELECT 1
FROM   table2
WHERE  table1.id = table2.id)

Comment From: jreback

this is just a join after you isin on the rhs - would take a PR for this naive implementation

Comment From: rhshadrach

If there are duplicate matches on the RHS, I think a left semijoin would only produce a single row. But I think this can be produced with a drop duplicates on the left columns.

Comment From: aneesh98

Hi I would like to work on this enhancement, if no one is working on this. I have few questions: 1) Incase of column name mentioned in 'on' param provided, a .isin must be performed on the same 'on' value in rhs df right for the unique values of the 'on' column in lhs?. 2) If no 'on' param value is provided, then the isin should be performed on the index of the rhs? 3) The columns of the rhs dataframe present in the resulting dataframe from .join be dropped? Because the query selects only the rows in lhs df for which there are matching rows in rhs df.

Comment From: debnathshoham

Hi @aneesh98, just wanted to check if you are working on this? Otherwise I would like to pick this up.

Comment From: aneesh98

Hi @debnathshoham, I am actually still working on it.

Comment From: debnathshoham

Great! To answer the questions above: 1. Correct 2. Merge can happen using on, left_on & right_on, right_index & left_index, left_index & right_on and right_index & left_on. If none of these are provided, it tries on the common columns. 3. AFAIU semi join should only returns rows from left that has matching rows in right. It won't add columns from right.

Hope that helps.

Comment From: aneesh98

Hi @debnathshoham. Thank you so much for reply. Can you please confirm if this approach is correct then? 1) If 'on' is None and join is to be done on index-to-index, then find out the rows present in the left dataframe which have matching index values in the right dataframe 2) If 'on' is provided, then perform an inner join based on the values provided in 'on', discard the columns from right dataframe and remove duplicates.

Please let me know if I am correct with this logic, or if I am missing any case or anything at all?. Thank you.

Comment From: debnathshoham

Merge can happen using on, left_on & right_on, right_index & left_index, left_index & right_on and right_index & left_on. If none of these are provided, it tries on the common columns.

Yes, actually you have just considered 2 cases from the 5 I mentioned (you will get a clear picture, if you play around with all these existing merging procedures).

The idea is really simple, in leftsemi, check for the keys in left (can be either index(s) or column(s)) isin the keys in the right (again, could be either index(s) or column(s)). And return those keys from left.

You can check this example from stack. Hope this will clarify.

Comment From: kurzdev

Any updates on this?