Feature Type

  • [ ] Adding new functionality to pandas

  • [X] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I wish error messages for erroneous joins were more helpful when the key that causes problems is not evident. As a pandas user it would help me understand errors faster.

When calling pandas.merge() without the on argument, the join is performed on the intersection of the key in left and right. There can be dozens of such keys. In this situation, if some of the columns have an incompatible type the error message is:

You are trying to merge on object and int64 columns.

This currently does not help finding which of the dozens of columns have incompatible types.

This situation happened a few times when comparing frames with lots of auto-detected columns of which a few columns are mostly None. If one the columns happens to be full of None, its auto-detected type becomes Object, which can't be compared with int64.

Feature Description

Consider the following code:

import pandas as pd
df_left = pd.DataFrame([
  {"a": 1, "b": 1, "c": 1, "d": 1, "z": 1},
])
df_right = pd.DataFrame([
  {"a": 1, "b": 1, "c": 1, "d": None, "z": 1},
])
merge = pd.merge(df_left, df_right, indicator='source')

It fails with the following error:

ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat

A more helpful error message would be:

ValueError: You are trying to merge on int64 and object columns for key "d". If you wish to proceed you should use pd.concat

Alternative Solutions

It is always possible to find the problematic column by comparing column types manually. This requires that the data was persisted. If the data was transient there might be no way to find out the problematic column a posteriori.

Additional Context

No response

Comment From: MarcoGorelli

seems reasonable, do you want to submit a PR?

Comment From: Caerbannog

I am not fluent enough in Python project setup. I spent an hour vainly trying to run the existing tests.

Comment From: MarcoGorelli

no worries, here's the contributing guide if you're interested https://pandas.pydata.org/docs/dev/development/contributing.html

Comment From: Charlie-XIAO

Hi, I'm not sure if Caerbannog is working on it. If he is not, may I claim this issue?

Comment From: MarcoGorelli

sure, go ahead!

Comment From: Charlie-XIAO

take