Feature Type

  • [ ] Adding new functionality to pandas

  • [X] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

import pandas as pd

df = pd.util.testing.makeMixedDataFrame()

print(f'{list(df)=}')
>>> list(df)=['A', 'B', 'C', 'D']

df[['foo']]
>>> KeyError: "['foo'] not in index"

Feature Description

The error message would be more helpful if it listed available columns:

df[['foo']]
>>> KeyError: "['foo'] not in columns=['A', 'B', 'C', 'D']"

Alternative Solutions

n/a

Additional Context

No response

Comment From: rhshadrach

pandas DataFrames can have a massive number of columns, which would either (a) flood stdout or (b) require us to truncate the output. Even when there are few columns, we'd need to worry about the repr of the individual columns being long themselves.

Comment From: janosh

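Could check the length of the joined column names first:

cols = ', '.join(map(str, df))  # str() so non-string labels can be joined too
if len(cols) < some_threshold:
    raise KeyError(f"['foo'] not in columns={cols}")

and make some_threshold configurable via, say, pd.options.key_errors.max_col_list_len.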
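
A minimal sketch of the threshold check above, written as a standalone helper rather than as pandas internals. The name missing_columns_message and its max_len parameter are hypothetical, and pd.options.key_errors.max_col_list_len does not exist in pandas, so the limit is passed as a plain argument here:

```python
import pandas as pd


def missing_columns_message(df, missing, max_len=80):
    """Build a KeyError message, listing the columns only if the list is short.

    Hypothetical helper for illustration; not part of pandas.
    """
    # repr() keeps the quotes around string labels and also handles
    # non-string labels such as integers or timestamps.
    listing = ", ".join(repr(col) for col in df.columns)
    if len(listing) < max_len:
        return f"{list(missing)} not in columns=[{listing}]"
    # The listing is too long to print in full: fall back to the short message.
    return f"{list(missing)} not in index"


df = pd.DataFrame({"A": [1], "B": [2], "C": [3], "D": [4]})
print(missing_columns_message(df, ["foo"]))
```

Falling back to the short message for wide frames sidesteps the truncation question raised earlier, at the cost of the message format varying with the frame.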

Comment From: mroeschke

Yeah, as is I would be -0.5 on including this given @rhshadrach's concerns:

  1. In an interactive environment, one can quickly access df.columns or df.index after the error
  2. In a script/process, I guess it could be useful in the traceback with error logging infrastructure but may be too verbose more often than not

Comment From: janosh

> but may be too verbose more often than not

More often than not column count and name lengths should be manageable, no?

> In a script/process, I guess it could be useful in the traceback with error logging infrastructure

Exactly, that's my use case! When a job fails and I only see it several hours later in a workflow with a dozen different dataframes, it can be hard to determine which data access is failing and how to fix it. I usually have to rerun the script interactively and print column names to determine the fix.
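
For the batch use case described above, one possible user-side workaround that needs no change to pandas is to route column selection through a small wrapper. This is only a sketch: select_columns and its name argument are hypothetical names chosen for illustration.

```python
import pandas as pd


def select_columns(df, columns, name="df"):
    """Select columns, raising a KeyError that lists what is available.

    Hypothetical user-side helper; pandas itself is unchanged.
    """
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise KeyError(
            f"{missing} not in columns of {name}; available: {list(df.columns)}"
        )
    return df[list(columns)]


df = pd.DataFrame({"A": [1], "B": [2]})
try:
    select_columns(df, ["foo"], name="sales")
except KeyError as err:
    # In a real job this would go to the logging infrastructure instead.
    print(err)
```

Passing a label via name also records which of several dataframes failed, which the built-in message does not say.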

Comment From: kostyafarber

Hey, I'd like to work on this. Do we want to go ahead with making these changes?

Or are we not fully sold on this idea yet?

Comment From: phofl

This needs more discussion first.

I am also leaning more towards no. We don't want to have a million options.