Feature Type
- [ ] Adding new functionality to pandas
- [X] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
import pandas as pd
df = pd.util.testing.makeMixedDataFrame()
print(f'{list(df)=}')
>>> list(df)=['A', 'B', 'C', 'D']
df[['foo']]
>>> KeyError: "['foo'] not in index"
Feature Description
The error message would be more helpful if it listed available columns:
df[['foo']]
>>> KeyError: "['foo'] not in columns=['A', 'B', 'C', 'D']"
Alternative Solutions
n/a
Additional Context
No response
Comment From: rhshadrach
pandas DataFrames can have a massive number of columns, so listing them would either (a) flood stdout or (b) require us to truncate the output. Even when there are few columns, we'd need to worry about the reprs of the individual column labels being long themselves.
Comment From: janosh
We could check the length of the joined column names before including them:
    if len(', '.join(df)) < some_threshold:
        raise KeyError(f"['foo'] not in columns={', '.join(df)}")
and make some_threshold configurable via, say, pd.options.key_errors.max_col_list_len.
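A minimal sketch of the threshold idea, written as a standalone helper rather than as pandas internals. The function name and the max_len parameter are hypothetical, and nothing here corresponds to an existing pandas option. It uses repr on each label so that non-string column labels do not break the join.

```python
def missing_columns_message(missing, columns, max_len=80):
    """Build a KeyError message, listing columns only when the list is short.

    max_len is a hypothetical threshold, not an existing pandas option.
    """
    # repr() each label so integer or tuple labels are handled too.
    listing = ", ".join(repr(c) for c in columns)
    if len(listing) < max_len:
        return f"{list(missing)!r} not in columns=[{listing}]"
    # Fall back to the current, shorter message for wide frames.
    return f"{list(missing)!r} not in index"


print(missing_columns_message(["foo"], ["A", "B", "C", "D"]))
# ['foo'] not in columns=['A', 'B', 'C', 'D']
```

With a DataFrame, the second argument could be list(df.columns). For a frame with many or long labels, the helper returns the existing short message.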
Comment From: mroeschke
Yeah, as is I would be -0.5 on including this given @rhshadrach's concerns:
- In an interactive environment, one can quickly access df.columns or df.index after the error.
- In a script/process, I guess it could be useful in the traceback with error logging infrastructure, but it may be too verbose more often than not.
Comment From: janosh
> but may be too verbose more often than not

More often than not, column count and name lengths should be manageable, no?

> In a script/process, I guess it could be useful in the traceback with error logging infrastructure

Exactly, that's my use case! When a job fails and I only see it several hours later, in a workflow with a dozen different dataframes, it can be hard to determine which data access is failing and how to fix it. I usually have to rerun the script interactively and print the column names to determine the fix.
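Until something like this exists in pandas, one user-side workaround for the logging use case is to validate columns before the access. The sketch below assumes a hypothetical helper, require_columns, and a caller-supplied frame_name; neither is part of pandas.

```python
def require_columns(available, wanted, frame_name="<unnamed>"):
    """Raise a KeyError naming the frame and its columns if any are missing.

    Hypothetical user-side helper; frame_name is only a label for the log.
    """
    available = list(available)
    missing = [c for c in wanted if c not in available]
    if missing:
        raise KeyError(
            f"{missing!r} not in columns of {frame_name}: {available!r}"
        )


# Passes silently when every requested column is present.
require_columns(["A", "B", "C", "D"], ["A", "C"], frame_name="orders")
```

In a script this would be called as require_columns(df.columns, ['foo'], frame_name='orders') just before df[['foo']], so the traceback in the job log identifies both the dataframe and its available columns.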
Comment From: kostyafarber
Hey, I'd like to work on this. Do we want to go ahead with making these changes?
Or are we not fully sold on this idea yet?
Comment From: phofl
This needs more discussion first.
I am also leaning more towards no. We don't want to have a million options.