Running the code below raises an error, since columns are now an Index and not a list. I'm not sure whether the error that gets raised (set_index ends up asking for an index that is the entire length of the data frame) is actually by design. In this case, it was a bit unintuitive to figure out how to catch, since Index objects and Python lists often behave so similarly.
import string
import pandas as pd

data1 = 'x'*5 + 'y'*5
data2 = string.ascii_lowercase[:5]*2
data3 = list(range(10))
data_dict = {'Cat': list(data1), 'SubCat': list(data2), 'Vals': data3}
df = pd.DataFrame(data_dict)
ordered_df = df[['Cat', 'SubCat', 'Vals']]
# Works: the keys are a plain list of column labels
correct_df = ordered_df.set_index([x for x in ordered_df.columns[:2]])
# Raises: the keys are an Index, which is treated as an array of row labels
error_df = ordered_df.set_index(ordered_df.columns[:2])
Comment From: jreback
So you want this?
In [23]: ordered_df.set_index(list(ordered_df.columns[:2]))
Out[23]:
            Vals
Cat SubCat
x   a          0
    b          1
    c          2
    d          3
    e          4
y   a          5
    b          6
    c          7
    d          8
    e          9
Yeah, it's a pretty trivial change. Want to have a go at it? It appears innocuous.
Comment From: jreback
change here to not is_list_like(...)
Comment From: michaelbilow
Yep, exactly.
Comment From: StephenKappel
It's valid to pass an Index or row labels to set_index with the expected behavior of that index becoming the index. So, to achieve the desired behavior, I am opening a PR that only treats an index like a list of column labels when the index is not the same length as the DataFrame. This should avoid breaking the existing behavior while making ordered_df.set_index(ordered_df.columns[:2]) possible in most cases.
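A minimal sketch of the length-based rule described above, written as a caller-side helper rather than as the actual pandas patch. The helper name resolve_index_keys is hypothetical and not part of pandas; it only illustrates the idea of treating an Index as column labels when its length differs from the DataFrame's.

```python
import pandas as pd


def resolve_index_keys(df, keys):
    """Hypothetical helper, not a pandas API.

    An Index whose length differs from len(df) cannot be a full set of
    row labels, so it is converted to a list of column labels.
    Anything else is passed through unchanged.
    """
    if isinstance(keys, pd.Index) and len(keys) != len(df):
        return list(keys)
    return keys


df = pd.DataFrame({
    "Cat": list("xxxxxyyyyy"),
    "SubCat": list("abcde" * 2),
    "Vals": list(range(10)),
})

# Two column labels in an Index, ten rows: treated as column labels.
result = df.set_index(resolve_index_keys(df, df.columns[:2]))

# An Index as long as the frame is left alone, so it still becomes the index.
row_labels = pd.Index(range(10))
same = resolve_index_keys(df, row_labels)
```

The rule stays ambiguous when the number of selected columns happens to equal the number of rows, which is why the comment above says "in most cases".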
Comment From: hundredrab
Hi, can I take a go at this?
Comment From: szabi
I came across what is essentially the same issue in 2023: I wanted to pass an instance of typing.Collection (specifically a tuple), fully expecting it to work, going by the documentation:
keys : label or array-like or list of labels/arrays
This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, "array" encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and instances of :class:`~collections.abc.Iterator`.
The check in https://github.com/pandas-dev/pandas/blob/01693d64fe7b4335c327d62c5a5861f07c98f8c9/pandas/core/frame.py#L5787-L5788 is too restrictive.
It seems to be a simple one-liner. However, the suggestion in 2015 was to use not is_list_like(...), and that does not seem to be part of common.py any more. Would the correct course of action in 2023 be to use if not isinstance(keys, Collection) from typing? Or is is_list_like the way to go (import from where?)
Pinging @jreback
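For anyone landing here with the tuple case: is_list_like can be imported from the public pandas.api.types module. The snippet below is only a caller-side workaround sketch (converting the tuple to a list before calling set_index); it does not settle which check belongs in frame.py. The sample frame and variable names are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_list_like

df = pd.DataFrame({
    "Cat": list("xxxxxyyyyy"),
    "SubCat": list("abcde" * 2),
    "Vals": list(range(10)),
})

keys = ("Cat", "SubCat")

# is_list_like accepts tuples, lists and Index objects, but not strings.
tuple_is_list_like = is_list_like(keys)
string_is_list_like = is_list_like("Cat")

# Workaround: hand set_index an actual list of column labels.
indexed = df.set_index(list(keys))
```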