In #47215 I brought up the issue of allowing sets as arguments for the DataFrame constructor. This was addressed in #47231. But there are a few more cases that should be fixed:
>>> pd.Series([3,4,5], index=set(["a", "b", "c"]))
c 3
a 4
b 5
dtype: int64
>>> pd.DataFrame.from_records([[1,2,3]], columns=set(["a", "b", "c"]), index=set(["x", "y", "z"]))
c a b
x 1 2 3
y 1 2 3
z 1 2 3
We shouldn't allow a set as an argument for index or columns in either of these cases, because a set has no defined order and the labels end up paired with the data arbitrarily.
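For illustration only, here is a minimal standalone sketch of the kind of check being asked for. The helper name `ensure_ordered_labels` and the error message are hypothetical and are not taken from the pandas code base or from #47231.

```python
def ensure_ordered_labels(labels, name):
    """Hypothetical helper: reject unordered set inputs for `index`/`columns`."""
    if isinstance(labels, (set, frozenset)):
        # A set has no defined order, so labels could be paired with data arbitrarily.
        raise TypeError(
            f"{name} cannot be a set because sets are unordered; "
            "pass a list or another ordered sequence instead"
        )
    return list(labels)


# An ordered sequence passes through unchanged.
print(ensure_ordered_labels(["a", "b", "c"], "index"))

# A set is rejected with a TypeError.
try:
    ensure_ordered_labels({"a", "b", "c"}, "columns")
except TypeError as exc:
    print(f"rejected: {exc}")
```

Whether the real check should raise `TypeError` or `ValueError` is a design choice for the maintainers; the sketch picks `TypeError` only as an example.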
Comment From: lithomas1
Do we need a deprecation for this, or is it OK to do as a breaking change like the previous PR?
Comment From: Dr-Irv
> Do we need a deprecation for this, or is it OK to do as a breaking change like the previous PR?
I think a breaking change is fine.
Comment From: Dr-Irv
Also do we want to disallow views on dicts?
>>> data =[1, 2, 3]
>>> data
[1, 2, 3]
>>> d = {'a'+str(value): value for value in data}
>>> d
{'a1': 1, 'a2': 2, 'a3': 3}
>>> pd.Series(d.values(), index=d.keys())
a1 1
a2 2
a3 3
dtype: int64
The above works, but strictly speaking, d.keys() and d.values() are not array-like, so maybe we should also test if instances of MappingView are passed and reject them?
If we agree those shouldn't be allowed, that might require a deprecation cycle.
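As a rough sketch of what such a test could look like, dict views can be detected with `collections.abc.MappingView`, and a deprecation cycle could start with a warning rather than an error. The function name `check_not_mapping_view` and the warning text are assumptions made for this example, not existing pandas API.

```python
import warnings
from collections.abc import MappingView


def check_not_mapping_view(obj, name):
    """Hypothetical check: warn when a dict view is passed instead of a sequence."""
    if isinstance(obj, MappingView):
        # d.keys(), d.values() and d.items() are all MappingView instances.
        warnings.warn(
            f"Passing a dict view as {name} is deprecated in this sketch; "
            "wrap it in list() instead",
            FutureWarning,
            stacklevel=2,
        )
    return obj


d = {"a1": 1, "a2": 2, "a3": 3}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_not_mapping_view(d.keys(), "index")        # warns
    check_not_mapping_view(d.values(), "data")       # warns
    check_not_mapping_view(list(d.keys()), "index")  # a list is fine, no warning

print(len(caught))  # 2
```

Callers who hit the warning can keep the current behaviour by writing `list(d.values())` and `list(d.keys())`.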
Comment From: Dr-Irv
Also, for Series(), the docs say that we accept an Iterable, but we don't accept every Iterable as values (sets, for example, are rejected). So we should adjust the docs as well.
Comment From: Dr-Irv
Here's another one. I think that sets shouldn't be allowed as the argument when constructing an Index:
>>> pd.Index(set([1,2]))
Index([1, 2], dtype='int64')
>>> pd.Index(set([2,1]))
Index([1, 2], dtype='int64')
The order of the Index is ambiguous when you pass a set argument.
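If sets were rejected here, callers who only used a set to get unique labels would need to choose an order themselves. A small plain-Python sketch of two ways to do that follows; the variable names are made up for the example.

```python
labels_with_duplicates = ["b", "a", "b", "c", "a"]

# Sorting gives an order that depends only on the values.
sorted_labels = sorted(set(labels_with_duplicates))

# dict.fromkeys keeps the first occurrence of each label, in insertion order.
first_seen_labels = list(dict.fromkeys(labels_with_duplicates))

print(sorted_labels)      # ['a', 'b', 'c']
print(first_seen_labels)  # ['b', 'a', 'c']
```

Either list could then be passed to `pd.Index` with an unambiguous order.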
Comment From: Dr-Irv
Similar question about using a dict as argument to Index. Should this work:
pd.Index({"a":1, "b":2})
If yes, we should update the docs. If not, then we should check for it. The same goes for any of the subclasses of Index.
Comment From: Dr-Irv
Agreed in dev meeting on 2/12 that we don't want to accept sets and that we should be explicit about what we do accept.
Related comment: https://github.com/pandas-dev/pandas/issues/54176#issuecomment-1640525810