In #47215 I brought up the issue of allowing sets as arguments for the DataFrame
constructor. This was addressed in #47231. But there are a few more cases that should be fixed:
>>> pd.Series([3,4,5], index=set(["a", "b", "c"]))
c    3
a    4
b    5
dtype: int64
>>> pd.DataFrame.from_records([[1,2,3]], columns=set(["a", "b", "c"]), index=set(["x", "y", "z"]))
   c  a  b
x  1  2  3
y  1  2  3
z  1  2  3
We shouldn't allow a set as an argument for index or columns in either of these cases.
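A minimal sketch of the kind of check being proposed, written without pandas so it runs on its own. The helper name `ensure_ordered_labels` and the error message are hypothetical, chosen for illustration only; this is not how pandas implements the check.

```python
def ensure_ordered_labels(labels, name):
    """Hypothetical guard: reject sets, whose iteration order is arbitrary."""
    if isinstance(labels, (set, frozenset)):
        raise TypeError(
            f"{name} cannot be a set; pass a list or another ordered sequence"
        )
    return labels


# An ordered container passes through unchanged.
index = ensure_ordered_labels(["a", "b", "c"], "index")

# A set is rejected before it can silently pick an order.
try:
    ensure_ordered_labels({"a", "b", "c"}, "columns")
except TypeError as exc:
    message = str(exc)
```

A caller who really does start from a set can still opt in to an explicit order, for example with `sorted(labels)`.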
Comment From: lithomas1
Do we need a deprecation for this, or is it OK to do as a breaking change like the previous PR?
Comment From: Dr-Irv
> Do we need a deprecation for this, or is it OK to do as a breaking change like the previous PR?
I think a breaking change is fine.
Comment From: Dr-Irv
Also do we want to disallow views on dicts?
>>> data = [1, 2, 3]
>>> data
[1, 2, 3]
>>> d = {'a'+str(value): value for value in data}
>>> d
{'a1': 1, 'a2': 2, 'a3': 3}
>>> pd.Series(d.values(), index=d.keys())
a1    1
a2    2
a3    3
dtype: int64
The above works, but strictly speaking, d.keys() and d.values() are not array-like, so maybe we should also test whether instances of MappingView are passed and reject them?
If we agree those shouldn't be allowed, that might require a deprecation cycle.
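For reference, a small stdlib-only sketch of how such a test could look. `collections.abc.MappingView` is the real abstract base class; the helper name `is_mapping_view` is made up for this example and is not an existing pandas function.

```python
from collections.abc import MappingView


def is_mapping_view(obj):
    """Hypothetical helper: True for dict.keys(), dict.values(), dict.items()."""
    return isinstance(obj, MappingView)


d = {"a1": 1, "a2": 2, "a3": 3}

keys_is_view = is_mapping_view(d.keys())      # a dict view
values_is_view = is_mapping_view(d.values())  # a dict view
list_is_view = is_mapping_view(list(d))       # a plain list, not a view

# Callers can always materialize a view explicitly before passing it on.
index_labels = list(d.keys())
values = list(d.values())
```

If views were rejected, wrapping them in `list(...)` as in the last two lines would be the straightforward migration for existing code.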
Comment From: Dr-Irv
Also, for Series(), the docs say that we accept an Iterable, but we don't accept all Iterables as values, e.g. sets. So we should adjust the docs as well.
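To illustrate why "Iterable" is broader than what is actually accepted, here is a stdlib-only comparison of a few inputs against `collections.abc.Iterable` and `collections.abc.Sequence`. Using `Sequence` as the narrower category is an assumption made for this example, not a statement of what the pandas docs or code should use; the `classify` helper is likewise hypothetical.

```python
from collections.abc import Iterable, Sequence


def classify(obj):
    """Return (is_iterable, is_sequence) for obj."""
    return isinstance(obj, Iterable), isinstance(obj, Sequence)


as_list = classify([3, 4, 5])                # ordered, indexable
as_tuple = classify((3, 4, 5))               # ordered, indexable
as_set = classify({3, 4, 5})                 # iterable, but no defined order
as_keys = classify({"a": 1}.keys())          # iterable view, not a sequence
as_generator = classify(x for x in [3, 4])   # iterable, not a sequence
```

Lists and tuples satisfy both checks, while sets, dict views, and generators are only `Iterable`, which is the gap between the documented type and the inputs under discussion.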
Comment From: Dr-Irv
Here's another one. I think that sets shouldn't be allowed as the argument when constructing an Index:
>>> pd.Index(set([1,2]))
Index([1, 2], dtype='int64')
>>> pd.Index(set([2,1]))
Index([1, 2], dtype='int64')
The order of the Index is ambiguous when you pass a set argument.
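The ambiguity can be seen without pandas: two sets built in different orders are equal, so the order the caller wrote is not available to whatever receives the set. A short sketch, with variable names chosen only for illustration:

```python
# The same labels, written in two different orders.
labels_a = set(["a", "b", "c"])
labels_b = set(["c", "b", "a"])

# The sets are equal: the construction order is not part of the value.
same_set = labels_a == labels_b

# Lists keep the order the caller wrote, so the two spellings differ.
same_list = ["a", "b", "c"] == ["c", "b", "a"]

# Choosing an order explicitly removes the ambiguity.
ordered = sorted(labels_a)
```

For string elements the iteration order of a set can also change between interpreter runs because of hash randomization, so output like the Series example earlier in this issue is not guaranteed to be reproducible.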
Comment From: Dr-Irv
Similar question about using a dict as argument to Index. Should this work:
pd.Index({"a":1, "b":2})
If yes, we should update the docs. If not, then we should check for it. The same goes for any of the subclasses of Index.
Comment From: Dr-Irv
Agreed in the dev meeting on 2/12 that we don't want to accept set, and that we should be explicit about what we accept.
Related comment: https://github.com/pandas-dev/pandas/issues/54176#issuecomment-1640525810