While looking into some typing issues, I found that the following code works:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=set(["a", "b", "c"]))
print(df)
The result:
b a c
0 1 2 3
1 4 5 6
We document that the columns
and index
arguments should be Index
or "array-like". Certainly a set
is not array-like!
Internally, we type those arguments as Axes
, which is defined as Collection[Any]
, but in the pandas-stubs, we are more restrictive, not allowing sets to be passed.
I think we should disallow set
as a type in the constructor, since it doesn't follow the documentation, and if you were to use it, the behavior is unclear. We could do one or more of the following:
- Add code in the
DataFrame
constructor to raise an error ifindex
orcolumns
are sets - Change the type declaration for
Axes
to be more restrictive, i.e., not useCollection
I think we should definitely do (2), but I'm not sure if (1) would be considered a change in the API, since we never documented that sets are acceptable, but they did work (and possibly give funky behavior).
Comment From: jreback
no sets are unordered but definition
we shouldn't allow this (it's possible we convert these though)
Comment From: rhshadrach
I don't think we should view (1) as an API change, assuming there is no indication in the documentation that sets are allowed. The API is what is documented as being supported, not whether or not it works by chance.
+1 on not allowing sets.
Comment From: simonjayhawkins
I don't think we should view (1) as an API change, assuming there is no indication in the documentation that sets are allowed. The API is what is documented as being supported, not whether or not it works by chance.
+1 on not allowing sets.
we are not that clear on our type definitions for the parameters section of the docstrings
the only mention of array-like in https://pandas.pydata.org/pandas-docs/stable/development/contributing_docstring.html#pandas-docstring-guide
If the exact type is not relevant, but must be compatible with a NumPy array, array-like can be specified.
Now, if a list
is accepted here, then a list
does not have say a ndim
attribute, so compatible
is therefore vague.
If compatible means that can be converted to a numpy array then np.array(set(["a", "b", "c"]))
gives a 0-dim array which is not indexable like np.array(list(["a", "b", "c"]))
So for all intents and purposes, set
is not array-like
However, I think we still sometimes deprecate things that happen to work by chance first to err on the side of caution.
For the stubs, disallowing sets is fine and the right thing to do.
+1 for deprecating not allowing sets.
Comment From: 5j9
I belive this change is why I'm getting type checking warnings for using a tuple
or dict.keys()
as columns=
in DataFrame
constructor. Is this intentionally so, or should I open an issue for it?
Comment From: Dr-Irv
I belive this change is why I'm getting type checking warnings for using a
tuple
ordict.keys()
ascolumns=
inDataFrame
constructor. Is this intentionally so, or should I open an issue for it?
Are you using pandas-stubs
? Those should work, so create an issue at https://github.com/pandas-dev/pandas-stubs/issues/