xref #13361
- [x] support union with Series/CategoricalIndex as well as Categorical #14199
- [x] add ignore_order to skip raising on an ordered Categorical (and just have it work) #15219
- [ ] do we want to put this in the pd namespace (or change its name)? Consider Categorical.from_union(...)
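To illustrate the first item, here is a minimal sketch of a union across a Series and a CategoricalIndex. It assumes a pandas version where union_categoricals is importable from pandas.api.types (the thread itself refers to the older pandas.types.concat location); the variable names are illustrative only.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# A categorical Series and a CategoricalIndex with overlapping categories.
s = pd.Series(["a", "b"], dtype="category")
ci = pd.CategoricalIndex(["b", "c"])

# Both inputs are unwrapped to their underlying Categorical before the union;
# the result is a plain Categorical, not a Series or an Index.
result = union_categoricals([s, ci])

print(list(result))
print(list(result.categories))
```

The categories of the result are the union of the input categories in order of first appearance.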
Comment From: jreback
cc @janschulz cc @chris-b1 @jorisvandenbossche @shoyer
Comment From: jreback
I think the location is fine. This is mostly part of a developer/extender API, e.g. used internally by other parts of pandas and by other packages (e.g. dask), rather than being in and of itself useful to a regular user.
Comment From: jankatins
+1 for adding a Categorical.from_union(*cats, ignore_order=False) instead of pd.xxx()
-> IMO it shouldn't be exposed as top level API and from_union() is a nice equivalent to from_codes().
Comment From: jreback
@chris-b1 this was partially closed by #14191 ?
Comment From: chris-b1
It was #14199, but yes - I edited the top comment.
Comment From: js3711
@jreback @janschulz I am interested in starting to contribute to pandas and see this as a good first PR opportunity. Do you guys agree?
- If so, what do you see as the desired behavior for "add ignore_order to ignore the raising on an ordered Categorical (and just have it work)"
- I do like the idea of Categorical.from_union(...). Should pandas.types.concat.union_categoricals still be supported (with the implementation living in from_union)?
Comment From: chris-b1
Setup
In [15]: c1 = pd.Categorical(['a', 'a', 'b'], categories=['b', 'a', 'c'], ordered=True)
In [16]: c2 = pd.Categorical(['b', 'b', 'a'])
In [17]: union_categoricals([c1, c2])
TypeError: Categorical.ordered must be the same
For your first question - the idea would be to allow this
In [18]: union_categoricals([c1, c2], ignore_order=True)
[a, a, b, b, b, a]
Categories (3, object): [b, a, c]
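For anyone who wants to reproduce this outside an IPython session, a self-contained sketch follows. It assumes a pandas version where union_categoricals lives in pandas.api.types and already accepts ignore_order; the error_message variable is only there for illustration.

```python
import pandas as pd
from pandas.api.types import union_categoricals

c1 = pd.Categorical(["a", "a", "b"], categories=["b", "a", "c"], ordered=True)
c2 = pd.Categorical(["b", "b", "a"])

# Mixing an ordered and an unordered Categorical raises by default.
error_message = None
try:
    union_categoricals([c1, c2])
except TypeError as exc:
    error_message = str(exc)

# With ignore_order=True the ordered flag is dropped and the union succeeds.
result = union_categoricals([c1, c2], ignore_order=True)

print(error_message)
print(list(result), list(result.categories), result.ordered)
```

The result is always unordered when ignore_order=True is what makes the union possible, so the ordering information of c1 is lost.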
On your second question - not sure if there's complete agreement on the API, but assuming there is a Categorical.from_union, I would suggest leaving the implementation where it is, and calling the union_categoricals function inside Categorical.from_union
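A rough sketch of that delegation, written as a standalone function because Categorical.from_union does not exist in pandas; the name from_union and its signature are hypothetical and simply follow the proposal in this thread.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def from_union(*cats, ignore_order=False):
    """Hypothetical stand-in for the proposed Categorical.from_union.

    The implementation stays in union_categoricals; this is only a thin wrapper.
    """
    return union_categoricals(list(cats), ignore_order=ignore_order)


a = pd.Categorical(["a", "b"])
b = pd.Categorical(["b", "c"])
combined = from_union(a, b)

print(list(combined), list(combined.categories))
```

Keeping the logic in one place means the existing function and any shortcut would behave identically.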
Comment From: jorisvandenbossche
the union_categoricals function is itself mentioned in the docs (http://pandas.pydata.org/pandas-docs/stable/categorical.html#unioning), so to start I think it is good to just improve this function (with e.g. what @chris-b1 showed above)
Comment From: js3711
Thank you all for the comments. I have made an attempt at a pull request to support the ignore_order argument. #15219
I will hold off on from_union until there is agreement on the API change.
Comment From: jreback
so to close this issue, I think we need to add Categorical.from_union as a short-cut (last item on the list).
Comment From: jbrockmendel
I haven't seen a huge demand for this in the 6 years since the last comment, so lean against adding this to the API.
Comment From: mroeschke
Agreed, closing, but we can reopen if there is interest again