xref #13361

- [x] support union with Series/CategoricalIndex as well as Categorical #14199
- [x] add ignore_order to ignore the raising on an ordered Categorical (and just have it work) #15219
- [ ] do we want to put this in the pd namespace (or change its name). Consider Categorical.from_union(...)
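
For reference, a minimal sketch of what the first item covers: passing a categorical Series, a CategoricalIndex, and a plain Categorical in one call. It assumes a pandas version where union_categoricals is importable from pandas.api.types (this thread also refers to the older pandas.types.concat location); the variable names are only illustrative.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Three different containers, all holding categorical data.
s = pd.Series(["a", "b"], dtype="category")
idx = pd.CategoricalIndex(["b", "c"])
cat = pd.Categorical(["c", "a"])

# The inputs are unwrapped to their underlying Categoricals and combined;
# the result is a plain Categorical, not a Series or an Index.
mixed = union_categoricals([s, idx, cat])
print(mixed)
```

The categories of the result are the union of the input categories in order of first appearance, since none of the inputs here is ordered.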

Comment From: jreback

cc @janschulz cc @chris-b1 @jorisvandenbossche @shoyer

Comment From: jreback

I think the location is fine. This is mostly part of a developer/extender API, e.g. used internally by other parts of pandas and by other packages (e.g. dask), rather than something that is in and of itself useful to a regular user.

Comment From: jankatins

+1 for adding a Categorical.from_union(*cats, ignore_order=False) instead of pd.xxx() -> IMO it shouldn't be exposed as top level API and from_union() is a nice equivalent to from_codes().

Comment From: jreback

@chris-b1 this was partially closed by #14191 ?

Comment From: chris-b1

It was #14199, but yes - I edited the top comment.

Comment From: js3711

@jreback @janschulz I am interested in starting to contribute to pandas and see this as a good first PR opportunity. Do you guys agree?

  • If so, what do you see as the desired behavior for "add ignore_order to ignore the raising on an ordered Categorical (and just have it work)"?
  • I do like the idea of Categorical.from_union(...). Should pandas.types.concat.union_categoricals still be supported (with the implementation living in from_union)?

Comment From: chris-b1

Setup

In [15]: c1 = pd.Categorical(['a', 'a', 'b'], categories=['b', 'a', 'c'], ordered=True)

In [16]: c2 = pd.Categorical(['b', 'b', 'a'])

In [17]: union_categoricals([c1, c2])
TypeError: Categorical.ordered must be the same

For your first question - the idea would be to allow this

In [18]: union_categoricals([c1, c2], ignore_order=True)
[a, a, b, b, b, a]
Categories (3, object): [b, a, c]
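
As a self-contained version of the session above, here is a sketch that can be run as a script. It assumes a pandas release that already has the ignore_order keyword and exposes union_categoricals under pandas.api.types; the `raised` flag is just an illustrative helper.

```python
import pandas as pd
from pandas.api.types import union_categoricals

c1 = pd.Categorical(["a", "a", "b"], categories=["b", "a", "c"], ordered=True)
c2 = pd.Categorical(["b", "b", "a"])

# Mixing an ordered and an unordered Categorical raises by default.
try:
    union_categoricals([c1, c2])
    raised = False
except TypeError:
    raised = True

# With ignore_order=True the ordering is dropped and the union succeeds.
result = union_categoricals([c1, c2], ignore_order=True)
print(result)
print(result.ordered)
```

Note that the combined result is unordered: the ordering of c1 is discarded rather than imposed on c2.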

On your second question - not sure if there's complete agreement on the API, but assuming there is a Categorical.from_union I would suggest leaving the implementation where it is, and calling the union_categoricals function inside Categorical.from_union
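
A rough sketch of that delegation, written as a free function because Categorical.from_union does not exist in pandas. The name categorical_from_union and its signature are hypothetical and only mirror the from_union(*cats, ignore_order=False) shape proposed earlier in the thread; the import path assumes a pandas version that exposes union_categoricals under pandas.api.types.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def categorical_from_union(*cats, ignore_order=False):
    """Hypothetical stand-in for the proposed Categorical.from_union.

    The implementation stays in union_categoricals; this is only a thin
    wrapper that forwards its arguments.
    """
    return union_categoricals(list(cats), ignore_order=ignore_order)


c_left = pd.Categorical(["a", "b"])
c_right = pd.Categorical(["b", "c"])
combined = categorical_from_union(c_left, c_right)
print(combined)
```

Keeping the wrapper this thin means any later change to union_categoricals is picked up automatically.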

Comment From: jorisvandenbossche

The union_categoricals function is itself mentioned in the docs (http://pandas.pydata.org/pandas-docs/stable/categorical.html#unioning), so to start I think it is good to just improve this function (e.g. with what @chris-b1 showed above).

Comment From: js3711

Thank you all for the comments. I have made an attempt at a pull request to support the ignore_order argument. #15219

I will hold off on from_union until there is agreement on the API change.

Comment From: jreback

so to close this issue, I think we need to add Categorical.from_union as a short-cut (last item on the list).

Comment From: jbrockmendel

I haven't seen a huge demand for this in the 6 years since the last comment, so I lean against adding this to the API.

Comment From: mroeschke

Agreed, closing, but we can reopen if there is interest again.