Slightly related to #28380
Currently, an issue is open in Dask (https://github.com/dask/dask/issues/5294) for implementing Named Aggregation (introduced in pandas 0.25.0). To implement it, Dask needs to use _normalize_keyword_aggregation and _is_multi_agg_with_relabel.
Making them public would be useful for frameworks like dask and mars.
cc: @TomAugspurger
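For readers unfamiliar with the two helpers, here is a minimal pure-Python sketch of the kind of work they do for named aggregation. It is an illustration only, not the pandas implementation: the function names `looks_like_named_aggregation` and `sketch_normalize` are hypothetical, and the real helpers handle more cases (for example, they also track column ordering).

```python
from collections import defaultdict


def looks_like_named_aggregation(**kwargs):
    """Hypothetical stand-in: True when every keyword maps to a (column, func) pair."""
    return bool(kwargs) and all(
        isinstance(value, tuple) and len(value) == 2 for value in kwargs.values()
    )


def sketch_normalize(kwargs):
    """Hypothetical stand-in: turn {new_name: (column, func)} into a per-column spec.

    Returns (aggspec, columns), where aggspec maps each source column to the
    list of functions requested for it, and columns holds the output names in
    the order the caller wrote them.
    """
    aggspec = defaultdict(list)
    for column, func in kwargs.values():
        aggspec[column].append(func)
    return dict(aggspec), tuple(kwargs)


named = {
    "min_height": ("height", "min"),
    "max_height": ("height", "max"),
    "mean_weight": ("weight", "mean"),
}
aggspec, columns = sketch_normalize(named)
```

With the input above, `aggspec` groups the requested functions by source column and `columns` keeps the user-facing output names, which is the information a library like dask needs in order to relabel its results.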
Comment From: TomAugspurger
As an alternative to making it part of the official public API for end users, we could add a test ensuring that its implementation doesn't move, so that projects like dask can rely on it.
That last option conflicts a bit with our desire to deprecate all of pandas.core, though.
Comment From: zbrookle
@TomAugspurger Could I work on this? I'm interested in seeing this moved forward since I need the named aggregation feature in Dask to incorporate it into the dataframe_sql framework that I'm building
Comment From: TomAugspurger
That sounds good.
Comment From: jbrockmendel
@rhshadrach these now live in core.apply, so I'm declaring this your call.
Comment From: rhshadrach
The linked issue is now closed, with dask implementing it using reconstruct_func (which calls normalize_keyword_aggregation). With this, I'm okay with considering reconstruct_func public and adding tests, but I don't think we should add it to the API.
cc @mroeschke for any thoughts.
Comment From: jbrockmendel
+1 for something pseudo-public
Comment From: mroeschke
+1 as well for exposing this publicly somehow
Comment From: rhshadrach
What would be the reason for moving them?
Comment From: renatocmaciel
The only path that explicitly mentions a pseudo-public API is internals.
reconstruct_func, is_multi_agg_with_relabel and normalize_keyword_aggregation are accessible, as their names do not start with _ or __.
What are your thoughts on this issue? Should they be documented and made available in pandas/core/api.py, or do we only need to add tests?
Comment From: rhshadrach
This is not widely known, but the API documentation (https://pandas.pydata.org/pandas-docs/dev/reference/index.html) states:
The pandas.core, pandas.compat, and pandas.util top-level modules are PRIVATE. Stable functionality in such modules is not guaranteed.
As such, to ensure we don't accidentally break dask, I think it would be sufficient to add a test for just reconstruct_func within pandas/tests/apply, with a comment to the effect of "Test is to ensure this method isn't moved; it is used by other libraries (e.g. dask)".
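A test along those lines might look roughly like the sketch below. It assumes the function still lives in pandas.core.apply, as noted earlier in the thread; the helper `resolve` and the test name are hypothetical, and the exact file under pandas/tests/apply is left open.

```python
import importlib


def resolve(module_path, name):
    """Import module_path and return its attribute `name`.

    Raises ImportError if the module is gone and AttributeError if the
    attribute was moved or renamed.
    """
    module = importlib.import_module(module_path)
    return getattr(module, name)


def test_reconstruct_func_location():
    # Test is to ensure this function isn't moved; it is used by other
    # libraries (e.g. dask).
    func = resolve("pandas.core.apply", "reconstruct_func")
    assert callable(func)
```

The test deliberately checks only the import location, not behavior, so it fails loudly on a move or rename without constraining how the function is implemented.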