Splitting discussion off from #51280 PR #52153

The checking and propagation of flags in __finalize__ means a small-but-everywhere performance hit for all users that we should deprecate.

Flags only has allows_duplicate_labels, and duplicate labels can instead be disallowed by a 3rd-party validation library.
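As a rough illustration of what such a validation step might look like without flags, here is a minimal sketch in plain pandas. The helper name assert_unique_labels is hypothetical (it is not part of pandas or of any validation library); it relies only on Index.is_unique and Index.duplicated.

```python
import pandas as pd


def assert_unique_labels(obj):
    """Hypothetical helper: raise if any axis of a Series/DataFrame has duplicate labels."""
    for axis_number, axis in enumerate(obj.axes):
        if not axis.is_unique:
            # Collect each duplicated label once for the error message
            dupes = axis[axis.duplicated()].unique().tolist()
            raise ValueError(f"axis {axis_number} has duplicate labels: {dupes}")
    return obj


unique_df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
duplicated_df = pd.DataFrame({"a": [1, 2]}, index=["x", "x"])

# Passes through unchanged, so it can be chained with .pipe()
checked = unique_df.pipe(assert_unique_labels)

try:
    assert_unique_labels(duplicated_df)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Unlike flags, an explicit check like this only runs where it is called, so nothing needs to be propagated through __finalize__.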

Comment From: mroeschke

Looks like pandera provides a utility for checking for duplicate labels: https://pandera.readthedocs.io/en/stable/reference/generated/pandera.api.pandas.components.Column.html#pandera.api.pandas.components.Column

Comment From: jorisvandenbossche

For context, .flags / set_flags was a new feature added in pandas 1.2, introduced as a general mechanism but at the time used specifically for the "optionally disallow duplicate labels" option (https://pandas.pydata.org/docs/whatsnew/v1.2.0.html#optionally-disallow-duplicate-labels). See https://github.com/pandas-dev/pandas/issues/27108 / https://github.com/pandas-dev/pandas/pull/28394 (cc @TomAugspurger)
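For readers who have not used the feature, a minimal sketch of the behaviour under discussion, assuming pandas >= 1.2 and following the pattern in the pandas user guide on duplicate labels:

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

s = pd.Series([0, 1], index=["a", "b"])

# set_flags returns a new object; the original keeps the default (True)
strict = s.set_flags(allows_duplicate_labels=False)
original_flag = s.flags.allows_duplicate_labels
strict_flag = strict.flags.allows_duplicate_labels

# The flag is propagated to objects derived from `strict`
propagated_flag = strict.head().flags.allows_duplicate_labels

# An operation that would introduce a duplicate label raises
try:
    strict.head().rename({"a": "b"})
    raised = False
except DuplicateLabelError:
    raised = True
```

The propagation step is the part that touches every operation, which is where the cost discussed in this issue comes from.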

Comment From: TomAugspurger

> The checking and propagation of flags in __finalize__ means a small-but-everywhere performance hit for all users that we should deprecate.

Is that specific to the flags mechanism, or is it something to do with calling __finalize__ in the first place? I'd be fine with a dedicated boolean to propagate the duplicate labels information.

Comment From: jbrockmendel

> Is [the performance penalty of flags] specific to the flags mechanism, or is it something to do with calling __finalize__ in the first place?

I think of it as being __finalize__ holistically.