It seems there is currently no option similar to NumPy's setflags(write=False) to make a pandas DataFrame completely immutable. We are considering a design where we rely on immutability to know that we can cache objects. While we can still design such a system without it, it would be great if we could enforce immutability to catch any errors.
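For reference, a minimal sketch of the NumPy behaviour being referred to. The array and variable names are made up for illustration; the point is only that a write to a read-only array is rejected instead of silently succeeding.

```python
import numpy as np

arr = np.arange(3)
arr.setflags(write=False)  # mark the array as read-only

try:
    arr[0] = 10  # any in-place write is rejected
    write_succeeded = True
except ValueError:
    write_succeeded = False

is_writeable = arr.flags.writeable
```

An equivalent switch on a DataFrame is what this issue asks for.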

Comment From: jreback

This is not in scope for pandas 1.x; it could be done in a subclass.

Virtually all operations return new objects. Simply don't use inplace flags or in-place indexing, and you have de facto immutability.

Comment From: jreback

You can also use pandas.util.hash_pandas_object to hash the data.
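A small sketch of how that could look, assuming the goal is to notice that a frame changed between two points in time. The example frame and variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# hash_pandas_object returns one hash per row as a Series
hash_before = pd.util.hash_pandas_object(df, index=True)

df.loc[0, "a"] = 100  # an in-place modification

hash_after = pd.util.hash_pandas_object(df, index=True)

# Differing row hashes indicate that the data was modified
changed = not hash_before.equals(hash_after)
```

This detects a modification after the fact; it does not prevent one.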

Comment From: mitar

Oh. :-( As a subclass it is pretty tricky, because you have to make sure you override every pandas method that can potentially change internals. Hashing is also not enough: it can tell you that something changed, but it cannot prevent the change.

Comment From: mitar

Is there a way to tell pandas to create a pandas object using a subclass?

Comment From: jreback

http://pandas-docs.github.io/pandas-docs-travis/internals.html
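As a rough sketch of the subclassing hook, assuming the `_constructor` property is what is meant: overriding it makes operations that build a new frame return the subclass. The class name TaggedFrame is hypothetical, and this does not add any immutability by itself.

```python
import pandas as pd

class TaggedFrame(pd.DataFrame):
    """Hypothetical subclass; only shows how the subclass type is preserved."""

    @property
    def _constructor(self):
        # pandas uses this when an operation builds a new frame
        return TaggedFrame

df = TaggedFrame({"a": [1, 2, 3]})
head = df.head(2)            # still a TaggedFrame
filtered = df[df["a"] > 1]   # still a TaggedFrame
```

Blocking mutation would still require overriding every mutating method on top of this.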

Comment From: fkromer

@mitar @jreback @TomAugspurger I know of this third party package addressing this issue: static-frame. Do you know other packages as well?

Comment From: TomAugspurger

Nope.

Comment From: fkromer

Is there also no tooling support (e.g. a pylint extension) to make sure people don't introduce bugs into functions that take a dataframe as input, manipulate it, and return the manipulated dataframe, by mistakenly modifying the input?

Comment From: fkromer

For functions, a workaround could be to implement a decorator for dynamic analysis. It would have to check which args and kwargs are DataFrames or Series. For a single DataFrame df_in, for example, it would calculate the hash of the input before and after the wrapped function is called (df_in_hash_before = pd.util.hash_pandas_object(df_in, index=True), df_in_hash_after = pd.util.hash_pandas_object(df_in, index=True)) and fail an assertion if the hashes differ (pd.testing.assert_series_equal(df_in_hash_before, df_in_hash_after)).
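A minimal sketch of such a decorator, under the assumptions described above. The name assert_inputs_unchanged and the two example functions are hypothetical. It only catches changes that alter the row hashes or the number of rows, so it should be read as an illustration and not as a complete check.

```python
import functools
import pandas as pd

def assert_inputs_unchanged(func):
    """Hypothetical decorator: fail if func mutates a DataFrame/Series argument."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Track every positional or keyword argument that is a pandas object
        tracked = [a for a in (*args, *kwargs.values())
                   if isinstance(a, (pd.DataFrame, pd.Series))]
        hashes_before = [pd.util.hash_pandas_object(obj, index=True)
                         for obj in tracked]
        result = func(*args, **kwargs)
        for obj, hash_before in zip(tracked, hashes_before):
            hash_after = pd.util.hash_pandas_object(obj, index=True)
            # Raises AssertionError if the input was modified in place
            pd.testing.assert_series_equal(hash_before, hash_after)
        return result
    return wrapper

@assert_inputs_unchanged
def add_column(df):
    return df.assign(b=df["a"] * 2)  # returns a new frame

@assert_inputs_unchanged
def overwrite_value(df):
    df.loc[0, "a"] = 99  # mutates the input
    return df

df_in = pd.DataFrame({"a": [1, 2, 3]})
df_out = add_column(df_in)  # passes

try:
    overwrite_value(df_in.copy())
    mutation_detected = False
except AssertionError:
    mutation_detected = True
```

Hashing every input twice adds overhead, so a check like this probably fits test runs better than production code.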

Comment From: mitar

Yes, I gave up on this. I find it really sad because pandas is almost there. Many methods have an inplace argument, and it would be great if you could just enforce it to be False and prevent any other modifications, so that you get a copy every time by design (when enabled).

Comment From: fkromer

Getting this into pandas would be too optimistic, I guess. I'm thinking about implementing the decorator and publishing it in a package, pytest-pandas. This would allow adding dynamic analysis of "mutability conformity" during tests.