Currently, the keyword use_nullable_dtypes (only implemented for read_parquet right now) uses nullable dtypes for every column, even when that column does not have nulls.

Should this behavior change?

xref discussion https://github.com/pandas-dev/pandas/pull/40687#issuecomment-833447019 https://github.com/pandas-dev/pandas/issues/42588#issuecomment-894981214 https://github.com/pandas-dev/pandas/issues/42588#issuecomment-882773411

(Note that always using nullable dtypes, which is the current behavior, is a choice made on our end in https://github.com/pandas-dev/pandas/blob/master/pandas/io/parquet.py#L215-L227 and not by the pyarrow engine.)

cc @pandas-dev/pandas-core

Comment From: bashtage

I would vote that it should change to only use when necessary, and to use standard dtypes when there are no nulls. Users can always upcast to nullable should they want to use nulls.
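
A minimal sketch of the upcast mentioned here, assuming a plain int64 column that the user later wants to hold missing values (the data is made up for illustration):

```python
import pandas as pd

# A standard NumPy-backed integer column with no nulls.
s = pd.Series([1, 2, 3], dtype="int64")

# Upcast to the nullable extension dtype; the values are unchanged.
upcast = s.astype("Int64")

# The nullable column can now hold a missing value and stay an integer dtype.
upcast.iloc[1] = pd.NA

print(s.dtype, upcast.dtype)
```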

Comment From: phofl

I think we can close this. We implemented this for other functions the same way as for read_parquet, which makes the most sense imo.

Comment From: MarcoGorelli

Agreed - personally I'm not keen on value-dependent behaviour such as

> only use when necessary, and to use standard dtypes when there are no nulls

so I'd vote for keeping as-is (use_nullable_dtypes uses nullable types - which is clear and simple)
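
A rough sketch of why value-dependent behaviour can be surprising. Both loader functions are hypothetical stand-ins written for this example, not pandas APIs:

```python
import pandas as pd


def load_value_dependent(values):
    """Hypothetical loader: use a nullable dtype only if a null is present."""
    if any(v is None for v in values):
        return pd.Series(values, dtype="Int64")
    return pd.Series(values, dtype="int64")


def load_always_nullable(values):
    """Hypothetical loader: always use the nullable dtype."""
    return pd.Series(values, dtype="Int64")


# Two chunks of what is logically the same column.
chunk_a = [1, 2, 3]
chunk_b = [4, None, 6]

value_dependent = [str(load_value_dependent(c).dtype) for c in (chunk_a, chunk_b)]
always_nullable = [str(load_always_nullable(c).dtype) for c in (chunk_a, chunk_b)]

print(value_dependent)
print(always_nullable)
```

With the value-dependent loader, the resulting dtype depends on the data that happens to be in each chunk, while the always-nullable loader gives the same dtype for both.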

Comment From: lithomas1

We'll have to tell fastparquet then. I think they still do it the other way.

Comment From: phofl

Could you open an issue there?