Feature Type

  • [X] Changing existing functionality in pandas

Problem Description

  • fsspec.open() sets auto_mkdir=True by default as one can see here and here.

  • As a result, a pandas user with lax AWS permissions may inadvertently create a new S3 bucket when running e.g. df.to_csv('s3://non-existing-bucket/out.csv').

  • This is at odds with the behavior when writing to local disk, where e.g. df.to_csv('non/existing/dir/out.csv') raises OSError: Cannot save file into a non-existent directory: 'non/existing/dir'

  • The maintainer of fsspec seems to agree that the current default is not ideal, and there has been some interest in switching the default to False.

Feature Description

Change the behavior so that, if auto_mkdir is not passed explicitly in storage_options, pandas passes auto_mkdir=False to fsspec.open.

See possible implementation in https://github.com/pandas-dev/pandas/pull/50136
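
As a rough illustration of the proposed behavior (not the code from the linked PR), the defaulting step could look like the sketch below. The helper name with_auto_mkdir_default is hypothetical and is not part of pandas or fsspec. It only fills in auto_mkdir when the caller has not set it, so an explicit value in storage_options still wins.

```python
from __future__ import annotations

from typing import Any


def with_auto_mkdir_default(
    storage_options: dict[str, Any] | None,
) -> dict[str, Any]:
    """Hypothetical helper: copy storage_options, defaulting auto_mkdir to False.

    An explicit auto_mkdir value supplied by the caller is left untouched.
    """
    # Copy so the caller's dict is never mutated.
    options = dict(storage_options or {})
    # Only fill in the default when the key is absent.
    options.setdefault("auto_mkdir", False)
    return options


# Assumed call site (not executed here): the result would be unpacked into
# fsspec.open(path, mode, **with_auto_mkdir_default(storage_options)).
no_options = with_auto_mkdir_default(None)
explicit = with_auto_mkdir_default({"auto_mkdir": True, "anon": False})
```

With this sketch, a user who still wants the current behavior would pass storage_options={"auto_mkdir": True} explicitly.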

Comment From: phofl

As with the fsspec concerns, this is an API change on our side as well, so we would have to deprecate first. I'm not sure I would want to do this in general; I would prefer if we can rely on fsspec defaults.

Comment From: MarcoGorelli

> I would prefer if we can rely on fsspec defaults

yeah same

Comment From: spolloni

isn't pandas already specifically NOT relying on fsspec (s3fs) defaults here? https://github.com/pandas-dev/pandas/blob/5ee4dace4334f98147abcb42b5218e6d8f45f7f8/pandas/io/common.py#L415-L423

Comment From: MarcoGorelli

That's much smaller though, and it only uses anon if fsspec.open fails

I'd suggest bringing this up with fsspec

Closing for now then, but thanks for the suggestion!