Feature Type
- [X] Changing existing functionality in pandas
Problem Description
- `fsspec.open()` sets `auto_mkdir=True` by default as one can see here and here.
- As a result, a pandas user with lax AWS permissions may inadvertently create a new s3 bucket when running e.g. `df.to_csv('s3://non-existing-bucket/out.csv')`.
- This is at odds with the behavior when writing locally to disk, since e.g. `df.to_csv('non/existing/dir/out.csv')` will raise `OSError: Cannot save file into a non-existent directory: 'non/existing/dir'`.
- The maintainer of `fsspec` does seem to think the current default is not ideal, and there has been some interest in switching the default to `False`.
Feature Description
Change the behavior so that if `auto_mkdir` is not passed explicitly in `storage_options`, it is passed to `fsspec.open` as `False`.
See possible implementation in https://github.com/pandas-dev/pandas/pull/50136
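A minimal sketch of the proposed defaulting logic, for illustration only. The helper name `with_auto_mkdir_default` is hypothetical and is not taken from the linked PR; the only assumption about fsspec is that `fsspec.open` accepts an `auto_mkdir` keyword, as described above.

```python
from typing import Any, Dict, Optional


def with_auto_mkdir_default(
    storage_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of storage_options where auto_mkdir defaults to False.

    Hypothetical helper: an explicit user-supplied auto_mkdir is kept,
    and the caller's dict is never mutated.
    """
    options = dict(storage_options or {})
    # Only fill in the default when the user did not pass the key explicitly.
    options.setdefault("auto_mkdir", False)
    return options


# Intended use (not executed here, requires fsspec):
#   fsspec.open(path, mode, **with_auto_mkdir_default(storage_options))
```

With this shape, users who rely on the current behavior could still opt in by passing `storage_options={"auto_mkdir": True}`.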
Comment From: phofl
Similar to fsspec concerns, this is an API change on our side as well, so would have to deprecate first. Not sure, if I would want to do this in general, I would prefer if we can rely on fsspec defaults.
Comment From: MarcoGorelli
> I would prefer if we can rely on fsspec defaults
yeah same
Comment From: spolloni
isn't pandas already specifically NOT relying on fsspec (s3fs) defaults here? https://github.com/pandas-dev/pandas/blob/5ee4dace4334f98147abcb42b5218e6d8f45f7f8/pandas/io/common.py#L415-L423
Comment From: MarcoGorelli
That's much smaller though, and it only uses anon if fsspec.open fails
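For context, the "only uses anon if the open fails" pattern can be sketched roughly as follows. This is not the pandas source; `open_with_anon_fallback`, the `opener` callable, and the default `retry_on` tuple are hypothetical stand-ins chosen so the example runs without fsspec or s3fs installed.

```python
from typing import Any, Callable, Dict, Optional, Tuple, Type


def open_with_anon_fallback(
    opener: Callable[..., Any],
    path: str,
    storage_options: Optional[Dict[str, Any]] = None,
    retry_on: Tuple[Type[BaseException], ...] = (PermissionError,),
) -> Any:
    """Try opener with the given options; on a listed error, retry with anon=True."""
    options = dict(storage_options or {})
    try:
        # First attempt: use exactly what the caller supplied.
        return opener(path, **options)
    except retry_on:
        # Fallback: override only the anon flag, and only after a failure.
        options["anon"] = True
        return opener(path, **options)


def _fake_opener(path: str, **kwargs: Any) -> Dict[str, Any]:
    """Stand-in opener that rejects non-anonymous access."""
    if not kwargs.get("anon"):
        raise PermissionError("credentials rejected")
    return {"path": path, **kwargs}


result = open_with_anon_fallback(_fake_opener, "s3://bucket/out.csv")
```

The point of contrast with the `auto_mkdir` proposal is that the override here is applied only as a fallback, not on every call.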
I'd suggest bringing this up with fsspec
Closing for now then, but thanks for the suggestion!