Feature Type
- [X] Changing existing functionality in pandas
Problem Description
- `fsspec.open()` sets `auto_mkdir=True` by default, as one can see here and here.
- As a result, a pandas user with lax AWS permissions may inadvertently create a new S3 bucket when running e.g. `df.to_csv('s3://non-existing-bucket/out.csv')`.
- This is at odds with the behavior when writing locally to disk, since e.g. `df.to_csv('non/existing/dir/out.csv')` will raise `OSError: Cannot save file into a non-existent directory: 'non/existing/dir'`.
- The maintainer of `fsspec` does seem to think the current default is not ideal, and there has been some interest in switching the default to `False`.
Feature Description
Change the behavior so that if `auto_mkdir` is not passed explicitly in `storage_options`, it is given to `fsspec.open` as `False`.
See possible implementation in https://github.com/pandas-dev/pandas/pull/50136
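The defaulting rule described above can be sketched roughly as follows. This is only an illustration and not the code from the linked PR: the helper name `apply_auto_mkdir_default` is hypothetical, and it assumes `storage_options` is an optional dict that is later unpacked into `fsspec.open`.

```python
from __future__ import annotations

from typing import Any


def apply_auto_mkdir_default(
    storage_options: dict[str, Any] | None,
) -> dict[str, Any]:
    """Return a copy of storage_options with auto_mkdir defaulting to False.

    Hypothetical helper for illustration; not part of pandas or fsspec.
    """
    # Copy so the caller's dict is never mutated.
    options = dict(storage_options or {})
    # Only fill in the default; an explicit user value always wins.
    options.setdefault("auto_mkdir", False)
    return options


# Not passed explicitly: auto_mkdir is filled in as False.
print(apply_auto_mkdir_default(None))
# Passed explicitly: the user's choice is kept, other options are untouched.
print(apply_auto_mkdir_default({"auto_mkdir": True, "anon": True}))
```

Under this sketch, the resulting dict would be what gets unpacked into `fsspec.open`, so anyone relying on the current behavior could still opt in with `storage_options={"auto_mkdir": True}`.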
Comment From: phofl
Similar to the fsspec concerns, this is an API change on our side as well, so we would have to deprecate first. I'm not sure I would want to do this in general; I would prefer if we can rely on fsspec defaults.
Comment From: MarcoGorelli
> I would prefer if we can rely on fsspec defaults
yeah same
Comment From: spolloni
Isn't pandas already specifically NOT relying on fsspec (s3fs) defaults here? https://github.com/pandas-dev/pandas/blob/5ee4dace4334f98147abcb42b5218e6d8f45f7f8/pandas/io/common.py#L415-L423
Comment From: MarcoGorelli
That's much smaller though, and it only uses `anon` if `fsspec.open` fails.
I'd suggest bringing this up with fsspec.
Closing for now then, but thanks for the suggestion!