Pandas version checks
- [X] I have checked that the issue still exists on the latest versions of the docs on `main` here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
Documentation problem
The `to_csv` `mode` parameter does not explicitly list its options for users. It links to the Python `open()` function, which includes some options that aren't relevant.
Suggested fix for documentation
It'd be nice to list the mode options right in the `to_csv` docs. This should be significantly more user-friendly. Suggested text:
mode: str, default 'w'
The file write mode which can be `w`, `x`, or `a`. `w` will write the file and overwrite another file if it already exists. `x` will write the file, but error out if a file with the same name already exists. `a` will append to the existing file.
The available write modes are the same as [open()](https://docs.python.org/3/library/functions.html#open).
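As a rough illustration of the three modes described in the suggested text, here is a minimal sketch. The DataFrame contents and the file name `out.csv` are made up for the example, and it assumes that `x` surfaces as a `FileExistsError`, as it does with the built-in `open()`:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")

    # 'w' creates the file, or overwrites it if it already exists.
    df.to_csv(path, mode="w", index=False)

    # 'x' refuses to overwrite: the file exists now, so this should raise.
    try:
        df.to_csv(path, mode="x", index=False)
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'a' appends to the existing file; header=False avoids writing
    # the column names a second time.
    df.to_csv(path, mode="a", index=False, header=False)

    with open(path) as fh:
        lines = fh.read().splitlines()

print(exclusive_failed)
print(lines)
```

Note that appending does not check that the columns match what is already in the file, so keeping the header and column order consistent is up to the caller.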
Comment From: MrPowers
@datapythonista - Thanks for the help on this one!
Comment From: datapythonista
Thanks for reporting.
When only a subset of the values is expected, instead of using the type `str`, we can directly use `mode : {'w', 'x', 'a'}, default 'w'`.
To explain the types, we can also use bullet points. We do that in some docs, and it may make things easier to read.
It'd be good to see which other `to_*` functions have `mode` as a parameter, and update them all. Probably better to start with only one, get a code review, and when we're happy with it, apply it to the rest.
Comment From: twoertwein
One important value is also `"wb"` for binary file handles that do not have the `.mode` attribute themselves (this is hinted at in the documentation for `path_or_buf`).
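A small sketch of the binary-handle case, using `io.BytesIO` as an example of a handle with no `.mode` attribute. This is only an illustration: pandas may well recognise `BytesIO` as binary on its own, so passing `mode="wb"` here mainly makes the intent explicit.

```python
import io

import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# io.BytesIO is a binary handle that does not expose a .mode attribute.
buffer = io.BytesIO()

# Including 'b' in the mode tells to_csv to treat the handle as binary.
df.to_csv(buffer, mode="wb", index=False)

# The buffer now holds encoded bytes (UTF-8 by default).
text = buffer.getvalue().decode("utf-8")
print(text)
```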
Comment From: HamidrezaSK
take
Comment From: MrPowers
> It'd be good to see which other `to_*` functions have `mode` as a parameter, and update them all.
@datapythonista - strongly agree with this, especially because the different `to_*` APIs have different write modes that make sense. I'm not sure if `to_parquet` supports `mode` because I don't see it in the docs, but appending obviously won't work because Parquet files are immutable.
I don't mean to overcomplicate this too much, but for `to_parquet` we should also think about how `mode` and `partition_cols` interact. "Appending" in that case probably means something more like how Spark uses "append" (i.e. adding files to an existing folder).
I mention this because the `to_csv` writer should probably support `partition_cols` as well. Just documenting the existing options is a good start, but we should probably do some longer-term planning too.
Comment From: qudus4l
I agree that it would be helpful to have the available modes for to_csv() explicitly listed in the documentation. Thanks for suggesting this improvement! Just a small note: the suggested text says that mode defaults to 'w', but in fact the default value is 'w' only if path_or_buf is a file path (otherwise, it defaults to None). Maybe it would be good to clarify this in the text. Apart from that, the suggested text looks great to me! It would make it much easier for users to understand what the mode parameter does and what the available options are.
Comment From: datapythonista
From a quick look I can only see `to_json` having a `mode` parameter. The description in its docstring will have to be extended from a generic one, since `mode` only makes sense with `lines=True`, and we need to say that.
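To make the `lines=True` restriction concrete, here is a minimal sketch. It assumes a pandas version where `to_json` accepts `mode` (2.0 or later, as far as I know), and that appending is rejected with a `ValueError` unless `lines=True` and `orient="records"` are both set. The data and the file name `out.jsonl` are made up.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.jsonl")

    # First write: one JSON object per line (JSON Lines).
    df.to_json(path, orient="records", lines=True)

    # Appending adds more lines to the same file.
    df.to_json(path, orient="records", lines=True, mode="a")

    with open(path) as fh:
        records = [json.loads(line) for line in fh if line.strip()]

    # Without lines=True, appending would not yield a valid JSON
    # document, so pandas is expected to reject it.
    try:
        df.to_json(path, mode="a")
        append_rejected = False
    except ValueError:
        append_rejected = True

print(records)
print(append_rejected)
```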