Problem description
This is an enhancement request to allow reading and saving comment strings with DataFrame text file IO. Two related Stack Overflow questions about such a feature: http://stackoverflow.com/questions/39724298/pandas-extract-comment-lines http://stackoverflow.com/questions/29233496/write-comments-in-csv-file-with-pandas
Code Sample, a copy-pastable example if possible
df = pd.read_csv("mydata.csv", comment="#")
df.comment
"this is a comment from mydata.csv file"
df.to_csv("output.csv", comment="#") # saves the `comment` string by prefixing each of its lines with "#" and writing it before the table.
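Until something like the above exists, the round trip can be approximated with calls pandas already has. The following is only a minimal sketch: `write_csv_with_comment` and `read_csv_with_comment` are hypothetical helper names, not pandas functions, and the sketch assumes comments occupy whole lines that start with the prefix.

```python
import io

import pandas as pd


def write_csv_with_comment(df, buf, comment_text, prefix="#"):
    """Hypothetical helper: write the comment lines first, then the table."""
    for line in comment_text.splitlines():
        buf.write(f"{prefix} {line}\n")
    df.to_csv(buf, index=False)


def read_csv_with_comment(buf, prefix="#"):
    """Hypothetical helper: return (DataFrame, comment string)."""
    text = buf.read()
    # Collect whole-line comments ourselves; read_csv discards them.
    comment_lines = [
        line[len(prefix):].strip()
        for line in text.splitlines()
        if line.startswith(prefix)
    ]
    df = pd.read_csv(io.StringIO(text), comment=prefix)
    return df, "\n".join(comment_lines)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
out = io.StringIO()
write_csv_with_comment(df, out, "this is a comment from mydata.csv file")
out.seek(0)
df2, comment = read_csv_with_comment(out)
```

The comment is returned next to the frame rather than attached as `df.comment`, since an ad hoc attribute would not survive most DataFrame operations.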
Output of pd.show_versions()
Comment From: jreback
xref https://github.com/pandas-dev/pandas/issues/5686
attaching meta-data (which is mostly not propagated) is not generally a good idea; rather, this could be something like:
df, comments, bad_lines = pd.read_csv(...., return_comments=True, return_bad_lines=True)
or some such.
or, if we simply don't want to allow both comment processing and returning (which is prob preferable)
df, comments, bad_lines = pd.read_csv(...., comments='return', errors='return')
(going with the new errors kw that has been proposed for handling / warning on bad_lines)
or probably better
df, return_data = ...
where
return_data = {
    'comments': [(line_number, comment), ...],     # list of tuples
    'errors': [(line_number, text, error), ...],   # list of tuples
}
or some such (this would be nicer, as you always have only 1 extra return value, and it has optional keys)
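To make the shape of that last variant concrete, here is a rough sketch written as a wrapper around the existing `pd.read_csv`. The name `read_csv_return_comments` is hypothetical and nothing here is pandas API; it assumes 1-based line numbers, only captures whole-line comments, and leaves out the optional `'errors'` key entirely.

```python
import io

import pandas as pd


def read_csv_return_comments(text, comment="#", **kwargs):
    """Hypothetical wrapper (not a pandas API): return (df, return_data)."""
    comments = []
    # Assumption: line numbers are 1-based and refer to the raw input text.
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(comment):
            comments.append((line_number, line[len(comment):].strip()))
    # pandas itself still skips the fully commented lines while parsing.
    df = pd.read_csv(io.StringIO(text), comment=comment, **kwargs)
    return df, {"comments": comments}


sample = "# units: metres\nx,y\n1,2\n# second batch\n3,4\n"
df, return_data = read_csv_return_comments(sample)
```

The caller always unpacks two values, and can look up `return_data["comments"]` (or any other key that happens to be present) as needed.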
Comment From: TrigonaMinima
@jreback I'd like to work on this. Is there any target file other than pandas/io/parsers.py?
Comment From: jreback
the main file is pandas/parser.pyx; pandas/io/parsers.py houses the top-level interface and the Python parser, while parser.pyx houses the C parser. Both would need to be modified.
Comment From: joeflack4
Not a huge help to me but adding a +1 just because of current need