Problem description

This is an enhancement request to support reading and saving comment strings in DataFrame text file IO. Two related Stack Overflow questions ask for this feature: http://stackoverflow.com/questions/39724298/pandas-extract-comment-lines http://stackoverflow.com/questions/29233496/write-comments-in-csv-file-with-pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
# Proposed behavior, not an existing pandas feature:
df = pd.read_csv("mydata.csv", comment="#")
df.comment
# "this is a comment from mydata.csv file"
df.to_csv("output.csv", comment="#")  # writes the `comment` string above the table, prefixing each of its lines with "#"
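
Until something like this exists, the write side can be approximated with the current API. The following is only a sketch: `to_csv_with_comments` and its `comment_text` parameter are hypothetical names, not part of pandas, and it assumes the comment block should always sit above the header row.

```python
import io

import pandas as pd


def to_csv_with_comments(df, path_or_buf, comment_text, comment="#", **kwargs):
    """Hypothetical helper (not a pandas API): write comment lines, then the table."""
    # Prefix every line of the comment text with the comment character.
    header = "".join(f"{comment} {line}\n" for line in comment_text.splitlines())
    if isinstance(path_or_buf, str):
        # newline="" lets pandas control the line terminators of the table.
        with open(path_or_buf, "w", newline="") as fh:
            fh.write(header)
            df.to_csv(fh, **kwargs)
    else:
        path_or_buf.write(header)
        df.to_csv(path_or_buf, **kwargs)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
buf = io.StringIO()
to_csv_with_comments(df, buf, "this is a comment\nsecond line", index=False)
text = buf.getvalue()

# Reading the result back with comment="#" skips the comment lines again.
round_trip = pd.read_csv(io.StringIO(text), comment="#")
```

The comment text is lost on the round trip, which is exactly the gap this request is about.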

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 29.0.1
Cython: 0.24
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: 1.4.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: 1.0.14
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

Comment From: jreback

xref https://github.com/pandas-dev/pandas/issues/5686

attaching meta-data (which is mostly not propagated) is generally not a good idea; rather, this could be something like:

df, comments, bad_lines = pd.read_csv(...., return_comments=True, return_bad_lines=True)

or some such.

or, if we simply want to disallow both processing and returning comments at once (which is prob preferable)

df, comments, bad_lines = pd.read_csv(...., comments='return', errors='return')

(going with the new errors kw that has been proposed for handling / warning on bad_lines)

or probably better

df, return_data = ...

where

return_data = {
    'comments': [(line_number, comment), ...],   # list of tuples
    'errors': [(line_number, text, error), ...], # list of tuples
}

or some such (this would be nicer, as you always have exactly 1 extra return value, and it has optional keys)
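
A rough sketch of what the `return_data` shape could look like from the user's side, built on top of the existing `comment=` option. `read_csv_with_comments` is a hypothetical wrapper, not a pandas function; it assumes 0-based line numbers, only collects lines that start with the comment character (inline comments are ignored), and leaves out the `'errors'` key entirely.

```python
import io

import pandas as pd


def read_csv_with_comments(text, comment="#", **kwargs):
    """Hypothetical wrapper (not a pandas API): return (df, return_data)."""
    comments = []
    for line_number, line in enumerate(text.splitlines()):
        # Only whole-line comments are collected in this sketch.
        if line.startswith(comment):
            comments.append((line_number, line[len(comment):].strip()))
    # The existing comment= option makes the parser skip those lines.
    df = pd.read_csv(io.StringIO(text), comment=comment, **kwargs)
    return df, {"comments": comments}


raw = (
    "# this is a comment from mydata.csv file\n"
    "a,b\n"
    "1,3\n"
    "# mid-file note\n"
    "2,4\n"
)
df, return_data = read_csv_with_comments(raw)
```

This scans the text twice, so it is only meant to illustrate the return shape; a real implementation would collect the comments inside the parsers themselves.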

Comment From: TrigonaMinima

@jreback I'd like to work on this.

Is there any target file other than pandas/io/parsers.py?

Comment From: jreback

the main file is pandas/parser.pyx; pandas/io/parsers.py houses the top-level interface and the python parser, while parser.pyx houses the c-parser. Both would need to be modified.

Comment From: joeflack4

Not a huge help to me but adding a +1 just because of current need