Problem description

I have comma-delimited data that I need to write to a .csv file.

My data looks like this: 16.0,0.118,15.0,0.5675,0.337,"LTE,eHRPD"

Expected Output

I need this: "LTE,eHRPD" to be output exactly as it is written here, with double quotes and a comma.

I am using QUOTE_NONE so that no extra double quotes are added around the already-quoted fields, but I get this error: _csv.Error: need to escape, but no escapechar set

I am using pandas version 0.20.1

Comment From: jorisvandenbossche

@agora123 Can you provide a reproducible (runnable) example that shows the error?

I tried with the following, and don't get such an error:

In [87]: s = '16.0,0.118,15.0,0.5675,0.337,"LTE,eHRPD"'

In [88]: pd.read_csv(StringIO(s), header=None)
Out[88]: 
      0      1     2       3      4          5
0  16.0  0.118  15.0  0.5675  0.337  LTE,eHRPD

In [89]: pd.read_csv(StringIO(s), header=None, quoting=3)
Out[89]: 
      0      1     2       3      4     5       6
0  16.0  0.118  15.0  0.5675  0.337  "LTE  eHRPD"

So not exactly what you want (but I don't think this is possible), but also not an error.

Comment From: agora123

Hello jorisvandenbossche, thank you very much for your reply.

I have a DataFrame, df, populated from a database whose format I can't control. Some fields in the data contain double quotes and a comma, for example: "LTE,eHRPD". I need to write the data to csv exactly in that format, with the double quotes and the comma in that field: "LTE,eHRPD"

Here is some mock-up code that I put together:

import csv
import pandas

df = pandas.DataFrame({'A': ['1.0627', '0625', '"LTE,eHRPD"']})
print(df)

df.to_csv(quoting=csv.QUOTE_NONE)

Comment From: jorisvandenbossche

Ah, sorry, I assumed it was about read_csv (didn't read that well).

So I think it is not possible to obtain what you want with pandas to_csv. The problem is that your data contains a comma, which is also the delimiter. That is normally not a problem, as the string gets quoted. But you specify that you don't want to add quotes (with QUOTE_NONE). You do this because the data itself already has quotes, but pandas does not know this. It sees a delimiter in your data and therefore wants either quotes (the default, which you override) or an escapechar. You can do the latter with:

In [21]: df = pd.DataFrame({'A': ['1.0627', '0625', '"LTE,eHRPD"']})

In [22]: print(df.to_csv(quoting=csv.QUOTE_NONE, escapechar='\\'))
,A
0,1.0627
1,0625
2,"LTE\,eHRPD"

but of course, this is also not what you want.
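
The error itself is raised by Python's csv module, so the behaviour can be reproduced without pandas. A minimal sketch: the helper name try_write is made up for illustration, and quotechar=None is an assumption meant to mirror the unescaped quotes in the pandas output above.

```python
import csv
import io


def try_write(row, **writer_kwargs):
    """Write one row with csv.writer; return the text or the error message."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n", **writer_kwargs)
    try:
        writer.writerow(row)
    except csv.Error as exc:
        return f"csv.Error: {exc}"
    return buf.getvalue()


row = [2, '"LTE,eHRPD"']

# No quoting and no escapechar: the comma inside the field cannot be written.
# quotechar=None is an assumption (see the note above the block).
print(try_write(row, quoting=csv.QUOTE_NONE, quotechar=None))

# No quoting, but with an escapechar: the comma gets escaped instead.
print(try_write(row, quoting=csv.QUOTE_NONE, quotechar=None, escapechar="\\"))
```

The first call reports the same "need to escape" error, the second produces the escaped line.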

Possible solutions: don't use pandas to_csv but write the data manually (looping over the rows, concatenating the values and writing each line to the file), or you could also use an escapechar that is not yet present in the data and remove it afterwards.
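
A rough sketch of the manual route, under the assumption that every value containing the delimiter already carries its own quotes. The function name write_rows_verbatim is hypothetical, and the list of tuples stands in for the rows of the DataFrame.

```python
import io


def write_rows_verbatim(rows, fh, sep=","):
    """Write each row as sep-joined text, with no quoting or escaping at all."""
    for row in rows:
        fh.write(sep.join(str(value) for value in row) + "\n")


# Stand-in for the DataFrame rows; with pandas, df.itertuples() yields
# (index, value, ...) tuples of the same shape.
rows = [(0, "1.0627"), (1, "0625"), (2, '"LTE,eHRPD"')]

buf = io.StringIO()  # an open file object works the same way
write_rows_verbatim(rows, buf)
print(buf.getvalue())
```

Nothing here checks the data: a value with a comma but without its own quotes would silently produce a broken line.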

Comment From: agora123

Hello jorisvandenbossche, thank you for your reply.

I know I need to set quoting=csv.QUOTE_NONE, and I would like to set escapechar to an empty string (so that I don't have to remove it later), but that is not allowed. Is there a chance the code could be changed to allow escapechar to be an empty string?

Also, if you can, could you please suggest an efficient method for removing the extra character from the .csv file in case I do have to provide an escapechar?

Thank you in advance!

Comment From: jorisvandenbossche

@gfyoung An opinion on whether we could allow an empty escapechar?

(Short summary of the case: you have a field that includes a comma, but the value of that field also already has quotes. Therefore you set quoting=QUOTE_NONE to prevent adding extra quotes, but then pandas wants to escape the comma. This is logical, as pandas does not know that there are already quotes in the data and that the final output would therefore still be a valid csv file. A possible solution is to allow an empty escapechar. But this is a rather peculiar corner case, so I am not sure whether it is worth adding or whether it is technically easy to allow.)

Comment From: gfyoung

@jorisvandenbossche : It wouldn't be too difficult to allow, but IMO the philosophy has generally been that we want to make round-trips possible (we should get the original DataFrame back if we read it after sending it to a str with to_csv), and making corner cases like this possible is just asking for trouble in the long run.
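
To illustrate the round-trip concern, here is a small sketch using the standard-library csv reader rather than pandas itself: the line requested in this issue reads back without its quotes, so the original value is not recovered.

```python
import csv
import io

original = '"LTE,eHRPD"'            # value as stored in the DataFrame
requested_line = '2,"LTE,eHRPD"\n'  # line the reporter wants in the file

# A standard CSV reader treats the quotes as field quoting and drops them.
index, value = next(csv.reader(io.StringIO(requested_line)))
print(value)              # LTE,eHRPD
print(value == original)  # False
```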

I'm -1 on allowing this because the specifications are conflicting. Given that you have just one location where the escapechar appears, I would take the output and do:

output = output.replace("\\", "")
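
Note that removing every backslash would also delete backslashes that belong to the data. A hedged sketch of the variant mentioned earlier in the thread (an escapechar that does not occur in the data), shown with the standard-library csv writer: SENTINEL and write_without_escapes are illustrative names, and quotechar=None is an assumption so that quotes in the data are left alone.

```python
import csv
import io

SENTINEL = "\x1f"  # assumed not to occur in the data; checked below


def write_without_escapes(rows):
    """Write rows with QUOTE_NONE, then strip the temporary escape character."""
    if any(SENTINEL in str(value) for row in rows for value in row):
        raise ValueError("sentinel escapechar already present in the data")
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONE, quotechar=None,
                        escapechar=SENTINEL, lineterminator="\n")
    writer.writerows(rows)
    # Only the sentinel is removed, so real backslashes survive.
    return buf.getvalue().replace(SENTINEL, "")


rows = [(0, "1.0627"), (1, "0625"), (2, '"LTE,eHRPD"'), (3, r"C:\temp")]
print(write_without_escapes(rows))
```

With pandas the same idea should carry over by passing escapechar=SENTINEL to to_csv together with quoting=csv.QUOTE_NONE and calling .replace(SENTINEL, "") on the returned string.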

Comment From: jorisvandenbossche

@gfyoung Thanks! Sounds like a good assessment, agree with that. So will close this.

@agora123 What you could also do is strip the " from your values; writing the csv should then give exactly what you want:

In [29]: print(df.A.str.strip('"').to_csv())
0,1.0627
1,0625
2,"LTE,eHRPD"

Comment From: pieterhartel

If there is a null character in a cell, pandas has trouble writing the data to a csv file. Is this related to the bug above?

>>> df=pandas.DataFrame( [ {'a':1}, {'a':'a\0b'} ] )
>>> df
     a
0    1
1  ab
>>> df.to_csv( 'foo.csv' )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/p/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/generic.py", line 3551, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/home/p/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1180, in to_csv
    csv_formatter.save()
  File "/home/p/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 261, in save
    self._save()
  File "/home/p/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 266, in _save
    self._save_body()
  File "/home/p/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 304, in _save_body
    self._save_chunk(start_i, end_i)
  File "/home/p/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 315, in _save_chunk
    libwriters.write_csv_rows(
  File "pandas/_libs/writers.pyx", line 75, in pandas._libs.writers.write_csv_rows
_csv.Error: need to escape, but no escapechar set

I am using this version of python and pandas on a Linux machine:

$ py --version
Python 3.10.4
$ pip show pandas
Name: pandas
Version: 1.4.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: The Pandas Development Team
Author-email: pandas-dev@python.org
License: BSD-3-Clause
Location: /home/p/.pyenv/versions/3.10.4/lib/python3.10/site-packages
Requires: numpy, python-dateutil, pytz
Required-by: