Code at issue
with open(outputfile, "w") as file_handle:
    df2.to_csv(file_handle, compression='gzip')
Problem description
The written file is not gzip compressed.
Expected Output
The output should be a gzip-compressed CSV file, similar to what is obtained when using:
df2.to_csv('/path/to/file.csv.gz', compression='gzip')
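One quick way to confirm whether a written file is really gzip compressed is to look for the two-byte gzip magic number (0x1f 0x8b) at the start of the file. The sketch below uses only the standard library; the helper name looks_like_gzip and the sample file names are made up for illustration and are not part of pandas.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


# Small demonstration with one plain file and one gzip file.
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "plain.csv")
gz_path = os.path.join(tmpdir, "compressed.csv.gz")
sample = ",0,1,2\n0,123,234,435\n"

with open(plain_path, "w", newline="") as fh:
    fh.write(sample)
with gzip.open(gz_path, "wt", newline="") as fh:
    fh.write(sample)

print(looks_like_gzip(plain_path))  # False
print(looks_like_gzip(gz_path))     # True
```

Running the same check on the file produced by the handle-based to_csv call above should show whether compression was applied.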
Comment From: WillAyd
Maybe related to #21144
Comment From: minggli
>>> import os
>>> import pandas as pd
>>> from pandas import *
>>>
>>> pd.__version__
'0.20.3'
>>>
>>> df = DataFrame(100 * [[123, 234, 435]])
>>>
>>> with open('test_compressed', 'w') as fh:
...     df.to_csv(fh, compression='gzip')
...
>>> fh_size = os.path.getsize('test_compressed')
>>> df.to_csv('test_compressed', compression='gzip')
>>> f_size = os.path.getsize('test_compressed')
>>>
>>> os.remove('test_compressed')
>>> assert fh_size == f_size
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
This looks like existing behaviour dating back to version 0.20 or earlier.
Actually, the documentation of compression= says:
compression : string, optional
A string representing the compression to use in the output file. Allowed values are ‘gzip’, ‘bz2’, ‘zip’, ‘xz’. This input is only used when the first argument is a filename.
So this is not supported, but perhaps it could be treated as a new use case?
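Given that the documented behaviour ignores compression= for file handles, one possible workaround is to let the handle itself do the compression: open the target with gzip.open in text mode and pass that handle to to_csv. This is only a sketch; write_csv_gz is a hypothetical helper name, and frame is assumed to be a pandas DataFrame (any object with a compatible to_csv method would do).

```python
import gzip


def write_csv_gz(frame, path, **to_csv_kwargs):
    """Write `frame` to `path` as gzip-compressed CSV through an open handle.

    Assumption: `frame` is a pandas DataFrame. The gzip handle performs the
    compression, so `compression=` is deliberately not passed to `to_csv`.
    """
    # "wt" yields a text-mode handle; newline="" avoids newline translation.
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        frame.to_csv(handle, **to_csv_kwargs)
```

With a DataFrame df, calling write_csv_gz(df, 'test_compressed.gz', index=False) should produce a file that gzip tools can read, without relying on compression= being honoured for handles.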
Comment From: toninlg
Hi,
Should the following code work with this merge, or is it related to #22555?
import os
import sys
import pandas as pd
from pandas import *
print(sys.version)
print(pd.__version__)
df = DataFrame(100 * [[123, 234, 435]])
with open('./test_compressed.gz', 'w', newline='') as fh:
    df.to_csv(fh)
fh_size = os.path.getsize('./test_compressed.gz')
df.to_csv('./test_compressed.gz')
f_size = os.path.getsize('./test_compressed.gz')
os.remove('./test_compressed.gz')
assert fh_size == f_size
3.6.7 (default, Dec 6 2019, 07:03:06) [MSC v.1900 64 bit (AMD64)]
0.25.1
3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)]
1.0.5
I get an assertion error in both cases, and if I add compression='gzip' to the first to_csv call, I get RuntimeWarning: compression has no effect when passing file-like object as input.
Thank you
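A note on the check itself: comparing file sizes is fragile here. When a path ending in .gz is passed, to_csv infers gzip compression from the extension (the default in these versions is compression='infer', as far as the documentation goes), whereas the handle-based call writes plain text, so the two sizes are expected to differ. Comparing the decoded contents is more informative. A minimal standard-library sketch, with made-up file names and sample data:

```python
import gzip
import os
import tempfile

csv_text = "a,b\n1,2\n"  # stand-in for the text that to_csv would produce
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "plain.csv")
packed_path = os.path.join(tmpdir, "packed.csv.gz")

# One plain-text file and one gzip-compressed file with the same content.
with open(plain_path, "w", encoding="utf-8", newline="") as fh:
    fh.write(csv_text)
with gzip.open(packed_path, "wt", encoding="utf-8", newline="") as fh:
    fh.write(csv_text)

# The sizes differ because only one file carries gzip framing...
same_size = os.path.getsize(plain_path) == os.path.getsize(packed_path)

# ...but the decoded contents are identical.
with open(plain_path, "r", encoding="utf-8", newline="") as fh:
    plain_content = fh.read()
with gzip.open(packed_path, "rt", encoding="utf-8", newline="") as fh:
    packed_content = fh.read()
same_content = plain_content == packed_content

print(same_size, same_content)
```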
Comment From: AvivAvital2
@toninlg this worked for me
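import gzip
import io  # needed for io.StringIO below

with io.StringIO() as buf:
    df.to_csv(buf)  # write the CSV text into memory first
    with open('test_compressed.gz', 'wb') as remote_file:
        # compress the encoded CSV text and write the raw bytes
        remote_file.write(gzip.compress(bytes(buf.getvalue(), 'utf-8')))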
Comment From: miodeqqq
You can also try this:
import csv
import gzip
from io import BytesIO, TextIOWrapper
gz_buffer = BytesIO()
with gzip.GzipFile(fileobj=gz_buffer, mode="w") as gz_file:
    df.to_csv(
        path_or_buf=TextIOWrapper(gz_file, "utf8"),
        index=False,
        sep=",",
        quoting=csv.QUOTE_NONE,
        compression="gzip",
    )
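A variant of the in-memory approach, sketched here under a few assumptions: the text wrapper is kept in a named variable so it can be flushed explicitly before the gzip stream is closed, and compression= is left out because the GzipFile already does the compressing. csv_gz_bytes is a hypothetical helper name, and frame is assumed to be a pandas DataFrame.

```python
import gzip
import io


def csv_gz_bytes(frame, **to_csv_kwargs):
    """Return the CSV form of `frame` as gzip-compressed bytes.

    Assumption: `frame` is a pandas DataFrame (or anything else with a
    compatible `to_csv` method that writes text to a handle).
    """
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz_file:
        text = io.TextIOWrapper(gz_file, encoding="utf-8", newline="")
        frame.to_csv(text, **to_csv_kwargs)
        text.flush()   # push any buffered text into the gzip stream
        text.detach()  # stop the wrapper from closing gz_file on its own
    # The gzip trailer is written when the with-block closes gz_file.
    return buffer.getvalue()
```

The returned bytes can be written to a file opened in 'wb' mode or uploaded elsewhere; gzip.decompress on the result should give back the CSV text.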