Code Sample, a copy-pastable example if possible
>>> import boto3
>>> import s3fs
>>> import pandas as pd
>>>
>>> s3 = boto3.client("s3")
>>> response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
>>>
>>> pd.read_csv(response["Body"])
customer_id|store_id|first_name|last_name|email|address_id|activebool|dw_insert_date|dw_update_date|active
0 9|2|MARGARET|MOORE|MARGARET.MOORE@sakilacustom...
1 13|2|KAREN|JACKSON|KAREN.JACKSON@sakilacustome...
2 17|1|DONNA|THOMPSON|DONNA.THOMPSON@sakilacusto...
3 21|1|MICHELLE|CLARK|MICHELLE.CLARK@sakilacusto...
4 25|1|DEBORAH|WALKER|DEBORAH.WALKER@sakilacusto...
... ...
1188 587|1|SERGIO|STANFIELD|SERGIO.STANFIELD@sakila...
1189 591|1|KENT|ARSENAULT|KENT.ARSENAULT@sakilacust...
1190 595|1|TERRENCE|GUNDERSON|TERRENCE.GUNDERSON@sa...
1191 599|2|AUSTIN|CINTRON|AUSTIN.CINTRON@sakilacust...
1192 4|2|BARBARA|JONES|BARBARA@sakilacustomer.org|8...
[1193 rows x 1 columns]
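(Side note: the file is pipe-delimited, so the single-column result above is expected with the default comma separator. Re-fetching the object and passing sep="|" should split it into columns; the object has to be re-fetched because the first read_csv call consumed the stream. A sketch:)

>>> response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
>>> pd.read_csv(response["Body"], sep="|")  # C engine handles the pipe-delimited stream fine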
With the skipfooter argument added:
>>> import boto3
>>> import s3fs
>>> import pandas as pd
>>>
>>> s3 = boto3.client("s3")
>>> response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
>>>
>>> pd.read_csv(response["Body"], skipfooter=1)
__main__:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
99 117 115 116 111 109 101 114 95 105 100 124 115.1 116.1 111.1 ... 114.28 103.11 124.112 53.6 55.4 124.113 65.51 99.26 116.33 105.31 118.13 101.35 124.114 49.70 51.21
0 45 49 49 45 50 48 49 57 124 124 49 10 53 55 124 ... 69 83 84 124 69 68 78 65 46 87 69 83 84 64 115
1 97 107 105 108 97 99 117 115 116 111 109 101 114 46 111 ... 76 68 73 78 69 46 80 69 82 75 73 78 83 64 115
2 97 107 105 108 97 99 117 115 116 111 109 101 114 46 111 ... 73 76 76 73 65 77 83 79 78 64 115 97 107 105 108
3 97 99 117 115 116 111 109 101 114 46 111 114 103 124 50 ... 114 103 124 50 55 48 124 65 99 116 105 118 101 124 49
4 51 45 49 49 45 50 48 49 57 124 124 49 10 50 54 ... 115 97 107 105 108 97 99 117 115 116 111 109 101 114 46
.. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
84 99 116 105 118 101 124 49 51 45 49 49 45 50 48 49 ... 51 45 49 49 45 50 48 49 57 124 124 49 10 51 56
85 51 124 49 124 77 65 82 84 73 78 124 66 65 76 69 ... 51 53 124 50 124 82 73 67 75 89 124 83 72 69 76
86 66 89 124 82 73 67 75 89 46 83 72 69 76 66 89 ... 88 84 69 82 124 72 69 67 84 79 82 46 80 79 73
87 78 68 69 88 84 69 82 64 115 97 107 105 108 97 99 ... 103 124 53 52 53 124 65 99 116 105 118 101 124 49 51
88 45 49 49 45 50 48 49 57 124 124 49 10 53 52 51 ... 116 111 109 101 114 46 111 114 103 124 53 57 55 124 65
[89 rows x 1024 columns]
Problem description
I am encountering a problem when using pandas read_csv() on data read from S3 as a botocore StreamingBody object: it parses correctly only with the C engine parser. As soon as an option forces the Python engine (skipfooter, for example, is only supported by the Python engine), the output is garbled. The numbers above appear to be the ASCII codes of individual bytes (the first header cells, 99 117 115 116 111 109 101 114 95 105 100, decode to "customer_id"), which suggests the Python engine is iterating the StreamingBody byte by byte rather than line by line.
When an S3 URL is given to read_csv() directly, there are no issues at all.
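A possible workaround sketch, not part of the report itself: fully buffering the StreamingBody into an io.BytesIO hands the Python engine a regular seekable in-memory file, at the cost of reading the whole object into memory.

>>> import io
>>> import boto3
>>> import pandas as pd
>>>
>>> s3 = boto3.client("s3")
>>> response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
>>>
>>> # Buffer the raw bytes so read_csv sees a plain in-memory file
>>> # rather than the streaming response object.
>>> buffer = io.BytesIO(response["Body"].read())
>>> pd.read_csv(buffer, skipfooter=1, engine="python")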
Expected Output
The same single-column DataFrame as in the first example, minus the final row dropped by skipfooter=1.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 19.2.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8
pandas : 0.25.3
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1
Comment From: mroeschke
Thanks for the report; unfortunately I can't access this data, so I cannot reproduce. Closing, but happy to reopen if you can post something more minimal.
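For a reproducer that avoids the private bucket, one option is that botocore's StreamingBody can be constructed directly from an in-memory buffer (StreamingBody(raw_stream, content_length)), so the same code path can be exercised without S3 access; the sample data below is made up.

>>> import io
>>> import pandas as pd
>>> from botocore.response import StreamingBody
>>>
>>> raw = b"a|b|c\n1|2|3\n4|5|6\nfooter\n"
>>> body = StreamingBody(io.BytesIO(raw), len(raw))
>>> pd.read_csv(body, sep="|")  # C engine: parses as expected
>>>
>>> body = StreamingBody(io.BytesIO(raw), len(raw))  # rebuild; the stream was consumed
>>> pd.read_csv(body, sep="|", skipfooter=1, engine="python")  # Python-engine path reported above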