Code Sample, a copy-pastable example if possible
>>> import boto3
>>> import s3fs
>>> import pandas as pd
>>>
>>> s3 = boto3.client("s3")
>>> response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
>>>
>>> pd.read_csv(response["Body"])
customer_id|store_id|first_name|last_name|email|address_id|activebool|dw_insert_date|dw_update_date|active
0 9|2|MARGARET|MOORE|MARGARET.MOORE@sakilacustom...
1 13|2|KAREN|JACKSON|KAREN.JACKSON@sakilacustome...
2 17|1|DONNA|THOMPSON|DONNA.THOMPSON@sakilacusto...
3 21|1|MICHELLE|CLARK|MICHELLE.CLARK@sakilacusto...
4 25|1|DEBORAH|WALKER|DEBORAH.WALKER@sakilacusto...
... ...
1188 587|1|SERGIO|STANFIELD|SERGIO.STANFIELD@sakila...
1189 591|1|KENT|ARSENAULT|KENT.ARSENAULT@sakilacust...
1190 595|1|TERRENCE|GUNDERSON|TERRENCE.GUNDERSON@sa...
1191 599|2|AUSTIN|CINTRON|AUSTIN.CINTRON@sakilacust...
1192 4|2|BARBARA|JONES|BARBARA@sakilacustomer.org|8...
[1193 rows x 1 columns]
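(Side note: the file is pipe-delimited, so the single-column result above is expected with the default comma separator. Re-fetching the object and passing sep="|" should split it into columns; the object has to be re-fetched because the first read_csv call consumed the stream. A sketch:)

>>> response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
>>> pd.read_csv(response["Body"], sep="|")  # C engine handles the pipe-delimited stream fine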
With the skipfooter argument added:
>>> import boto3
>>> import s3fs
>>> import pandas as pd
>>>
>>> s3 = boto3.client("s3")
>>> response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
>>>
>>> pd.read_csv(response["Body"], skipfooter=1)
__main__:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
99 117 115 116 111 109 101 114 95 105 100 124 115.1 116.1 111.1 ... 114.28 103.11 124.112 53.6 55.4 124.113 65.51 99.26 116.33 105.31 118.13 101.35 124.114 49.70 51.21
0 45 49 49 45 50 48 49 57 124 124 49 10 53 55 124 ... 69 83 84 124 69 68 78 65 46 87 69 83 84 64 115
1 97 107 105 108 97 99 117 115 116 111 109 101 114 46 111 ... 76 68 73 78 69 46 80 69 82 75 73 78 83 64 115
2 97 107 105 108 97 99 117 115 116 111 109 101 114 46 111 ... 73 76 76 73 65 77 83 79 78 64 115 97 107 105 108
3 97 99 117 115 116 111 109 101 114 46 111 114 103 124 50 ... 114 103 124 50 55 48 124 65 99 116 105 118 101 124 49
4 51 45 49 49 45 50 48 49 57 124 124 49 10 50 54 ... 115 97 107 105 108 97 99 117 115 116 111 109 101 114 46
.. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
84 99 116 105 118 101 124 49 51 45 49 49 45 50 48 49 ... 51 45 49 49 45 50 48 49 57 124 124 49 10 51 56
85 51 124 49 124 77 65 82 84 73 78 124 66 65 76 69 ... 51 53 124 50 124 82 73 67 75 89 124 83 72 69 76
86 66 89 124 82 73 67 75 89 46 83 72 69 76 66 89 ... 88 84 69 82 124 72 69 67 84 79 82 46 80 79 73
87 78 68 69 88 84 69 82 64 115 97 107 105 108 97 99 ... 103 124 53 52 53 124 65 99 116 105 118 101 124 49 51
88 45 49 49 45 50 48 49 57 124 124 49 10 53 52 51 ... 116 111 109 101 114 46 111 114 103 124 53 57 55 124 65
[89 rows x 1024 columns]
Problem description
I am encountering a problem when using pandas read_csv() on data read from S3 as a botocore StreamingBody object: it parses correctly only with the C engine parser. As soon as an option forces the Python engine (skipfooter, for example, is only supported by the Python engine), the output is garbled. The numbers above appear to be the ASCII codes of individual bytes (the first header cells, 99 117 115 116 111 109 101 114 95 105 100, decode to "customer_id"), which suggests the Python engine is iterating the StreamingBody byte by byte rather than line by line.
When an S3 URL is given to read_csv() directly, there are no issues at all.
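A possible workaround sketch, not part of the report itself: fully buffering the StreamingBody into an io.BytesIO hands the Python engine a regular seekable in-memory file, at the cost of reading the whole object into memory.

>>> import io
>>> import boto3
>>> import pandas as pd
>>>
>>> s3 = boto3.client("s3")
>>> response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
>>>
>>> # Buffer the raw bytes so read_csv sees a plain in-memory file
>>> # rather than the streaming response object.
>>> buffer = io.BytesIO(response["Body"].read())
>>> pd.read_csv(buffer, skipfooter=1, engine="python")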
Expected Output
The same single-column DataFrame as in the first example, minus the final row dropped by skipfooter=1.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 19.2.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8
pandas : 0.25.3
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1
Comment From: mroeschke
Thanks for the report; unfortunately I can't access this data, so I cannot reproduce. Closing, but happy to reopen if you can post something more minimal.
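For a reproducer that avoids the private bucket, one option is that botocore's StreamingBody can be constructed directly from an in-memory buffer (StreamingBody(raw_stream, content_length)), so the same code path can be exercised without S3 access; the sample data below is made up.

>>> import io
>>> import pandas as pd
>>> from botocore.response import StreamingBody
>>>
>>> raw = b"a|b|c\n1|2|3\n4|5|6\nfooter\n"
>>> body = StreamingBody(io.BytesIO(raw), len(raw))
>>> pd.read_csv(body, sep="|")  # C engine: parses as expected
>>>
>>> body = StreamingBody(io.BytesIO(raw), len(raw))  # rebuild; the stream was consumed
>>> pd.read_csv(body, sep="|", skipfooter=1, engine="python")  # Python-engine path reported above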