I use pandas.read_csv() to read an uploaded CSV file from a URL.

However, read_csv() sometimes returns an empty DataFrame with NaN values.

I think it is caused by the CSV's MIME type, application/vnd.ms-excel, because CSVs with the MIME type text/csv don't have this issue.

Is that a known issue?

I found this post, which seems to describe the same issue.

Is this an issue with read_csv(), or something else?

The traceback:

Traceback (most recent call last):
  File "/env/local/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/env/local/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/env/local/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/env/local/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/env/local/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/vmagent/app/main.py", line 144, in visualize_file
    df = process(df)
  File "/home/vmagent/app/main.py", line 122, in process
    result = foo(df)
  File "/home/vmagent/app/main.py", line 85, in foo
    len_ts = int(np.array(data)[:, 0].max())
ValueError: cannot convert float NaN to integer
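
The last frame suggests the exception is raised when a NaN maximum is converted to an integer, not inside read_csv() itself. Below is a minimal sketch of that failure and one possible guard, using only the standard library. `safe_int_max` is a hypothetical helper, not part of the application code, and unlike NumPy's `max()` it skips NaN entries rather than propagating them.

```python
import math


def safe_int_max(values):
    """Return int(max(values)) ignoring NaN entries, or None if nothing is left."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        # Every value was NaN (or the input was empty): nothing to convert.
        return None
    return int(max(finite))


# int() on a NaN float raises the ValueError shown in the traceback.
try:
    int(float("nan"))
except ValueError as exc:
    message = str(exc)
```

Returning None instead of raising lets the caller decide whether an all-NaN column means a bad download or a genuinely empty file.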

And the code:

@app.route('/upload', methods=['GET', 'POST'])
def upload_file():
    # Upload file to Google Cloud Storage
    ......

import numpy as np
def foo(data):    
    len_ts = int(np.array(data)[:, 0].max())
    ......
    return df

def process(df):
    df = np.array(df)
    ......
    df = foo(df)
    return bar(df)

@app.route('/graph/<filename>')
def visualize_file(filename):
    import pandas as pd
    df = pd.read_csv('https://storage.cloud.google.com/project-name/' + filename + '.csv')
    df = process(df)
    df = pd.DataFrame(df, columns=['col1', 'data1', 'data2'])
    chart_data = df.to_dict(orient='records')
    chart_data = json.dumps(chart_data, indent=2)
    data = {'chart_data': chart_data}
    return render_template("graph.html", data=data)

Comment From: jreback

you would have to show a reproducible example

Comment From: hangyao

@jreback I can't fully reproduce the error. Sometimes a CSV with application/vnd.ms-excel loads without any error. I tried renaming the failing CSV by adding or removing _ or - in the filename and re-uploading it, and sometimes that makes it work. I don't know whether the MIME type causes the issue.

Comment From: gfyoung

@hangyao : Unfortunately, without a reproducible example (i.e. we need to be able to take whatever code you provide and run it ourselves with no changes), we can't help you out too much here. The SO post is also not particularly useful for us to diagnose any issue. I am inclined to close this if that's okay.
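
One way to work toward a reproducible example is to inspect the raw bytes the URL returns before they reach read_csv(), and compare a failing upload with a working one. The sketch below assumes the caller has already downloaded the body and its Content-Type header by some means. `inspect_payload` is a hypothetical helper, and the HTML check is only a rough heuristic for ruling out the possibility that the URL returned a web page rather than the CSV.

```python
import csv
import io


def inspect_payload(body, content_type=""):
    """Summarise downloaded bytes so two uploads can be compared side by side."""
    # utf-8-sig drops a leading byte-order mark, which Excel exports often add.
    text = body.decode("utf-8-sig", errors="replace")
    head = text.lstrip()[:15].lower()
    rows = list(csv.reader(io.StringIO(text)))
    return {
        "content_type": content_type,
        "size": len(body),
        # Rough heuristic: an HTML page instead of CSV data.
        "looks_like_html": head.startswith(("<!doctype", "<html")),
        "row_count": len(rows),
        "first_row": rows[0] if rows else [],
    }
```

If the summary looks like valid CSV, the same bytes can be passed to `pd.read_csv(io.BytesIO(body))` and attached to the report, which removes the network and the storage bucket from the reproduction.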