Hi there,
Is there a way to load a data frame in pandas without waiting for the full load to complete before continuing to execute code?
example:
df = pandas.read_csv('my_big_csv')
I want to be returned to the console immediately; right now the call takes a while.
Comment From: shoyer
No, I'm afraid this isn't possible with pandas's data model, which does not do lazy evaluation.
You might try dask-dataframe for something similar: http://dask.pydata.org/en/latest/dataframe-overview.html
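If switching libraries is not an option, one workaround is to keep pandas eager but move the blocking call off the console's main thread. The snippet below is only a minimal sketch of that idea using the standard library's `concurrent.futures`; it is a different technique from dask's lazy evaluation, and the helper name `load_in_background` is hypothetical, not part of pandas.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# A single worker thread is enough for one long-running load.
_executor = ThreadPoolExecutor(max_workers=1)


def load_in_background(loader, *args, **kwargs) -> Future:
    """Start loader(*args, **kwargs) on a worker thread and return at once.

    `load_in_background` is a hypothetical helper for illustration.
    `loader` can be any callable, for example pandas.read_csv.
    """
    return _executor.submit(loader, *args, **kwargs)
```

In a console session this could be used as `future = load_in_background(pandas.read_csv, 'my_big_csv')`. The prompt comes back immediately, `future.done()` reports whether the load has finished, and `df = future.result()` blocks only at the point where the data frame is actually needed. The load itself is not faster, and how responsive the console stays while it runs depends on how much of the parsing releases the GIL.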
Comment From: ghost
Would this be faster if I used pandas as a RedShift UDF?
Comment From: jreback
Closing as out of scope; the dask link is the best bet.
Comment From: jreback
The links at http://pandas.pydata.org/pandas-docs/stable/ecosystem.html#out-of-core are also useful.
Comment From: shoyer
No idea about RedShift; I haven't used it.