Hi there,

Is there a way to load a data frame in pandas without waiting for the full load to complete before continuing to execute code?

example:

df = pandas.read_csv('my_big_csv')

I want control to return to the console immediately. Right now the call blocks for a while.

Comment From: shoyer

No, I'm afraid this isn't possible with pandas's data model, which does not do lazy evaluation.
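
pandas itself stays eager, but if the goal is only to get the prompt back while the file loads, the blocking call can be handed to a worker thread from the standard library. What follows is a minimal sketch, not a pandas feature: the tiny CSV, the `csv_path` name, and the single-worker executor are assumptions made so the example runs on its own. The data frame is still unusable until the load finishes.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Assumption: a tiny stand-in file so the sketch runs as-is.
# In practice csv_path would point at the large CSV.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "my_big_csv.csv")
with open(csv_path, "w") as f:
    f.write("a,b\n1,2\n3,4\n")

# submit() returns a Future right away; read_csv runs in the worker thread.
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(pd.read_csv, csv_path)

# ... other code can run here while the file is being parsed ...

# result() blocks until the load has finished, then returns the DataFrame.
df = future.result()
executor.shutdown()
print(df.shape)
```

In an interactive session, `future.done()` can be polled to check whether the load has finished without blocking.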

You might try dask-dataframe for something similar: http://dask.pydata.org/en/latest/dataframe-overview.html

Comment From: ghost

Would this be faster if I use python pandas as a RedShift UDF?

Comment From: jreback

Closing as out of scope; the dask link is the best bet.

Comment From: jreback

The out-of-core projects listed at http://pandas.pydata.org/pandas-docs/stable/ecosystem.html#out-of-core are also useful.

Comment From: shoyer

No idea on RedShift, I haven't used it.