`read_sql_table` in pandas is much nicer than using the raw SQLAlchemy API and likely has better performance. But when reading large tables, `chunksize` is unavoidable.

Can I suggest an option to read with `skip_chunks`, like `skiprows` in `read_csv`? Right now I'm calling `next()` on the chunk iterator, but I really do not need all the chunks.
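A minimal sketch of that workaround, using `itertools.islice` to discard the leading chunks (the table name, connection string, and processing step are hypothetical):

```python
import pandas as pd
from itertools import islice

# Read the table lazily in chunks of 10,000 rows.
chunks = pd.read_sql_table("my_table", "sqlite:///my.db", chunksize=10_000)

# Discard the first 3 chunks; note their rows are still fetched
# from the database and parsed before being thrown away.
for chunk in islice(chunks, 3, None):
    print(chunk.shape)  # placeholder for real processing
```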
Comment From: jorisvandenbossche
@denfromufa I am not sure whether this would really be more helpful than calling `next(chunk)` manually, or than selecting the rows you want in the SQL query. `read_sql` uses `fetchmany`, so to skip a chunk you would still need to fetch that chunk's data (unless there is a way to tell `fetchmany` to skip it, but I am not aware of one). One advantage would be that the skipped chunk is not converted from Python lists to a DataFrame, but I am not sure that is worth adding a new keyword.
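A hedged sketch of the second alternative mentioned here, pushing the row selection into the query itself so skipped rows never leave the database. The `OFFSET`-only syntax below is PostgreSQL-style and varies by backend; the table, column, and connection names are hypothetical.

```python
import pandas as pd

skip_chunks, chunksize = 3, 10_000

# Skip the first three chunks' worth of rows server-side.
# A stable ORDER BY is needed for OFFSET to be deterministic.
query = f"""
    SELECT *
    FROM my_table
    ORDER BY id
    OFFSET {skip_chunks * chunksize}
"""

for chunk in pd.read_sql(query, "postgresql:///mydb", chunksize=chunksize):
    print(chunk.shape)  # placeholder for real processing
```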
Comment From: den-run-ai
There are `OFFSET` and `FETCH` keywords in SQL.
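For instance, `OFFSET ... FETCH` is standard SQL:2008 and is accepted by backends such as PostgreSQL and SQL Server, though not by all of them. A sketch with hypothetical names:

```python
import pandas as pd

# OFFSET/FETCH pushes the skip into the database, so the skipped
# rows are never transferred or parsed on the client.
query = """
    SELECT *
    FROM my_table
    ORDER BY id
    OFFSET 30000 ROWS
    FETCH NEXT 10000 ROWS ONLY
"""
df = pd.read_sql(query, "postgresql:///mydb")
```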
Comment From: jorisvandenbossche
@denfromufa As far as I know, those constructs are not supported by all SQL flavours, and for pandas we would have to check how SQLAlchemy supports this in its Core (not the ORM).
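For what it's worth, SQLAlchemy Core does expose a portable form of this: `select().offset()` and `.limit()` are rendered into whatever construct the active dialect supports. A sketch under that assumption (SQLAlchemy 1.4+ reflection syntax; table and connection names hypothetical):

```python
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("postgresql:///mydb")

# Reflect the table so we can build a Core select against it.
metadata = sa.MetaData()
table = sa.Table("my_table", metadata, autoload_with=engine)

# offset()/limit() are translated per-dialect by SQLAlchemy Core.
stmt = sa.select(table).order_by(table.c.id).offset(30_000).limit(10_000)
df = pd.read_sql(stmt, engine)
```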
Comment From: mroeschke
Thanks for the suggestion, but given the lack of community or core dev interest, it looks like this feature won't be moving forward, so closing.