I was trying to process a massive table, so I wanted to read it in chunks and process each chunk in turn.
When I tried reading the table with pandas.read_sql_table, I ran out of memory even though I had passed in the chunksize parameter.
I'm using the mysqlclient driver for Python 3.
Code Sample, a copy-pastable example if possible
import pandas
import sqlalchemy

eng = sqlalchemy.create_engine("mysql+mysqldb://user:pass@localhost/db_name")
dframe = pandas.read_sql_table('table_name', eng, chunksize=100)
What I expected was for the function to return an iterator that lazily loads the data into memory. The documentation is not very clear about this, nor have I found anything else on Google.
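Concretely, I expected to be able to iterate over dframe like this, with only one chunk held in memory at a time (process_chunk is just a placeholder for my own per-chunk code):

for chunk in dframe:
    process_chunk(chunk)  # process_chunk stands in for my own processing logic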
Any further information on this will be appreciated.
Comment From: jorisvandenbossche
This is a known issue and a limitation of most Python database drivers (not something pandas can solve on its side), but it should be better documented (see #10693).
Similar issue: #12265 (with some additional explanation).
PRs to improve the docs are always welcome!
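For completeness, one possible workaround is to ask the driver for a server-side cursor, so rows are fetched lazily instead of the whole result set being loaded up front. A rough, untested sketch using mysqlclient's SSCursor, passed through SQLAlchemy's connect_args (connection string and table name taken from the example above):

import MySQLdb.cursors
import pandas
import sqlalchemy

eng = sqlalchemy.create_engine(
    "mysql+mysqldb://user:pass@localhost/db_name",
    connect_args={"cursorclass": MySQLdb.cursors.SSCursor},
)
for chunk in pandas.read_sql_table('table_name', eng, chunksize=100):
    pass  # each chunk should now be fetched incrementally from the server

With the default client-side cursor, the driver materialises the full result set before pandas ever sees the first chunk, which is exactly the limitation described above.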
Comment From: jeetjitsu
Thank you for the prompt reply. I would be happy to add to the documentation. But to take this a little further, is it not possible to add this functionality on the pandas side with the LIMIT/OFFSET trick, especially when an SQLAlchemy connection has been passed in?
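To illustrate what I mean by the LIMIT/OFFSET trick (just a sketch of the idea, not a concrete API proposal; the query string and chunk size are made up for the example):

import pandas
import sqlalchemy

eng = sqlalchemy.create_engine("mysql+mysqldb://user:pass@localhost/db_name")
chunksize = 100
offset = 0
while True:
    chunk = pandas.read_sql_query(
        "SELECT * FROM table_name LIMIT %d OFFSET %d" % (chunksize, offset), eng)
    if chunk.empty:
        break
    # process the chunk here, then move the window forward
    offset += chunksize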
Comment From: jreback
@jorisvandenbossche close?
Comment From: jorisvandenbossche
Yes, it's a doc issue, but that is already covered by https://github.com/pydata/pandas/issues/10693