I was trying to process a massive table, so I wanted to read it in chunks and process each chunk in turn.
When I tried reading the table with pandas.read_sql_table, I ran out of memory even though I had passed in the chunksize parameter.
I'm using the mysqlclient driver for Python 3.
Code Sample, a copy-pastable example if possible
import pandas
import sqlalchemy

eng = sqlalchemy.create_engine("mysql+mysqldb://user:pass@localhost/db_name")
dframe = pandas.read_sql_table('table_name', eng, chunksize=100)
What I expected was for the function to return an iterator that lazily loads the data into memory. The documentation is not very clear about this, nor have I found anything else on Google.
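Concretely, I expected to be able to iterate over dframe like this, with only one chunk held in memory at a time (process_chunk is just a placeholder for my own per-chunk code):

for chunk in dframe:
    process_chunk(chunk)  # process_chunk stands in for my own processing logic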
Any further information on this will be appreciated.
Comment From: jorisvandenbossche
This is a known issue and a limitation of most Python database drivers (not something pandas can solve on its side), but it should be better documented (see #10693).
Similar issue: #12265 (with some additional explanation).
PRs to improve the docs are always welcome!
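For completeness, one possible workaround is to ask the driver for a server-side cursor, so rows are fetched lazily instead of the whole result set being loaded up front. A rough, untested sketch using mysqlclient's SSCursor, passed through SQLAlchemy's connect_args (connection string and table name taken from the example above):

import MySQLdb.cursors
import pandas
import sqlalchemy

eng = sqlalchemy.create_engine(
    "mysql+mysqldb://user:pass@localhost/db_name",
    connect_args={"cursorclass": MySQLdb.cursors.SSCursor},
)
for chunk in pandas.read_sql_table('table_name', eng, chunksize=100):
    pass  # each chunk should now be fetched incrementally from the server

With the default client-side cursor, the driver materialises the full result set before pandas ever sees the first chunk, which is exactly the limitation described above.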
Comment From: jeetjitsu
Thank you for the prompt reply. I would be happy to add to the documentation. But to take this a little further, is it not possible to add this functionality on the pandas side with the LIMIT/OFFSET trick, especially when an SQLAlchemy connection has been passed in?
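To illustrate what I mean by the LIMIT/OFFSET trick (just a sketch of the idea, not a concrete API proposal; the query string and chunk size are made up for the example):

import pandas
import sqlalchemy

eng = sqlalchemy.create_engine("mysql+mysqldb://user:pass@localhost/db_name")
chunksize = 100
offset = 0
while True:
    chunk = pandas.read_sql_query(
        "SELECT * FROM table_name LIMIT %d OFFSET %d" % (chunksize, offset), eng)
    if chunk.empty:
        break
    # process the chunk here, then move the window forward
    offset += chunksize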
Comment From: jreback
@jorisvandenbossche close?
Comment From: jorisvandenbossche
Yes, it's a doc issue, but that is already covered by https://github.com/pydata/pandas/issues/10693