Would it be possible to add a feature allowing one to specify that a particular column to be downloaded via read_sql has a particular data type? I'm particularly interested in forcing certain dataframe columns to be categorical, to conserve memory usage.

Yes, I could transform the data after receiving it from the database, but the datasets I'm working with are really large, causing huge memory usage high water marks. (An alternative is to break transfer the data in chunks and then transform the data in each chunk, but that complicates the code and, in my test runs, takes a little longer.)

Comment From: jreback

well if sql were to actually provide a dictionary / category encoded column then this would work. but this is generally an implementation detail of the particular backend and I don't believe is exposed to users.

you might want to look at #17790 which is highly performant.

Comment From: jorisvandenbossche

We have had several similar enhancement requests, and I think we can add this (just like read_csv has one as well). See https://github.com/pandas-dev/pandas/issues/13049, https://github.com/pandas-dev/pandas/issues/6798, https://github.com/pandas-dev/pandas/issues/17560 In general this is exactly the same as doing astype after reading, but for the case of using chunksize I think it can be easier to do this inside read_sql to every chunk is consistent.

Comment From: jorisvandenbossche

@sfromm-plm But to be clear, @jreback is correct in that this won't really "work" for categorical type of data, in the sense that it will not give much memory usage gain. As the values are first returned from the database as plain values, and would only afterwards be converted to Categorical.

Comment From: sfromm-plm

@jorisvandenbossche : Right. @jreback's answer wasn't immediately clear to me, but I settled on what you just said.

Thanks for the replies, guys, even if it wasn't what I was hoping to hear.