Pandas ENH: Upgrade Pandas SQLAlchemy code to be compatible with 2.0+ syntax

Is your feature request related to a problem?

As of SQLAlchemy 1.4, there is a new syntax available that will become the only syntax in version 2.0+. Pandas uses SQLAlchemy for performing SQL IO, particularly in DataFrame.to_sql and pd.read_sql (via pd.read_sql_query and pd.read_sql_table). This usage is not compatible with the new SQLAlchemy syntax.

Describe the solution you'd like

Upgrade Pandas SQLAlchemy code to be compatible with the new 1.4+/2.0+ syntax. An official guide for how to do so in a test-driven fashion is available here. Note that at least the following 2.0+ incompatibility warnings are issued (though this list is by no means exhaustive):

.../pandas/io/sql.py:1313: RemovedIn20Warning: The MetaData.bind argument is deprecated and will be removed in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
meta = MetaData(self.connectable, schema=schema)

.../pandas/io/sql.py:1103: RemovedIn20Warning: The MetaData.bind argument is deprecated and will be removed in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
meta = MetaData(self.pd_sql, schema=schema)

.../sqlalchemy/sql/schema.py:939: RemovedIn20Warning: The ``bind`` argument for schema methods that invoke SQL against an engine or connection will be required in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche
.me/e/b8d9)
bind = _bind_or_error(self)

.../pandas/io/sql.py:1612: RemovedIn20Warning: The Connection.connect() method is considered legacy as of the 1.x series of SQLAlchemy and will be removed in 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
with self.connectable.connect() as conn:

API breaking implications

The upgrade to the 2.0+ syntax is not backwards compatible, and even the upgrade to 1.4 may require some changes (though it is primarily an incremental step from 1.3). Since the range of SQLA features used by Pandas is limited, there should be little difficulty in supporting both >=2.0 (and hence also ~=1.4) and <=1.3.

Describe alternatives you've considered

The alternative is to pin sqlalchemy<=2.0, or to stop using the Pandas SQL IO functions.

Additional context

The guide linked above shows how to discover SQLA compatibility issues, and then how to tweak the test suite to ensure that no issues are later introduced. Note that many of these upgrades are simple fixes (using connection.execute instead of engine.execute, not binding an Engine into a MetaData instance, not calling Connection.connect, etc.) and that most are backwards-compatible with recent versions of SQLA.

Comment From: CaselIT

There's also this warning being raised when using a connection that's already in a transaction:

lib\site-packages\pandas\io\sql.py:1168: RemovedIn20Warning: Calling .begin() when a transaction is already begun, creating a 'sub' transaction, is deprecated and will be removed in 2.0.  See the documentation section 'Migrating from the nesting pattern' for background on how to migrate from this pattern. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
    with self.connectable.begin() as tx:

Comment From: fangchenli

After #43116, there is one final piece that remains. We need to refactor our SQL execution mechanism to match sqlalchemy 1.4/2.0.

Currently, we convert query and params to DBAPI2.0 format and directly pass it to the connectable(engine or connection). But sqlalchemy 1.4/2.0 offers two paths for query execution. The execute method only takes text() object or other sqlachemy constructs. All string queries need to go through the exec_driver_sql method. This makes the query bypass sqlachemy abstraction and directly get executed on the underlining driver.

My thought on this is that users should know how to work with sqlalchemy and are responsible for making sure all inputs are compliant with sqlachemy syntax. We simply pass user inputs to the corresponded sqlachemy function.

Comment From: ryan-williams

Just noting that SQLAlchemy 2.0.0 was released a couple days ago, and pip install pandas sqlalchemy now results in broken pd.read_sql_table (possibly among other issues), cf. https://github.com/pandas-dev/pandas/issues/51015#issuecomment-1407462259.

Pinning sqlalchemy<2 seems like a good workaround until this is resolved. Thanks to everyone working on it here.

Comment From: Edimo05

https://github.com/pandas-dev/pandas/issues/51086

Cannot connect with to_sql, apparently I cannot send the connection as an engine

Comment From: phofl

Pinning this for now to get more visibility

Comment From: vahtras

Being asked to post here: my report in #51105 was a duplicate. Seems the Engine class has changed a bit in sqlalchemy2 (no execute attribute). In the docstring of to_sql we have an example that fails

>>> df.to_sql('users', con=engine)
>>> engine.execute("SELECT * FROM users").fetchall()
...
    AttributeError: 'Engine' object has no attribute 'execute'

so a workaround for this case could be

>>> with engine.connect() as conn:
...    conn.execute(text("SELECT * FROM users")).fetchall()

Comment From: jbrockmendel

Does this still need to be pinned?

Comment From: phofl

I'd keep it till 2.0 is out so that we don't get new issues as long as we are not compatible.

Could also try unpinning it and re-pin when people open new issues?

Comment From: bernardo-becker

To properly use pd.read_sql having sqlachemy 2.0 consider:

Use sqlachemy.text() to format your query string
If you are passing variables to your queries (by means of pd.read_sql params) check this link to properly format the query string and params: https://peps.python.org/pep-0249/

your code should look like this:

import pandas as pd from sqlalchemy import text

query_string = "select * from table" con = ... # specify your con params = ... # tupple, list or dict of variables df = pd.read_sql(text(query_string), con, params)

I hope this helps!

Comment From: phofl

I am unpinning for now since 2.0 came out today which brings compatibility with SQLAlchemy 2.0