Pandas DEPR: pandas.io.sql.execute - Nineya|java/go/python

There is currently a bug in pandas.io.sql.execute on the main branch (not in any released versions). I created the bug in #49531 while working on #48576. In #49531, I changed SQLDatabase to only accept a Connection and not an Engine, because sqlalchemy 2.0 restricts the methods that are available to Engine. #49531 created bugs in both read_sql and pandas.io.sql.execute, and I have an open PR to fix read_sql in #49967.

This reproduces the bug in pandas.io.sql.execute:

from pathlib import Path
from sqlalchemy import create_engine
from pandas import DataFrame
from pandas.io import sql

db_file = Path(r"C:\Temp\test.db")
db_file.unlink(missing_ok=True)
db_uri = "sqlite:///" + str(db_file)
engine = create_engine(db_uri)
DataFrame({'a': [2, 4], 'b': [3, 6]}).to_sql('test_table', engine)
sql.execute('select * from test_table', engine).fetchall()

Traceback when running on the main branch in pandas:

Error closing cursor
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 991, in fetchall
    rows = dbapi_cursor.fetchall()
sqlite3.ProgrammingError: Cannot operate on a closed database.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 1995, in _safe_close_cursor
    cursor.close()
sqlite3.ProgrammingError: Cannot operate on a closed database.
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 991, in fetchall
    rows = dbapi_cursor.fetchall()
sqlite3.ProgrammingError: Cannot operate on a closed database.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Github\pandas\execute_error.py", line 11, in <module>
    sql.execute('select * from test_table', engine).fetchall()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\result.py", line 1072, in fetchall
    return self._allrows()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\result.py", line 401, in _allrows
    rows = self._fetchall_impl()
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 1819, in _fetchall_impl
    return self.cursor_strategy.fetchall(self, self.cursor)
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 995, in fetchall
    self.handle_exception(result, dbapi_cursor, e)
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 955, in handle_exception
    result.connection._handle_dbapi_exception(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise_
    raise exception
  File "C:\Program Files\Python310\lib\site-packages\sqlalchemy\engine\cursor.py", line 991, in fetchall
    rows = dbapi_cursor.fetchall()
sqlalchemy.exc.ProgrammingError: (sqlite3.ProgrammingError) Cannot operate on a closed database.
(Background on this error at: https://sqlalche.me/e/14/f405)

When running on pandas 1.5.2, there is no error.

After #49967, SQLDatabase accepts str, Engine, and Connection, but in SQLDatabase.__init__, a Connection is created if it was not passed in, and an ExitStack is created to clean up the Connection and Engine if necessary. Cleanup happens either in SQLDatabase.__exit__ or at the end of the generator which is returned by read_sql if chunksize is not None. What I'm not able to see is how to do the cleanup after the caller fetches the results of pandas.io.sql.execute. I have come up with two options so far: * Restrict pandas.io.sql.execute to require a Connection, or else do not allow queries that return rows if a str or Engine is used. * Use Result.freeze to fetch the results before closing the SQLDatabase context manager. The problem here is that it requires sqlalchemy 1.4.45, which was only released on 12/10/2022, due to a bug in prior versions: https://github.com/sqlalchemy/sqlalchemy/issues/8963

Comment From: phofl

So pandas.io.sql.execute is not public. How does this impact the public API?

When putting your query into read_sql it works as expected

Comment From: cdcadman

No impacts outside of pandas.io.sql.execute. This method is mentioned above this line, so I should at least update the documentation: https://pandas.pydata.org/docs/user_guide/io.html?highlight=sql%20execute#engine-connection-examples. I'm not worried about the behavior of pandas.io.sql.execute changing if it's ok with you.

Comment From: phofl

Thx for pointing this out. Was not aware of that. Do we have public docstrings for this function? We should raise a proper error message at least. Do you know if there is an advantage compared to calling engine.execute?

@fangchenli could you weight in here? Any reason that this is public?

Comment From: cdcadman

I don't think there are any public docstrings. I could change the PR to raise an error if the caller passes an Engine or str and the result returns rows. Engine.execute will be removed in sqlalchemy 2.0: https://docs.sqlalchemy.org/en/14/core/connections.html#sqlalchemy.engine.Engine.execute.

Comment From: fangchenli

This method was introduced in the initial commit of the SQL module in 2010 as a helper function. I don't think it's still relevant today. Removing it completely would be an even better option.

Comment From: phofl

Thx, then let’s raise an error if anything other than a connection is passed and let’s deprecate it to get rid of it in 3.0

Comment From: luke396

take