After search in Stackoverflow and in your open and closed issues, I can't find any information about the support of SQL Alchemy 2.0 using pd.to_sql()


Question about pandas

An exemple is better than a talk.

from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer
import pandas as pd

# Table creation
engine = create_engine(connection_string, echo=True, future=True) # This uses SQL Alchemy 2.0 API
metadata = MetaData()
category = Table(
        'category',
        metadata,
        Column('id', Integer, primary_key = True),
        Column('name', String(128), nullable = False),
        Column('slug', String(128), nullable = False)
    )

category.create(engine, checkfirst=True)

## Insert using pandas => FAIL
categories = pd.DataFrame([{'id' : 1, 'slug': 'test', 'name': 'Test'}])
categories.to_sql('category', engine, index=False)

This is the error I get from the CLI output :

  File "e:\Projects\test\database.py", line 127, in populate_category_table
    categories.to_sql('category', engine, index=False)
  File "E:\Projects\test\env\lib\site-packages\pandas\core\generic.py", 
line 2779, in to_sql
    sql.to_sql(
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
    pandas_sql.to_sql(
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 1392, in to_sql
    table.create()
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 722, in create
    if self.exists():
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 709, in exists
    return self.pd_sql.has_table(self.name, self.schema)
  File "E:\Projects\test\env\lib\site-packages\pandas\io\sql.py", line 1429, in has_table
    return self.connectable.run_callable(
  File "E:\Projects\test\env\lib\site-packages\sqlalchemy\future\engine.py", line 335, in _not_implemented
    raise NotImplementedError(
NotImplementedError: This method is not implemented for SQLAlchemy 2.0.

For information about the version of every dependency :

$ ./env/Scripts/pip list
Package         Version
--------------- ---------
beautifulsoup4  4.9.3
certifi         2020.12.5
chardet         4.0.0
esprima         4.0.1
greenlet        1.0.0
idna            2.10
mysqlclient     2.0.3
numpy           1.20.1
pandas          1.2.3
patreon         0.5.0
pip             21.0.1
python-dateutil 2.8.1
pytz            2021.1
requests        2.25.1
setuptools      49.2.1
six             1.15.0
soupsieve       2.2
SQLAlchemy      1.4.0
urllib3         1.26.3

Is there a workaround right now or should I avoid using pandas to insert the data into my database ?

Regards,

Comment From: fangchenli

My suggestion is to downgrade to 1.3.23 for now. We're looking into this issue.

Comment From: jorisvandenbossche

@mickaelandrieu there was just a PR merged that deals with some deprecation warnings: https://github.com/pandas-dev/pandas/pull/40471

This might already resolve the compatibility issues with the future sqlalchemy 2.0 as well?

Comment From: mickaelandrieu

Hello @jorisvandenbossche,

When this PR will be released ? I can try and come back here to tell you 👍

Duly noted @fangchenli, thanks for your answer !

Regards

Comment From: scnerd

It's unclear from that MR if it tackled all 2.0 deprecation warnings; there are (at least) two categories of SQLA deprecations, SADeprecationWarning and RemovedIn20Warning, and I'm not sure if that MR only tackled the first.

A complete, official guide for the migration is here. I can confirm that there are numerous SQLA 2.0 warnings thrown in Pandas 1.2.3 (released 2021/03/02), I can install from master and check if they're still there after that MR.

Comment From: mickaelandrieu

Hi @scnerd,

what is the right process to install pandas on master branch ? I'm not really good at Python, but i will help if I can !

Regards,

Comment From: scnerd

@mickaelandrieu pip can accept git repo's like pip install git+https://github.com/pandas-dev/pandas.git

Comment From: scnerd

Having tested some things on master branch, I get at least the following couple of errors:

  /opt/conda/lib/python3.8/site-packages/pandas/io/sql.py:1313: RemovedIn20Warning: The MetaData.bind argument is deprecated and will be removed in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
    meta = MetaData(self.connectable, schema=schema)

  /opt/conda/lib/python3.8/site-packages/pandas/io/sql.py:1103: RemovedIn20Warning: The MetaData.bind argument is deprecated and will be removed in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
    meta = MetaData(self.pd_sql, schema=schema)

  /opt/conda/lib/python3.8/site-packages/sqlalchemy/sql/schema.py:939: RemovedIn20Warning: The ``bind`` argument for schema methods that invoke SQL against an engine or connection will be required in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche
.me/e/b8d9)
    bind = _bind_or_error(self)

  /opt/conda/lib/python3.8/site-packages/pandas/io/sql.py:1612: RemovedIn20Warning: The Connection.connect() method is considered legacy as of the 1.x series of SQLAlchemy and will be removed in 2.0. (Background on SQLAlchemy 2.0 at: http://sqlalche.me/e/b8d9)
    with self.connectable.connect() as conn:

Comment From: nsepetys

Is there a workaround right now or should I avoid using pandas to insert the data into my database ?

Regards,

@mickaelandrieu - One other workaround- I created an engine for this particular pandas method and turned off future (future=False) for the engine creation. It's working now. Going to leave a ToDo to come back and replace once pandas is working with this. Let me know if you found a less hacky way that's working without building the package off of master.

Comment From: scnerd

Just to clarify my remark about master, the RemovedIn20Warnings mean that, even on master, Pandas' SQL IO functions are not yet 2.0 compatible. Since SQLAlchemy 2.0 isn't even released yet (at least, as a stable release), I wouldn't consider that an issue (yet), it'll just require pinning sqlalchemy<=2 as a Pandas dependency until those incompatibilities are resolved. Using future=True in your SQLA session is therefore not yet compatible with Pandas.

Frankly, even though the SQLA 2.0 migration guide is great and the changes needed are pretty simple, there's also the concern that these changes are, in some cases, not backwards-compatible with <1.4. This is probably a concern for a library as ubiquitous as Pandas, and means that the upgrades should probably be made in a way that can detect and, when necessary, fall back on the <1.4 conventions.

Comment From: mickaelandrieu

@nsepetys yes, I've ended up doing that too for now.

So I have two configurations : legacy_engine (future = False) for pandas operations and engine (future = True) to start to learn the new API of SQLAlchemy.

@scnerd totally agree on your point of view : thanks for your help everyone !

Comment From: davidandreoletti

@mickaelandrieu

SQLAlchemy 2.0 is not out yet BUT SQLAlchemy 1.4 provides a path to prepare the migration to 2.0 (migration guide).

People running on 1.4 with SQLALchemy configured to be compatible with the future 2.0 (ie future=True) cannot execute pandas io/sql function without getting NotImplementedError: This method is not implemented for SQLAlchemy 2.0.

What's the plan going forward, except creating a legacy engine (future=False) ?

Comment From: jreback

the plan is the community needs to figure what should be done and propose PR with full testing

none of the core maintains use the sql support much - but could review

Comment From: davidandreoletti

@jreback Got it!

Comment From: mickaelandrieu

the plan is the community needs to figure what should be done and propose PR with full testing

Fully agree, but I'm not skilled enough to contribute pandas right now : not like I don't want to, I can help for tests/docs but not that much atm

Comment From: graingert

sqlalchemy 1.4.46 has started raising a sqlalchemy.exc.RemovedIn20Warning for deprecated functionality https://docs.sqlalchemy.org/en/14/changelog/changelog_14.html#change-e67bfa1efbe52ae40aa842124bc40c51