Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import csv

import pandas as pd

from io import StringIO
from sqlalchemy import create_engine

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

# define custom insert method for to_sql as written in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html?highlight=synchronize#insertion-method

def psql_insert_copy(table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join(['"{}"'.format(k) for k in keys])
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

# you need to have PostgreSQL running and give correct credentials here to reproduce the issue
engine = create_engine('postgresql://user:password@host/db')

with engine.connect() as conn:
    df.to_sql('target_table',
              con=conn,
              schema='public',
              if_exists='append',
              index=False,
              method=psql_insert_copy)

Issue Description

When I use a custom insert method for PostgreSQL in DataFrame.to_sql according to the documentation, the table is created with the correct column names and numeric types (if it doesn't already exist), but no values are imported (the table stays empty), and no warning or error is raised, either in pandas or in the database.

It works in pandas~=1.3.5 and breaks when 1.4.0 or 1.4.1 is installed (keeping sqlalchemy and psycopg2 at the same versions). I suspect it is related to the change that to_sql can now return an int value instead of None, but I couldn't find how that is related or how it can be fixed.
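One way to narrow this down is to compare what to_sql reports with what actually persists. A minimal diagnostic sketch, reusing engine, df, and psql_insert_copy from the example above (on pandas >= 1.4, to_sql returns the number of rows reported by the driver, or None):

from sqlalchemy import text

with engine.connect() as conn:
    reported = df.to_sql('target_table',
                         con=conn,
                         schema='public',
                         if_exists='append',
                         index=False,
                         method=psql_insert_copy)

# count on a fresh connection after the first one is closed,
# so only rows that were actually committed are visible
with engine.connect() as conn:
    actual = conn.execute(text('SELECT COUNT(*) FROM public.target_table')).scalar()

print(f'to_sql returned: {reported}, rows visible afterwards: {actual}')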

Expected Behavior

Full df is imported into the table including its values.

Installed Versions

INSTALLED VERSIONS

commit           : 06d230151e6f18fdb8139d09abf539867a8cd481
python           : 3.8.10.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.13.0-39-generic
Version          : #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.1
numpy            : 1.22.3
pytz             : 2022.1
dateutil         : 2.8.2
pip              : 20.0.2
setuptools       : 45.2.0
Cython           : None
pytest           : 7.0.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.8.0
html5lib         : None
pymysql          : None
psycopg2         : 2.9.3
jinja2           : 2.11.2
IPython          : 7.19.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : 3.4.3
numba            : None
numexpr          : 2.7.2
odfpy            : None
openpyxl         : 3.0.9
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : 1.0.8
s3fs             : None
scipy            : 1.6.3
sqlalchemy       : 1.4.34
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
zstandard        : None

Comment From: simonjayhawkins

moving to 1.4.4

Comment From: mroeschke

Sorry for the slow reply, but could you provide more details regarding your testing setup with Postgres?

Even with the introduction of to_sql returning row counts in 1.4, this specific example is being tested in our CI using a Postgres container in GitHub Actions:

https://github.com/pandas-dev/pandas/blob/d43d6e27d1af0b07df495e113e9818fc2f1909b1/pandas/tests/io/test_sql.py#L623-L649

Comment From: leyan2015

I had the same issue after I upgraded to the latest Anaconda distribution (Python 3.9, Linux 64-bit (x86)). pd.DataFrame.to_sql() and pd.read_sql() would both fail.

I did some digging around and discovered that it is the special character "@" in my password that messes up the login. It looks like the URL parser in the library has a bug, maybe on the SQLAlchemy side.

The same username and password work fine using sqlcmd: ./sqlcmd -S serverName -U yanni2006 -P 'XXX@ABC' -d dbName -Q "select * from db.schema.test"

Yet it fails in python:

import sqlalchemy
import pandas as pd

uid = 'yanni2006'
password = 'XXX@ABC'
server = 'serverName'
db = 'dbName'
conn_str = 'mssql+pyodbc://{}:{}@{}/{}?driver=ODBC+Driver+17+for+SQL+Server'.format(uid, password, server, db)
sqlEngine = sqlalchemy.create_engine(conn_str)
conn = sqlEngine.connect()

The conn would fail here: sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')

Obviously

with sqlEngine.begin() as trans:
    df = pd.read_sql(con=trans, sql="select * from db.schema.test")

would fail too.

After removing the "@" from my password, everything worked fine. Until a fix comes out, you can get around this by changing your password.
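Rather than changing the password, the special characters can be percent-encoded, or the URL can be built with sqlalchemy.engine.URL.create, which handles the escaping for you. A sketch assuming SQLAlchemy 1.4+, using the placeholder credentials from above:

from urllib.parse import quote_plus

import sqlalchemy
from sqlalchemy.engine import URL

# option 1: percent-encode the password before embedding it in the URL
conn_str = 'mssql+pyodbc://{}:{}@{}/{}?driver=ODBC+Driver+17+for+SQL+Server'.format(
    'yanni2006', quote_plus('XXX@ABC'), 'serverName', 'dbName')

# option 2: let SQLAlchemy build (and escape) the URL itself
url = URL.create(
    'mssql+pyodbc',
    username='yanni2006',
    password='XXX@ABC',  # passed as-is; no manual escaping needed
    host='serverName',
    database='dbName',
    query={'driver': 'ODBC Driver 17 for SQL Server'},
)
sqlEngine = sqlalchemy.create_engine(url)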

Also, the to_sql() function returns -1 even though the operation is successful; the number of rows affected is obviously not -1.

Comment From: mroeschke

Also, the to_sql() function returns -1 even though the operation is successful; the number of rows affected is obviously not -1.

The returned number of rows is the sum of the rows affected as reported by whatever underlying ORM library is used.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

The number of returned rows affected is the sum of the rowcount attribute of sqlite3.Cursor or SQLAlchemy connectable which may not reflect the exact number of written rows as stipulated in the sqlite3 or SQLAlchemy.
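If a meaningful count is wanted from the COPY path, the custom insert method itself can return one, since pandas >= 1.4 sums whatever the method returns. A sketch (psql_insert_copy_counted is a hypothetical name; this assumes psycopg2 sets cursor.rowcount after copy_expert for COPY commands, which is worth verifying on your psycopg2 version):

import csv
from io import StringIO

def psql_insert_copy_counted(table, conn, keys, data_iter):
    """Variant of the documented psql_insert_copy that reports rows written."""
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)
        # psycopg2 updates rowcount with the number of rows copied; returning it
        # lets to_sql report a real count instead of None/-1
        return cur.rowcount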

Comment From: mroeschke

Thanks for the report, but until we receive more context around the failure scenario, this code is being tested in our CI as mentioned in https://github.com/pandas-dev/pandas/issues/46596#issuecomment-1163653846. Closing, but we can reopen if we get more information.

Comment From: utsavjha

I can confirm that I am experiencing the same issue, using Postgres. Here is the debugging performed on my end.

Environment: Docker 4.12, Python v3.10-bullseye source image
Database: Postgres (v15)
My venv packages: pandas, psycopg2==2.9.5 and SQLAlchemy==1.4.43

Affected Pandas Version: >=v1.3.6. All distributions after v1.3.5 are unable to write. I tested each version from v1.3.5 to v1.5.1, same behavior.

My invocation looks like this, where the callable is referenced from the official pandas docs:

some_pandas_dataframe.to_sql(
    name=pg_table_name,
    schema=pg_schema_name,
    if_exists="append",
    index=False,
    chunksize=pg_export_chunk_size,
    con=pg_conn,
    method=PostgresConnector.psql_insert_copy,
)

Funny behaviour that I experienced during my testing:

- There is no error thrown in the code. Tests succeed too.
- Since I am using Postgres v15, I can track running COPY queries using the pg_stat_progress_copy view: this view was empty / did not contain the insertion query like it should. Pandas v1.3.5 successfully logs the query there, and greater versions do NOT.
- I was tracking the DB file stats: there was an uptick in the tempDB files during the export, yet no data was recoverable from the database. I saw my WAL expanding viciously only DURING the export. After the exports completed in code, the expected tables were still empty.
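For anyone trying to reproduce the pg_stat_progress_copy observation above, a minimal monitoring sketch (hypothetical credentials; the view exists in Postgres 14+ and only lists COPY commands while they are in progress, so it must be polled during the export):

import time
import psycopg2

# hypothetical credentials; point this at the same Postgres instance
mon_conn = psycopg2.connect(host='host', dbname='db', user='user', password='password')
mon_conn.autocommit = True

with mon_conn.cursor() as cur:
    # poll while the pandas export runs in another process/session
    for _ in range(10):
        cur.execute('SELECT relid::regclass, command, tuples_processed'
                    ' FROM pg_stat_progress_copy')
        print(cur.fetchall())
        time.sleep(1)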

Comment From: vroomzel

I'm having the same issue with pandas==1.5.0 and sqlalchemy==1.4.36, and our service accounts that use pandas 1.3.4 and sqlalchemy 1.4.22 work just fine.