Pandas version checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of pandas.
-
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame(...)
df.to_sql(...)
Issue Description
Function pandas.DataFrame.to_sql() writes records stored in a DataFrame to a SQL database. Databases supported by SQLAlchemy are supported. This function will call 'sql.py' in 'pandas/pandas/io/sql.py'.
When applied to a database supported by SQLAlchemy, 'sql.py' will use the class 'SQLAlchemyEngine(BaseEngine)', which has this line causing the reported bug: https://github.com/pandas-dev/pandas/blob/main/pandas/io/sql.py#L1392.
Basically, method 'insert_records' handles 'SQLAlchemyError' exceptions. But not all child class of base class 'SQLAlchemyError' has the attribute 'orig'. Doc: https://docs.sqlalchemy.org/en/14/core/exceptions.html
Therefore, adding a checking attribute step will fix this bug.
Expected Behavior
AttributeError: CompileError object has no attribute orig
Installed Versions
Comment From: phofl
Can you provide something reproducible?
Comment From: thanhbui221
My OS: macOS Ventura 13.0 MySQL: Ver 8.0.31 for macos13.0 on arm64 (Homebrew) sqlalchemy: 1.4.44
I created the database named ‘pandas_tosql_test’ in the local instance of mySQL server.
Reproducible Example
import pandas as pd
from sqlalchemy import MetaData, Table
from sqlalchemy.dialects import mysql
from sqlalchemy.sql import text
from sqlalchemy.engine import create_engine
# These two test_frame functions are taken directly from 'pandas/tests/io/test_sql.py'
def test_frame1():
columns = ["index", "A", "B", "C", "D"]
data = [
(
"2000-01-03 00:00:00",
0.980268513777,
3.68573087906,
-0.364216805298,
-1.15973806169,
),
(
"2000-01-04 00:00:00",
1.04791624281,
-0.0412318367011,
-0.16181208307,
0.212549316967,
),
(
"2000-01-05 00:00:00",
0.498580885705,
0.731167677815,
-0.537677223318,
1.34627041952,
),
(
"2000-01-06 00:00:00",
1.12020151869,
1.56762092543,
0.00364077397681,
0.67525259227,
),
]
return pd.DataFrame(data, columns=columns)
def test_frame2():
columns = ["index", "A", "B", "C", "D", "E"]
data = [
(
"2000-01-03 00:00:00",
0.980268513777,
3.68573087906,
-0.364216805298,
-1.15973806169,
123
),
(
"2000-01-04 00:00:00",
1.04791624281,
-0.0412318367011,
-0.16181208307,
0.212549316967,
123
),
(
"2000-01-05 00:00:00",
0.498580885705,
0.731167677815,
-0.537677223318,
1.34627041952,
123
),
(
"2000-01-06 00:00:00",
1.12020151869,
1.56762092543,
0.00364077397681,
0.67525259227,
123
),
]
return pd.DataFrame(data, columns=columns)
def create_upsert_method(meta):
def upsert_method(table, conn, keys, data_iter):
sql_table = Table(table.name, meta, autoload=True)
insert_stmt = mysql.insert(sql_table).values([dict(zip(keys, data)) for data in data_iter])
upsert_stmt = insert_stmt.on_duplicate_key_update({x.name: x for x in insert_stmt.inserted if x.name in keys})
conn.execute(upsert_stmt)
return upsert_method
#change user and password according to yours.
user = 'user'
password = 'password'
host = '127.0.0.1'
port = 3306
schema = 'pandas_tosql_test'
engine = create_engine(
f"mysql+mysqldb://{user}:{password}@{host}:{port}/{schema}")
conn = engine.connect()
# Firstly, create table 'test_frame' in the database 'pandas_tosql_test'.
with conn.begin():
df1 = test_frame1()
df1.to_sql('test_frame', con=engine)
# Secondly, upsert the data of another data frame to table 'test_frame'.
with conn.begin():
df2 = test_frame2()
meta = MetaData(conn)
method = create_upsert_method(meta)
df2.to_sql(name="test_frame", con=conn, if_exists="append", index=False, chunksize=1000, method=method)
#Error: AttributeError: 'CompileError' object has no attribute 'orig'
# The problem is that the schema of the two data frames is different, but above error did not show exactly it.
Comment From: jomach
same error here