Pandas version checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of pandas.
-
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
from sqlalchemy import (
create_engine,
Table,
Column,
)
from sqlalchemy.types import (
BigInteger,
)
from sqlalchemy.sql import (
select,
)
from sqlalchemy.orm import (
declarative_base,
)
table_name = 'test'
Base = declarative_base()
class Jobs(Base):
__table__ = Table(
table_name,
Base.metadata,
Column('id', BigInteger, primary_key=True),
Column('a', BigInteger),
Column('b', BigInteger),
)
df = pd.DataFrame(
{
'id': [1],
'a': [2],
'b': [3],
},
dtype='int64',
)
engine = create_engine('sqlite:///:memory:')
df.to_sql(
table_name,
engine,
index=False,
if_exists='replace',
)
# This results in the pickled dataframe dependent on sqlalchemy
df1 = pd.read_sql_query(
select(Jobs),
engine,
)
# This dataframe is not dependent on sqlalchemy
df2 = pd.read_sql_query(
f'SELECT * FROM {table_name}',
engine,
)
assert df1.equals(df2)
df1.to_pickle('test1.pkl')
df2.to_pickle('test2.pkl')
Issue Description
read_sql_query()
returns subtly different dataframes depending on the type of the sql
parameter used.
Although the two dataframes are equal to each other, their difference manifests when they get pickled.
If the sql
parameter is an SQLAlchemy Selectable, the resulting dataframe contains a dependency on the sqlalchemy module.
Unless sqlalchemy
is installed in the Python virtual environment, such a pickled dataframe cannot be read. The error message states: ModuleNotFoundError: No module named 'sqlalchemy'
Expected Behavior
I expect that both test1.pkl
and test2.pkl
should be possible to read in the virtual environment where only Pandas is installed.
Installed Versions
Comment From: NumanIjaz
take