with pandas 0.16 and sqlalchemy, the to_sql method of a dataframe creates duplicate redundant constraints on each boolean column, at least when exporting to Microsoft SQL Server, because for each column it creates 2 constraints checking the field must be either 0 or 1. This is redundant and slows down the database. A short example is below. Of course you must check the output in the sql database. Also, the fact that there is no control and no documentation over the constraints created by to_sql is extremely frustrating

import pandas as pd import numpy as np from sqlalchemy import create_engine, MetaData, Table, select import sqlalchemy as sqlalchemy

from sqlalchemy import create_engine, MetaData, Table, select import sqlalchemy as sqlalchemy

ServerName = "cdodb-wb" Database = "DealAnalysis"

engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database, encoding='latin1') conn = engine.connect()

metadata = MetaData(conn)

df=pd.DataFrame(np.random.rand(20)>0.5, columns=['random_boolean'])

df.to_sql('python_test_table',engine,if_exists='replace')

Comment From: jorisvandenbossche

Thanks for the report!

Can you give some more details? Show the resulting table schema that you get, and what you would want?

Comment From: pythonbeginner2015

Sure. In SQL server management studio, if you double click on the table and expand "constraints", you'll notice the duplication of constraints. If you right click on the table and do script table as -> create to -> new query editor you'll get the following code, which also shows the duplication of constraints:

CREATE TABLE [dbo].[python_test_table](
    [index] [bigint] NULL,
    [random_boolean] [bit] NULL
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[python_test_table]  WITH CHECK ADD CHECK  (([random_boolean]=(1) OR [random_boolean]=(0)))
GO

ALTER TABLE [dbo].[python_test_table]  WITH CHECK ADD CHECK  (([random_boolean]=(1) OR [random_boolean]=(0)))
GO
```sql

Comment From: pythonbeginner2015

If instead I run:

df.to_sql('python_test_table',engine,if_exists='replace', dtype={'random_boolean': sqlalchemy.Boolean(create_constraint=False)})

then no constraints are created. If I run it with create_constraint=True (which is the default), then two duplicate constraints are created, as mentioned above

Comment From: jorisvandenbossche

What version of sqlalchemy are you using? Pandas does not do anything with constraints itself (it only adds primary keys), so this is probably a bug in the SQL Server dialect of SQLAlchemy.

Can you report over there?

Comment From: jorisvandenbossche

Eg with sqlite, only one constraint check is added (don't have SQL Server available at the moment):

In [20]: print pd.io.sql.get_schema(df, 'name', con=engine)

CREATE TABLE name (
        a BOOLEAN,
        b BIGINT,
        CHECK (a IN (0, 1))
)

Comment From: jorisvandenbossche

@pythonbeginner2015 So I would say that from the pandas side, as you can use dtype={'random_boolean': sqlalchemy.Boolean(create_constraint=False)} this is solved?

Comment From: pythonbeginner2015

Well, I'd say there are ways around it, I wouldn;t say it's 100% solved, although this is probably more to do with sqlalchemy than with pandas. If I have a large number of boolean columns I must pass a dictionary to dtype specifying create_constraint=False for each such column - much more fiddly than would ideally be necessary

Comment From: jorisvandenbossche

As I said on SO, you can quite easily solve this with {col: Boolean(create_constraint=False) for col in df.select_dtypes(['bool']).columns}, but I agree this is not that clean.

the issue just comes from the fact that SQL server does not support booleans and therefore that sqlalchemy defaults to creating these checks. A clean solution would be to have a more global (engine-level) option to disable these checks (but for this you have to report this to sqlalchemy), or to be able to specify in some way in to_sql(.., dtype=..) to apply something for all columns of a certain datatype. If you have suggestions on a possible syntax for this, very welcome!

Comment From: jorisvandenbossche

Closing as the initial issue (the duplicate constraints) seems an issue with SQLAlchemy.

@pythonbeginner2015 Do you report it over there?