Code Sample, a copy-pastable example if possible
import pandas as pd
import pyodbc
engine_clus = pyodbc.connect(connection_string, autocommit=True)
df.to_sql("table", con=engine_clus, if_exists="append", index=False, schema="schema")
Problem description
We need to load a dataframe containing information from a source cluster into Hive. Using pyodbc we can CREATE, DROP, and SELECT, but inserts fail with the following error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('HY000', '[HY000] [Microsoft][Hardy] (79) Failed to reconnect to server. (79) (SQLPrepare)')
Expected Output
We expected to be able to insert the dataframe's contents into the Hive table.
Comment From: jorisvandenbossche
@sergiobaquero for to_sql, you need to pass a sqlalchemy engine or connection. If you pass a raw DB driver connection, pandas assumes it is a sqlite connection, which obviously fails here. (We should probably improve the error message.)
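For example, a minimal sketch of the SQLAlchemy-based pattern pandas expects (the connection URL below is a placeholder; substitute the dialect and driver for your database):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder URL: pandas uses the engine's dialect to emit SQL for the right database.
engine = create_engine("dialect+driver://user:password@host:port/database")

df = pd.DataFrame({"col": [1, 2, 3]})
df.to_sql("table", con=engine, if_exists="append", index=False, schema="schema")

With an engine, pandas dispatches through SQLAlchemy instead of falling back to its sqlite code path, which is why the original error probes sqlite_master.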
Comment From: sergiobaquero
Thank you for your quick answer @jorisvandenbossche. I was confused because I could execute SELECTs, CREATEs, and DROPs, but not INSERTs. Is there any alternative for loading a dataframe into Hive using a pyodbc connection? I am using an Anaconda env on Windows 10 and I can't install the sasl library, so I am trying to find an alternative.
Thank you. Best regards, Sergio Baquero
Comment From: jorisvandenbossche
You can still use pandas' to_sql; you only need to use a Hive driver through sqlalchemy. And it seems there are packages available that provide Hive support for sqlalchemy, e.g. https://github.com/dropbox/PyHive and https://github.com/AurelienGalicher/PyHiveODBC
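A minimal sketch of how that could look with PyHive, which registers a hive:// dialect with SQLAlchemy (the host, port, and database below are placeholders for your HiveServer2 endpoint):

import pandas as pd
from sqlalchemy import create_engine

# PyHive (pip install 'pyhive[hive]') provides the hive:// SQLAlchemy dialect.
# "host", 10000, and "default" are placeholders for your HiveServer2 endpoint.
engine = create_engine("hive://host:10000/default")

df = pd.DataFrame({"col": [1, 2, 3]})
df.to_sql("table", con=engine, if_exists="append", index=False, schema="schema")

Note that PyHive's Thrift transport may itself depend on sasl for some authentication mechanisms, which is presumably why the ODBC-based PyHiveODBC is listed as an alternative.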
Comment From: jbrockmendel
Is this actionable? Looks to me like Joris handled the issue.
Comment From: mroeschke
Yeah closing