Code Sample, a copy-pastable example if possible
import pandas as pd
import pyodbc
engine_clus = pyodbc.connect(connection_string, autocommit=True)
df.to_sql("table", con=engine_clus, if_exists="append", index=False, schema="schema")
Problem description
We need to load a dataframe containing information from a source cluster into Hive. Using pyodbc we can CREATE, DROP, and SELECT, but inserts fail with the following error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('HY000', '[HY000] [Microsoft][Hardy] (79) Failed to reconnect to server. (79) (SQLPrepare)')
Expected Output
We expected to be able to insert the dataframe's contents into the Hive table.
Comment From: jorisvandenbossche
@sergiobaquero for to_sql, you need to pass a sqlalchemy engine or connection. If you pass a raw DB driver connection, pandas assumes it is a sqlite connection, which obviously fails here. (We should probably improve the error message.)
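For example, a minimal sketch of the SQLAlchemy-based pattern pandas expects (the connection URL below is a placeholder; substitute the dialect and driver for your database):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder URL: pandas uses the engine's dialect to emit SQL for the right database.
engine = create_engine("dialect+driver://user:password@host:port/database")

df = pd.DataFrame({"col": [1, 2, 3]})
df.to_sql("table", con=engine, if_exists="append", index=False, schema="schema")

With an engine, pandas dispatches through SQLAlchemy instead of falling back to its sqlite code path, which is why the original error probes sqlite_master.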
Comment From: sergiobaquero
Thank you for your quick answer @jorisvandenbossche. I was confused because I could execute SELECTs, CREATEs, and DROPs, but not INSERTs. Is there any alternative for loading a dataframe into Hive using a pyodbc connection? I am using an Anaconda env on Windows 10 and I can't install the sasl library, so I am trying to find an alternative.
Thank you. Best regards, Sergio Baquero
Comment From: jorisvandenbossche
You can still use pandas' to_sql; you only need to use a Hive driver through sqlalchemy. And it seems there are packages available that provide Hive support for sqlalchemy, e.g. https://github.com/dropbox/PyHive and https://github.com/AurelienGalicher/PyHiveODBC
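A minimal sketch of how that could look with PyHive, which registers a hive:// dialect with SQLAlchemy (the host, port, and database below are placeholders for your HiveServer2 endpoint):

import pandas as pd
from sqlalchemy import create_engine

# PyHive (pip install 'pyhive[hive]') provides the hive:// SQLAlchemy dialect.
# "host", 10000, and "default" are placeholders for your HiveServer2 endpoint.
engine = create_engine("hive://host:10000/default")

df = pd.DataFrame({"col": [1, 2, 3]})
df.to_sql("table", con=engine, if_exists="append", index=False, schema="schema")

Note that PyHive's Thrift transport may itself depend on sasl for some authentication mechanisms, which is presumably why the ODBC-based PyHiveODBC is listed as an alternative.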
Comment From: jbrockmendel
Is this actionable? Looks to me like Joris handled the issue.
Comment From: mroeschke
Yeah closing