can pandas have twisted protocol support for accessing db?
Comment From: cpcloud
You'll need to provide a bit more information including examples of what this means and why it would be useful. I personally don't use twisted so this would be helpful.
Comment From: cpcloud
More specially what is inconvenient to do with the current db interface that would be more convenient with the twisted protocols?
Comment From: GomesNayagam
basically i am looking for asynchronous way accessing db via pandas sql library. Since twisted is widely used asynchronous protocol if you use any module along with twisted also need to have similar support.
we currently use cyclone.io(tornado) along with various module to access rabbitmq,redis and postgres sql. Every vendor has twisted wrapper to work with web framework. e.g psycopg2 has twisted support.
Comment From: jorisvandenbossche
Currently the sql module in pandas is implemented based on SQLAlchemy. A quick googling lead me to https://pypi.python.org/pypi/alchimia/0.4 which brings twisted support to SQLAlchemy. Would that possibly be helpful?
Comment From: GomesNayagam
great that sqlalchemy has the twisted support but pandas.read_sql_table() api currently is not using twisted way of creating engine(i checked the code at pandas/io/sql.py). probably might be significant effort.
Comment From: jorisvandenbossche
What do you exactly mean with "twisted way of creaing engine"? (creation of the engine is up to the user)
I am also not a user of twisted, so you will have to provide more concrete information and examples what this twisted support exactly implies (eg with psycopg2, how does the usage of psycopg2 changes? I didn't find something in the psycopg2 docs on this), or try to figure it out yourself. If it turns out to be possible (or some functions should be changed slightly to support it), I think we can certainly consider adding it.
Comment From: GomesNayagam
sorry for the confusion, we are not using psycopg2 for twisted way of calling db query rather we use twisted adbapi.
as per the latest sqlalchemy twisted support, as you said i can very well pass the twisted engie to pandas read_sql_table but it will expect the @inlinecallback and yeild keyword in each method return inside pandas sql api.
1) twisted way of calling sql alchemy query : yield engine.execute(CreateTable(users)) 2) normal pandas way of calling alchmey query : engine.execute(..)
so, obviously when calling method is twisted way(which means @inlinecallback decorator and yield keyword) the called method signature too need to be the same otherwise compilation/runtime error.
In this https://pypi.python.org/pypi/alchimia/0.4 example if you/i could incorporate pandas to insert User object then we dont need this discussion. I will try sometime later and let you know. BTW if you in between please update me.
All i am looking is, will the following code will work with this example https://pypi.python.org/pypi/alchimia/0.4
SELECT date(created_at) as date, count(*) as count FROM events GROUP BY 1""" df = read_sql(query, db_connection)
Comment From: hayd
It may be as simple as subclassing PandasSQLAlchemy and overriding the execute method... maybe.
Comment From: GomesNayagam
wow thanks hayd. Here is the code. --Code starts here---
from alchimia import TWISTED_STRATEGY
from sqlalchemy import (
create_engine, MetaData, Table, Column, Integer, String
)
from sqlalchemy.schema import CreateTable
from twisted.internet import defer
from twisted.internet.task import react
from pandas.io.sql import *
import pandas as pd
from sqlalchemy.pool import QueuePool
class MyPandasSQLAlchemy(PandasSQLAlchemy):
@defer.inlineCallbacks
def execute(self, *args, **kwargs):
"""Simple passthrough to SQLAlchemy engine"""
result = yield self.engine.execute(*args, **kwargs)
defer.returnValue(result)
@defer.inlineCallbacks
def main(reactor):
engine = create_engine(
"sqlite:///file.db", poolclass=QueuePool, reactor=reactor, strategy=TWISTED_STRATEGY
)
metadata = MetaData()
users = Table("users", metadata,
Column("id", Integer(), primary_key=True),
Column("name", String()),
)
'''
# Create the table
yield engine.execute(CreateTable(users))
# Insert some users
yield engine.execute(users.insert().values(name="Jeremy Goodwin"))
yield engine.execute(users.insert().values(name="Natalie Hurley"))
yield engine.execute(users.insert().values(name="Dan Rydell"))
yield engine.execute(users.insert().values(name="Casey McCall"))
yield engine.execute(users.insert().values(name="Dana Whitaker"))
'''
query = "SELECT * FROM users"
pandas_sql = MyPandasSQLAlchemy(engine, meta=metadata)
df = yield pandas_sql.execute(query)
df_users = yield df.fetchall()
df = pd.DataFrame(df_users)
print df.head()
'''
# raw alchemy way
result = yield engine.execute(query)
d_users = yield result.fetchall()
# Print out the users
for user in d_users:
print "Username: %s" % user[users.c.name]
'''
if __name__ == "__main__":
react(main, [])
Comment From: jorisvandenbossche
Nice to see it works!
Now you only used the execute
method and did the conversion to a dataframe manually, but did you also try to use read_sql_query
(which uses the execute method and does the wrapping into a dataframe? Or even the other functions (read_table
, to_sql
)?
Comment From: immerrr
It may be as simple as subclassing PandasSQLAlchemy and overriding the execute method... maybe.
Twisted is not that easy to support: it uses explicit Future (Deferred
) objects as return values for asynchronous operations and you cannot support both sync and async modes of functioning with same code, unless you account for that explicitly by either exposing public API via callbacks, using coroutines/generators to read values or running synchronous code in separate thread. I didn't see any of that in pandas sql code (I must admit I didn't dig deeply), so it may not be so easy.
Comment From: GomesNayagam
yes, i agree with @immerrr, need to expose separate set of api(module) for twisted support. Here i have provided code sample to explain the request. thanks @jorisvandenbossche @hayd for your response to my query. you guys can decide about this ticket as you wish.
Comment From: jreback
out-of-scope ATM. If user wants to propose adding a sub-class then pls reopen.