I'm using the following code to import datetimes from a SQL Server database into Pandas:
import pandas as pd

data = pd.read_sql('select top 10 timestamp from mytable', db)
'MyTable' contains a column 'Timestamp', which is of type DateTime2.
If db is a pyodbc database connection this works fine, and my timestamps are returned as data type 'datetime64[ns]'. However, if db is a SQLAlchemy engine created using create_engine('mssql+pyodbc://...'), then the timestamps returned in data are of type 'object' and cause problems later on in my code.
I'm using pandas version 0.14.1, pyodbc version 3.0.7 and SQLAlchemy version 0.9.4. I'd like the default behaviour with the SQLAlchemy engine to be the same as with pyodbc, i.e. to import all datetimes as datetime64[ns].
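In the meantime I can work around it by coercing the column after the read (a minimal sketch, assuming the column really does come back as strings; the connection string is a placeholder):
import pandas as pd
import sqlalchemy as sql

# placeholder connection string; real server/credentials go here
engine = sql.create_engine('mssql+pyodbc://username:password@my_server/my_database_name')

data = pd.read_sql('select top 10 timestamp from mytable', engine)
# coerce the object column back to datetime64[ns]
data['timestamp'] = pd.to_datetime(data['timestamp'])
print(data.dtypes)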
Originally from: http://stackoverflow.com/questions/25607775/datetime-import-using-pandas-sqlalchemy
Comment From: jorisvandenbossche
Can you do some comparisons with raw execute (to look at what is returned before it is put in a DataFrame, to see whether it is an issue with pandas or sqlalchemy)?
# with pyodbc connection
cur = con.cursor()
cur.execute("select top 1 timestamp from mytable").fetchall()
# with sqlalchemy engine
engine.execute("select top 1 timestamp from mytable").fetchall()
# with sqlalchemy engine but getting the raw connection
cur = engine.raw_connection().cursor()
cur.execute("select top 1 timestamp from mytable").fetchall()
Comment From: oetcoleman
With the pyodbc connection I get a datetime:
[(datetime.datetime(2013, 10, 26, 2, 47, 2, 266000), )]
With the SQL Alchemy engine I get a string:
[('2013-10-26 02:47:02.2660000',)]
With the raw connection from SQL Alchemy I get a string as well:
[('2013-10-26 02:47:02.2660000', )]
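For what it's worth, that difference alone explains the dtypes pandas ends up with (a quick check, my own illustration rather than anything from the traceback):
import datetime
import pandas as pd

# a real datetime object is inferred as datetime64[ns]
print(pd.DataFrame({'timestamp': [datetime.datetime(2013, 10, 26, 2, 47, 2, 266000)]}).dtypes)

# the same value as a plain string stays as object dtype
print(pd.DataFrame({'timestamp': ['2013-10-26 02:47:02.2660000']}).dtypes)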
Comment From: oetcoleman
Furthermore, this looks to be specific to table columns of type DATETIME2. For instance, the following returns a datetime by converting the DATETIME2 to DATETIME in SQL:
engine.execute("select top 1 CONVERT(DATETIME,timestamp) from mytable").fetchall()
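So, as a stopgap, the cast can be pushed into the query that pandas runs (a sketch; note that DATETIME only has roughly 3 ms precision, so this is slightly lossy compared to DATETIME2):
# cast in SQL so pyodbc hands pandas datetime objects rather than strings
data = pd.read_sql(
    'select top 10 CONVERT(DATETIME, timestamp) as timestamp from mytable',
    engine,
)
print(data.dtypes)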
Comment From: jorisvandenbossche
It seems this is then caused by sqlalchemy? Strange, as the docs clearly list DATETIME2
as a recognised type for SQL Server: http://docs.sqlalchemy.org/en/rel_0_9/dialects/mssql.html#sqlalchemy.dialects.mssql.DATETIME2
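You could also check what type SQLAlchemy reports for the column via the inspector (a sketch, using the table name from your example; the connection string is a placeholder):
from sqlalchemy import create_engine, inspect

engine = create_engine('mssql+pyodbc://username:password@my_server/my_database_name')
for col in inspect(engine).get_columns('mytable'):
    print(col['name'], col['type'])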
Comment From: oetcoleman
Looks like the problem originates from SQLAlchemy specifying 'SQL Server' as the driver when creating its pyodbc connection string, whereas I was using 'SQL Server Native Client 10.0'. If I can override the driver setting in SQLAlchemy then that should hopefully solve the problem.
Further info on drivers: http://msdn.microsoft.com/en-us/library/ms130828.aspx
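To confirm it really is the driver, connecting with pyodbc directly and only swapping the driver name reproduces both behaviours (a sketch; server, database and credentials are placeholders):
import pyodbc

# newer driver: DATETIME2 comes back as datetime.datetime
con_new = pyodbc.connect(
    'DRIVER={SQL Server Native Client 10.0};'
    'SERVER=my_server;DATABASE=my_database_name;UID=username;PWD=password'
)
print(con_new.cursor().execute('select top 1 timestamp from mytable').fetchall())

# legacy 'SQL Server' driver predates DATETIME2 and hands back strings
con_old = pyodbc.connect(
    'DRIVER={SQL Server};'
    'SERVER=my_server;DATABASE=my_database_name;UID=username;PWD=password'
)
print(con_old.cursor().execute('select top 1 timestamp from mytable').fetchall())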
Comment From: jorisvandenbossche
Maybe look here: http://docs.sqlalchemy.org/en/rel_0_9/dialects/mssql.html#additional-connection-examples, there are some examples of creating the engines and how this corresponds to pyodbc connection strings.
Comment From: oetcoleman
Thanks - it turns out I can do this by creating the SQLAlchemy engine as follows:
import sqlalchemy as sql

connectionString = 'mssql+pyodbc://username:password@my_server/my_database_name?driver=SQL Server Native Client 10.0'
engine = sql.create_engine(connectionString)
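With that engine the read now comes back with the expected dtype (the SQLAlchemy docs also show the spaces in the driver name URL-encoded, i.e. driver=SQL+Server+Native+Client+10.0):
# sanity check: the column is now datetime64[ns]
data = pd.read_sql('select top 10 timestamp from mytable', engine)
print(data.dtypes)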
Comment From: jorisvandenbossche
And then it works? (retrieves the DATETIME2 as native python datetimes?)
If you think this should also work without specifying this, you can always report an issue on https://bitbucket.org/zzzeek/sqlalchemy
Comment From: oetcoleman
Yes - it works as expected when I do that. Will ask SQLAlchemy about this, as my understanding is that 'SQL Server' is an old driver, hence it doesn't support DATETIME2.
Comment From: jorisvandenbossche
@oetcoleman OK, then this issue can be closed. Maybe we should add a warning to the docs?
Comment From: oetcoleman
Yes, can be closed. Would be helpful to include in the docs, perhaps in http://pandas.pydata.org/pandas-docs/stable/io.html#engine-connection-examples to help anyone else with the same issue.
Comment From: TomAugspurger
Closing, though improvements to the docs are always welcome.