I found that when using to_sql on a dataframe with a datetime64 column where most of the values are NaT, sqlalchemy fails. The column closed_at is of type TIMESTAMP on Redshift (timestamp without time zone).
df.info() output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 11 entries, 0 to 1684
Data columns (total 11 columns):
created_at 11 non-null datetime64[ns]
updated_at 11 non-null datetime64[ns]
closed_at 2 non-null datetime64[ns]
state 11 non-null object
title 11 non-null object
body 11 non-null object
issue_submitter 11 non-null object
assignee 6 non-null object
labels 11 non-null object
html_url 11 non-null object
id 11 non-null int64
dtypes: datetime64[ns](3), int64(1), object(7)
The column closed_at has only 2 non-empty values. If I delete the column prior to writing, everything goes fine. However, when I call to_sql with the column still in the dataframe, I get the following error:
Traceback (most recent call last):
df.to_sql(table, self.engine,if_exists='append',index=False)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 950, in to_sql
index_label=index_label)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/io/sql.py", line 475, in to_sql
index_label=index_label)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/io/sql.py", line 842, in to_sql
table.insert()
File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/io/sql.py", line 611, in insert
self.pd_sql.execute(ins, data_list)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/io/sql.py", line 810, in execute
return self.engine.execute(*args, **kwargs)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1682, in execute
return connection.execute(statement, *multiparams, **params)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 720, in execute
return meth(self, multiparams, params)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 817, in _execute_clauseelement
compiled_sql, distilled_params
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 947, in _execute_context
context)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1108, in _handle_dbapi_exception
exc_info
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 917, in _execute_context
context)
File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 432, in do_executemany
cursor.executemany(statement, parameters)
sqlalchemy.exc.DataError: (DataError) invalid input syntax for type timestamp: "0001-255-255T00:00:00"
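For reference, a minimal sketch of the setup that triggers this (the connection string, table name and data below are placeholders, not the real ones):

import pandas as pd
from sqlalchemy import create_engine

# placeholder Redshift connection; the real one points at a table whose closed_at column is TIMESTAMP
engine = create_engine("postgresql+psycopg2://user:password@redshift-host:5439/dbname")

df = pd.DataFrame({
    "created_at": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
    "closed_at": pd.to_datetime(["2014-01-05", None, None]),  # mostly NaT
    "title": ["a", "b", "c"],
})

# this is the call that raises the DataError above
df.to_sql("issues", engine, if_exists="append", index=False)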
Comment From: TomAugspurger
I can probably test this at work tomorrow. Just to make sure: this works for you when most of the column's values are not null, right?
Btw, you'll get MUCH better write performance to Redshift if you go through S3 and use their COPY machinery. I've just monkeypatched DataFrame with a to_redshift method that goes through S3. You might also look at odo's AWS support.
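Roughly, that approach looks something like this (just a sketch, not the actual monkeypatched method; the bucket, key, IAM role and function signature are made up, and it uses boto3 for the upload):

import io

import boto3
from sqlalchemy import text

def to_redshift(df, engine, table, bucket, key, iam_role):
    # dump the frame to a CSV object in S3
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buf.getvalue().encode("utf-8"))

    # bulk-load it with Redshift's COPY, which is far faster than the
    # row-by-row INSERTs that to_sql issues
    copy_sql = (
        "COPY {table} FROM 's3://{bucket}/{key}' "
        "IAM_ROLE '{role}' CSV"
    ).format(table=table, bucket=bucket, key=key, role=iam_role)
    with engine.begin() as conn:
        conn.execute(text(copy_sql))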
Comment From: manugarri
Yeah, it works if most of the datetime64 column is not null.
Yeah, I know about the S3 COPY route. db.py implements a very nice to_redshift method that I use for big batch inserts. Maybe it's just best to switch?
Comment From: jorisvandenbossche
@manugarri What version of pandas are you using (show the output of pd.show_versions())?
Normally, issues with NaN values should be fixed in 0.15.x (but this is not tested for Redshift).
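Something like this will show which pandas is actually being picked up in that environment:

import pandas as pd
print(pd.__version__)
pd.show_versions()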
Comment From: manugarri
Oh, interesting. I thought Anaconda provided 0.15, but I seem to be using 0.14.1.
Going to update and test it again.
Comment From: manugarri
My bad. The same code works perfectly in 0.15.
Comment From: jorisvandenbossche
Good to know, thanks for reporting anyway!