I found that when calling to_sql on a dataframe with a datetime64 column where most of the values are NaT, sqlalchemy fails. The column closed_at is of type TIMESTAMP on Redshift (timestamp without time zone).

df.info()
[output]
<class 'pandas.core.frame.DataFrame'>
Int64Index: 11 entries, 0 to 1684
Data columns (total 11 columns):
created_at         11 non-null datetime64[ns]
updated_at         11 non-null datetime64[ns]
closed_at          2 non-null datetime64[ns]
state              11 non-null object
title              11 non-null object
body               11 non-null object
issue_submitter    11 non-null object
assignee           6 non-null object
labels             11 non-null object
html_url           11 non-null object
id                 11 non-null int64
dtypes: datetime64[ns](3), int64(1), object(7)

The column closed_at has only 2 non-empty values. If I delete the column before writing, everything goes fine. However, when I call to_sql with the column in the dataframe, I get the following error:

Traceback (most recent call last):
  df.to_sql(table, self.engine,if_exists='append',index=False)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 950, in to_sql
    index_label=index_label)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/io/sql.py", line 475, in to_sql
    index_label=index_label)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/io/sql.py", line 842, in to_sql
    table.insert()
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/io/sql.py", line 611, in insert
    self.pd_sql.execute(ins, data_list)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/pandas/io/sql.py", line 810, in execute
    return self.engine.execute(*args, **kwargs)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1682, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 720, in execute
    return meth(self, multiparams, params)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 817, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 947, in _execute_context
    context)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1108, in _handle_dbapi_exception
    exc_info
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 917, in _execute_context
    context)
  File "/home/manugarri/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 432, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.DataError: (DataError) invalid input syntax for type timestamp: "0001-255-255T00:00:00"
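For reference, a minimal sketch of the situation (the engine and table name are placeholders, so the to_sql call is left commented out; replacing NaT with None is a common workaround on older pandas so the driver sends SQL NULL instead of a mangled timestamp string):

```python
import pandas as pd

# A frame shaped like the one above: a datetime64 column that is mostly NaT
df = pd.DataFrame({
    "created_at": pd.to_datetime(["2014-01-01", "2014-01-02"]),
    "closed_at": pd.to_datetime(["2014-01-03", None]),  # second value is NaT
    "id": [1, 2],
})

# Workaround: convert NaT cells to None (column becomes object dtype),
# so the DBAPI driver inserts NULL rather than an invalid timestamp
safe = df.replace({pd.NaT: None})

# safe.to_sql(table, engine, if_exists="append", index=False)  # engine/table are placeholders
```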

Comment From: TomAugspurger

I can probably test this at work tomorrow. Just to make sure: this works for you when most of the values in the column are not null, right?

Btw, you'll get MUCH better write performance to Redshift if you go through S3 and use their COPY machinery. I've just monkeypatched DataFrame with a to_redshift method that goes through S3. You might also look at odo's AWS support.
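A rough sketch of what such a to_redshift helper can look like (the function names, COPY options, and credentials string here are illustrative, not db.py's or odo's actual API; the S3 upload step itself is elided):

```python
import io


def build_copy_sql(table, s3_path, credentials):
    # Redshift's COPY bulk-loads a CSV from S3, which is far faster
    # than row-by-row INSERTs issued through sqlalchemy
    return ("COPY {table} FROM '{path}' "
            "CREDENTIALS '{creds}' "
            "CSV EMPTYASNULL").format(table=table, path=s3_path,
                                      creds=credentials)


def to_redshift(df, table, s3_path, engine, credentials):
    # Serialize the frame to CSV in memory (empty fields become NULL
    # on the Redshift side thanks to EMPTYASNULL)
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    # ... upload buf.getvalue() to s3_path with boto here, then run the COPY:
    engine.execute(build_copy_sql(table, s3_path, credentials))
```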

Comment From: manugarri

Yeah, it works if most of the datetime64 column is not null.

Yeah, I know about S3 COPY. db.py implements a very nice to_redshift method that I use for big batch inserts. Maybe it's just best to switch?

Comment From: jorisvandenbossche

@manugarri What version of pandas are you using? (Show the output of pd.show_versions().)

Normally, issues with NaN values should be fixed in 0.15.x (but this is not tested against Redshift).
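A quick way to confirm which version is actually installed (pd.show_versions() prints the full environment report requested above):

```python
import pandas as pd

print(pd.__version__)   # short version string, e.g. "0.14.1"
# pd.show_versions()    # full report, including dependency versions
```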

Comment From: manugarri

Oh, interesting. I thought Anaconda provided 0.15, but I seem to be using 0.14.1.

Going to update and test it again.

Comment From: manugarri

My bad. The same code works perfectly in 0.15.

Comment From: jorisvandenbossche

Good to know, thanks for reporting anyway!