Pandas strange error with Pandas-0.19.0

Just to raise eyebrows, not sure if it's related to pandas-0.19 or to a bug of myself. It is ok under Pandas-0.18.1 (and seems not related to Python-3.6)

A small, complete example of the issue

# in Ipython
#Pandas 
import pandas as pd
import numpy as np

idx = pd.date_range('2000', '2005', freq='d', closed='left')
datas = pd.DataFrame({'Color':  [ 'green' if x> 1 else 'red' for x in np.random.randn(len(idx))], 
         'Measure': np.random.randn(len(idx)), 'Year': idx.year},
          index=idx)
#datas.head()
# checking baresql
from __future__ import print_function, unicode_literals, division  # line needed only if Python2.7
from baresql import baresql
bsql = baresql.baresql(connection="sqlite:///.baresql.db")
bsqldf = lambda q: bsql.df(q, dict(globals(),**locals()))

bsqldf("select Color, Year from datas$$")

Expected Output

this: pandas-0 19

Output of `pd.show_versions()`

INSTALLED VERSIONS ------------------ commit: None python: 3.6.0.beta.3 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 23 Stepping 10, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None pandas: 0.19.0 nose: 1.3.7 pip: 8.1.2 setuptools: 28.7.0 Cython: 0.25.1 numpy: 1.11.2 scipy: 0.18.1 statsmodels: 0.8.0rc1 xarray: 0.8.2 IPython: 5.1.0 sphinx: 1.4.8 patsy: 0.4.1 dateutil: 2.5.3 pytz: 2016.7 blosc: 1.4.4 bottleneck: 1.2.0 tables: 3.3.0 numexpr: 2.6.1 matplotlib: 1.5.3 openpyxl: None xlrd: 1.0.0 xlwt: None xlsxwriter: 0.9.3 lxml: None bs4: 4.5.1 html5lib: 0.999999999 httplib2: None apiclient: None sqlalchemy: 1.1.3 pymysql: None psycopg2: None jinja2: 2.8 boto: None pandas_datareader: 0.2.1

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/index.pyx in pandas.index.DatetimeEngine.get_loc (pandas\index.c:11091)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8120)()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   2103             try:
-> 2104                 return self._engine.get_loc(key)
   2105             except KeyError:

pandas/index.pyx in pandas.index.DatetimeEngine.get_loc (pandas\index.c:11241)()

pandas/index.pyx in pandas.index.DatetimeEngine.get_loc (pandas\index.c:11147)()

pandas/index.pyx in pandas.index.DatetimeEngine._date_check_type (pandas\index.c:11311)()

KeyError: 'next'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/index.pyx in pandas.index.DatetimeEngine.get_loc (pandas\index.c:11091)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8120)()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\tseries\index.py in get_loc(self, key, method, tolerance)
   1410         try:
-> 1411             return Index.get_loc(self, key, method, tolerance)
   1412         except (KeyError, ValueError, TypeError):

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   2105             except KeyError:
-> 2106                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2107 

pandas/index.pyx in pandas.index.DatetimeEngine.get_loc (pandas\index.c:11241)()

pandas/index.pyx in pandas.index.DatetimeEngine.get_loc (pandas\index.c:11147)()

pandas/index.pyx in pandas.index.DatetimeEngine._date_check_type (pandas\index.c:11311)()

KeyError: 'next'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
ValueError: Error parsing datetime string "next" at position 0

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
<ipython-input-7-526f069d545f> in <module>()
----> 1 bsqldf('''select Color, Year from datas$$  ''')

<ipython-input-2-7f10bad21e85> in <lambda>(q)
      3 from baresql import baresql
      4 bsql = baresql.baresql(connection="sqlite:///.baresql.db")
----> 5 bsqldf = lambda q: bsql.df(q, dict(globals(),**locals()))
      6 
      7 users = ['Alexander', 'Billy', 'Charles', 'Danielle', 'Esmeralda', 'Franz', 'Greg']

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\baresql\baresql.py in df(self, q, env)
    506     def df(self, q, env):
    507         "same as .cursor , but returns a pandas dataframe"
--> 508         cur = self.cursor( q, env)
    509         result = None
    510         rows = cur.fetchall()

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\baresql\baresql.py in cursor(self, q, env)
    483             pre_q = "DROP TABLE IF EXISTS %s" % table_sql.join(self.delimiters)
    484             cur = self._execute_sql (pre_q)
--> 485             self._write_table( table_sql, df, self.conn)
    486         #multiple sql must be executed one by one
    487         for q_single in self.get_sqlsplit(sql, True) :

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\baresql\baresql.py in _write_table(self, tablename, df, conn)
    450             to_sql(df, name = tablename, con = self.conn,  flavor = self.engine)
    451         else:
--> 452             to_sql(df, name = tablename, con = self.conn)
    453 
    454 

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in to_sql(frame, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
    468     pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
    469                       index_label=index_label, schema=schema,
--> 470                       chunksize=chunksize, dtype=dtype)
    471 
    472 

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype)
   1498         table = SQLiteTable(name, self, frame=frame, index=index,
   1499                             if_exists=if_exists, index_label=index_label,
-> 1500                             dtype=dtype)
   1501         table.create()
   1502         table.insert(chunksize)

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in __init__(self, *args, **kwargs)
   1261         # (this is what sqlalchemy does)
   1262         sqlite3.register_adapter(time, lambda _: _.strftime("%H:%M:%S.%f"))
-> 1263         super(SQLiteTable, self).__init__(*args, **kwargs)
   1264 
   1265     def sql_schema(self):

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in __init__(self, name, pandas_sql_engine, frame, index, if_exists, prefix, index_label, schema, keys, dtype)
    560         if frame is not None:
    561             # We want to initialize based on a dataframe
--> 562             self.table = self._create_table_setup()
    563         else:
    564             # no data provided, read-only mode

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in _create_table_setup(self)
   1297         """
   1298         column_names_and_types = \
-> 1299             self._get_column_names_and_types(self._sql_type_name)
   1300 
   1301         pat = re.compile('\s+')

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in _get_column_names_and_types(self, dtype_mapper)
    755              dtype_mapper(self.frame.iloc[:, i]),
    756              False)
--> 757             for i in range(len(self.frame.columns))
    758         ]
    759 

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in <listcomp>(.0)
    755              dtype_mapper(self.frame.iloc[:, i]),
    756              False)
--> 757             for i in range(len(self.frame.columns))
    758         ]
    759 

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in _sql_type_name(self, col)
   1338             return dtype[col.name]
   1339 
-> 1340         col_type = self._get_notnull_col_dtype(col)
   1341         if col_type == 'timedelta64':
   1342             warnings.warn("the 'timedelta' type is not supported, and will be "

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in _get_notnull_col_dtype(self, col)
    846         col_for_inference = col
    847         if col.dtype == 'object':
--> 848             notnulldata = col[~isnull(col)]
    849             if len(notnulldata):
    850                 col_for_inference = notnulldata

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    634             raise
    635 
--> 636         if is_iterator(key):
    637             key = list(key)
    638 

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\types\inference.py in is_iterator(obj)
     35 def is_iterator(obj):
     36     # python 3 generators have __next__ instead of next
---> 37     return hasattr(obj, 'next') or hasattr(obj, '__next__')
     38 
     39 

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   2740             return object.__getattribute__(self, name)
   2741         else:
-> 2742             if name in self._info_axis:
   2743                 return self[name]
   2744             return object.__getattribute__(self, name)

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\tseries\base.py in __contains__(self, key)
    232     def __contains__(self, key):
    233         try:
--> 234             res = self.get_loc(key)
    235             return is_scalar(res) or type(res) == slice or np.any(res)
    236         except (KeyError, TypeError, ValueError):

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\tseries\index.py in get_loc(self, key, method, tolerance)
   1417 
   1418             try:
-> 1419                 stamp = Timestamp(key, tz=self.tz)
   1420                 return Index.get_loc(self, stamp, method, tolerance)
   1421             except (KeyError, ValueError):

pandas/tslib.pyx in pandas.tslib.Timestamp.__new__ (pandas\tslib.c:9932)()

pandas/tslib.pyx in pandas.tslib.convert_to_tsobject (pandas\tslib.c:25231)()

pandas/tslib.pyx in pandas.tslib.convert_str_to_tsobject (pandas\tslib.c:26851)()

pandas/src/datetime.pxd in datetime._string_to_dts (pandas\tslib.c:87106)()

SystemError: <class 'str'> returned a result with an error set

Comment From: jreback

can you show an example not using an external library

Comment From: stonebig

Well, I could jut import the cell before from github

%load https://raw.githubusercontent.com/stonebig/baresql/master/baresql/baresql.py

run the cell, then

#Pandas 
import pandas as pd
import numpy as np

idx = pd.date_range('2000', '2005', freq='d', closed='left')
datas = pd.DataFrame({'Color':  [ 'green' if x> 1 else 'red' for x in np.random.randn(len(idx))], 
         'Measure': np.random.randn(len(idx)), 'Year': idx.year},
          index=idx)
# checking baresql
#from __future__ import print_function, unicode_literals, division  # line needed only if Python2.7
#from baresql import baresql
#bsql = baresql.baresql(connection="sqlite:///.baresql.db")
#bsqldf = lambda q: bsql.df(q, dict(globals(),**locals()))

bsqldf("select Color, Year from datas$$")

Sorry I don't know how to make it smaller. .. maybe some logs of sql orders.. 1 minute pandas-0 19bis

Comment From: jreback

my point is pls show an example that does include an external library that is not a dependency of pandas. otherwise this is not a pandas issue.

Comment From: stonebig

ok, I close and will re-open if I find a smaller pure pandas example.

Comment From: jorisvandenbossche

@stonebig for me this is working with pandas 0.19.0 and python 3.5

Comment From: stonebig

Yes, the error came on my face on Python-3.6.0b2, then b3, then I thought it was also on Python-3.5... no more sure . With Python3.6 comes sqlite 3.14.2, and python beta3 bugs . I'm on windows also ... not sure it can have an effect for this.

Comment From: stonebig

my problem goes away (under Pandas-0.19.0 / Python-3.6.0b3) if I remove the index from the original dataframe

idx = pd.date_range('2000', '2005', freq='d', closed='left')
datas = pd.DataFrame({'Color':  [ 'green' if x> 1 else 'red' for x in np.random.randn(len(idx))], 
         'Measure': np.random.randn(len(idx)), 'Year': idx.year}
                    # ,index=idx
                    )

Comment From: stonebig

even better, the error goes away under Python-3.6.0b3 if I use "idx.date":

datas = pd.DataFrame({'Color':  [ 'green' if x> 1 else 'red' for x in np.random.randn(len(idx))], 
         'Measure': np.random.randn(len(idx)), 'Year': idx.year}
                      ,index=idx.date
                    )

any explanation ? I suppose it should work on Python 3.4/3.5 also .. Is it to report as a bug to python-3.6 team ?

Comment From: jorisvandenbossche

@stonebig it would be interesting if you could check that it is for sure related to the use of 3.6

Also, if it is a bug in python 3.6 (or in pandas due to changes in python 3.6), it would still be useful if you could try to trim down the example to a small reproducible code snippet. For example, you can start with localizing which line in your code triggers the error, and then see if you can reproduce what that single line does in itself.

Comment From: stonebig

@jorisvandenbossche I'm not equipped to nail down this. I didn't do anything fancy, so it should show up when CI systems will have a nice python-3.6 toolchain.

Comment From: jorisvandenbossche

@stonebig You can do more than you think!

As an example: if you look at the traceback you posted above, you see:

C:\WinPython\basedir36\build\winpython-64bit-3.6.x.0\python-3.6.0b3.amd64\lib\site-packages\pandas\io\sql.py in _get_notnull_col_dtype(self, col)
    846         col_for_inference = col
    847         if col.dtype == 'object':
--> 848             notnulldata = col[~isnull(col)]
    849             if len(notnulldata):
    850                 col_for_inference = notnulldata

(this is the latest traceback from the io.sql code). col is one of the columns of your dataframe. So trying that code line with your example dataframe:

In [1]: idx = pd.date_range('2000', '2005', freq='d', closed='left')
   ...: datas = pd.DataFrame({'Color':  [ 'green' if x> 1 else 'red' for x in np.random.randn(len(idx))], 
   ...:          'Measure': np.random.randn(len(idx)), 'Year': idx.year},
   ...:           index=idx)

In [2]: col = datas['Color']

In [3]: col[~pd.isnull(col)]
....
SystemError: <class 'str'> returned a result with an error set

also triggers the error!

So further simplifying the reproducible example:

In [4]: col = pd.Series(['a', 'b', 'c', 'd'], index=pd.date_range('2012-01-01', periods=4))

In [5]: col[~pd.isnull(col)]
....
SystemError: <class 'str'> returned a result with an error set

In [6]: col = pd.Series(['a', 'b', 'c', 'd'])

In [7]: col[~pd.isnull(col)]
Out[7]: 
0    a
1    b
2    c
3    d
dtype: object

So when there is a non-integer index, the line raises an error, with default index it passes fine. This does not yet explain the error, but it's already much easier to start debugging with this.

So the above example works fine with python 3.5 / pandas 0.19.0, but fails in python 3.6. Reopening.

Comment From: jorisvandenbossche

Further trimming dow gives me hasattr(col, 'next') (from _is_iterator) raising the error, which calls somewhere pd.Timestamp('next'). Currently this raises a ValueError (not being able to convert it), while with 3.6 it raises a SystemError, which is not captured in the conversion code.

Comment From: stonebig

I should beat myself, slowing Pandas-0.19.1 release by disturbing you on Python-3.6.0b3

Comment From: jorisvandenbossche

:-) Don't worry, you can expect it tomorrow in any case

Comment From: jorisvandenbossche

@stonebig Opened a new issue for this datetime parsing bug, so it is more focused: https://github.com/pandas-dev/pandas/issues/14561

Pandas strange error with Pandas-0.19.0

A small, complete example of the issue

Expected Output

Output of pd.show_versions()

Output of `pd.show_versions()`