When I pass a cursor as data to the DataFrame constructor, an error occurs:
cursor = sqlite.execute(sql)
pd.DataFrame(cursor)
/usr/lib/python3/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
255 copy=copy)
256 elif isinstance(data, collections.Iterator):
--> 257 raise TypeError("data argument can't be an iterator")
258 else:
259 try:
TypeError: data argument can't be an iterator
But a normal generator is accepted:
def gen():
    yield (1,2,3)
    yield (4,5,6)
    yield (7,8,9)
pd.DataFrame(gen())
Out[171]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
This feels inconsistent.
Comment From: jreback
what is type(cursor).__mro__
Comment From: hexum
type(cursor).__mro__
(sqlite3.Cursor, object)
Comment From: hexum
It's not a big issue: passing the cursor to the list constructor produces a normal list, which DataFrame accepts. I just can't understand why an iterable is not accepted. Type checking is bad practice in Python, isn't it? Why not just check for the ability to iterate?
hasattr([], "__iter__")
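The list workaround mentioned above can be sketched with an in-memory SQLite database. This is a minimal sketch, not the original poster's code: the table name t and its columns a, b, c are made up for illustration, and only the standard library is used so the snippet runs on its own.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6)])

cursor = conn.execute("SELECT a, b, c FROM t ORDER BY a")

# Consuming the cursor yields an ordinary list of row tuples.
rows = list(cursor)

# Column names are available from the DB-API cursor.description attribute.
column_names = [d[0] for d in cursor.description]

print(rows)
print(column_names)
conn.close()
```

The resulting rows list is the kind of object the DataFrame constructor accepts, and column_names could be passed as its columns argument.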
Comment From: jreback
Type checking in Python is fine. In pandas it is actually quite a bit more complicated, because we need to determine whether, for example, a list of lists or a list of scalars was passed, and that is where it gets problematic.
So an Iterable must have __iter__ AND __len__. A cursor doesn't have this property, while, for example, range(5) does. (I know that in theory rowcount works for this, but I don't think this is a guaranteed property.) So a Cursor should act much like a GeneratorType (and not an Iterator).
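The distinction drawn here can be checked with the abstract base classes in collections.abc. The sketch below is only an illustration of how the three kinds of object classify; the describe helper is hypothetical and is not part of pandas.

```python
import sqlite3
import types
from collections.abc import Iterable, Iterator, Sized


def gen():
    yield (1, 2, 3)


def describe(obj):
    """Hypothetical helper: report which protocols an object satisfies."""
    return {
        "iterable": isinstance(obj, Iterable),   # has __iter__
        "iterator": isinstance(obj, Iterator),   # has __iter__ and __next__
        "sized": isinstance(obj, Sized),         # has __len__
        "generator": isinstance(obj, types.GeneratorType),
    }


conn = sqlite3.connect(":memory:")
cursor = conn.execute("SELECT 1, 2, 3")

cursor_info = describe(cursor)      # iterator, no __len__, not a generator
range_info = describe(range(5))     # iterable with __len__, not an iterator
gen_info = describe(gen())          # a generator is also an iterator

conn.close()
```

Note that a generator object also passes the Iterator check, so a constructor that treats the two differently has to test for GeneratorType before it tests for Iterator.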
Comment From: hexum
I just created a generator with no defined length, and DataFrame accepts it as I expect. I think we should turn off the type check. Anyway, a user may construct an infinite generator and it will pass the type check.
In [20]: def g():
....:     for i in range(5):
....:         yield [i, i ** 2, i ** 3]
....:
In [21]: DataFrame(g())
Out[21]:
0 1 2
0 0 0 0
1 1 1 1
2 2 4 8
3 3 9 27
4 4 16 64
Comment From: jreback
A generator is fine; you have an iterator. You can certainly make any changes you would like, but they would need to pass the test suite as is.
Comment From: hexum
Hmmm. I'll see how to overcome it.
Comment From: ns-cweber
@jreback Out of curiosity, why is a generator fine, but not an iterator? It looks like DataFrame's constructor checks to see if the data argument is a GeneratorType and then wraps it (data = list(data)), but if it's an iterator it raises an exception.
Comment From: jreback
See my comment above: an iterator is not sufficient, as it's not required to have a len.
This is an old issue, so I don't really remember. Have a look at the test code for Frame construction.
Comment From: ns-cweber
A generator is also not required to have a len. The solution for generators in the DataFrame constructor is to consume the generator into a list() and from there treat it as though a list was passed as the data arg. The same should work for iterators. If you're worried about infinite iterators, then you should be equally worried about infinite generators, unless I'm misunderstanding something. I'll take a look at the tests.
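The proposal above could be sketched roughly as follows. This is not pandas code: materialize_rows is a hypothetical helper that only illustrates treating every iterator the way generators are described as being treated, by consuming it into a list before any further inspection.

```python
from collections.abc import Iterator


def materialize_rows(data):
    """Hypothetical helper: consume any iterator (generators included) into a list.

    Objects that are not iterators, such as lists or ranges, are returned unchanged.
    As with generators, an infinite iterator would never finish being consumed.
    """
    if isinstance(data, Iterator):
        return list(data)
    return data


def g():
    for i in range(3):
        yield [i, i ** 2]


from_generator = materialize_rows(g())
from_iterator = materialize_rows(iter([(1, 2), (3, 4)]))

existing_list = [(5, 6)]
untouched = materialize_rows(existing_list)
```

A single Iterator check is enough here because generator objects are themselves iterators.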
Comment From: TomAugspurger
Duplicate of https://github.com/pandas-dev/pandas/issues/2193
I also think it'll be possible to turn the iterable into a list, just like with a generator.