Pandas "'TableIterator' object is not an iterator" when calling pd.read_hdf()

Code Sample, a copy-pastable example if possible

import pandas as pd

reader = pd.read_hdf(filename, 'foo', chunksize=1000)
next(reader)

TypeError: 'TableIterator' object is not an iterator

Problem description

Unless I'm missing something, I would expect something called TableIterator to be an iterator itself. Since pd.read_hdf() is the analog of pd.read_csv(), I would expect it to behave along the same lines: with chunksize, read_csv() returns a TextFileReader, which works with next().

reader.__iter__() is defined, so `for chunk in reader` works. It's just slightly surprising that TextFileReader defines __next__() but TableIterator does not.
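To make the distinction concrete, here is a minimal sketch of an object that is iterable but not an iterator. `ChunkedReader` is a hypothetical stand-in written for illustration only; it is not pandas' TableIterator and makes no claim about how that class is implemented.

```python
class ChunkedReader:
    """Hypothetical stand-in: iterable, but not itself an iterator."""

    def __init__(self, rows, chunksize):
        self.rows = rows
        self.chunksize = chunksize

    def __iter__(self):
        # Generator method: every call returns a fresh iterator,
        # but the object itself has no __next__().
        for start in range(0, len(self.rows), self.chunksize):
            yield self.rows[start:start + self.chunksize]


reader = ChunkedReader(list(range(5)), chunksize=2)

# next() on the object itself fails, as in the report above.
try:
    next(reader)
except TypeError as exc:
    error_message = str(exc)

# Workaround: ask for an iterator explicitly, then call next() on that.
chunks = iter(reader)
first_chunk = next(chunks)

# A for loop (or list()) calls iter() implicitly, so it works either way.
all_chunks = list(reader)
```

Assuming the real reader behaves like this sketch, calling `iter(reader)` once, keeping the result, and calling `next()` on it should work as a stopgap.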

Expected Output

next(reader) should give the dataframe corresponding to that chunk.

(Yes, I know iterators and iterables are slightly different, but in this case I would expect both to be supported.)

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-79-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.12.1
scipy: 0.19.0
statsmodels: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: None
pandas_datareader: None

Comment From: jreback

This is not implemented (well, it IS, but it doesn't inherit from the proper machinery); see the issue here: https://github.com/pandas-dev/pandas/issues/9496. It's a pretty easy PR, actually, if you'd like to do this.

Here's a full repro

In [1]: tm.makeMixedDataFrame().to_hdf('foo.h5', 'df', mode='w', format='table')

In [2]: pd.read_hdf('foo.h5')
Out[2]: 
     A    B     C          D
0  0.0  0.0  foo1 2009-01-01
1  1.0  1.0  foo2 2009-01-02
2  2.0  0.0  foo3 2009-01-05
3  3.0  1.0  foo4 2009-01-06
4  4.0  0.0  foo5 2009-01-07

In [3]: it = pd.read_hdf('foo.h5', chunksize=1)

In [4]: next(it)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-2cdb14c0d4d6> in <module>()
----> 1 next(it)

TypeError: 'TableIterator' object is not an iterator
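For anyone picking this up, the general shape of the change is the standard iterator protocol: `__iter__()` returns `self` and `__next__()` produces one chunk per call. The sketch below uses a hypothetical `ChunkIterator` over a plain list; it is an illustration of the protocol under that assumption, not the actual pandas patch or the machinery referenced in the linked issue.

```python
class ChunkIterator:
    """Hypothetical reader that is both iterable and an iterator."""

    def __init__(self, rows, chunksize):
        self.rows = rows
        self.chunksize = chunksize
        self._pos = 0  # index of the next unread row

    def __iter__(self):
        # An iterator returns itself, so for loops and next() share state.
        return self

    def __next__(self):
        if self._pos >= len(self.rows):
            raise StopIteration
        chunk = self.rows[self._pos:self._pos + self.chunksize]
        self._pos += self.chunksize
        return chunk


it = ChunkIterator(list(range(5)), chunksize=2)
first = next(it)  # works directly, no iter() call needed
rest = list(it)   # continues from where next() left off
```

One consequence of this design is that the object is single-pass: once exhausted, it has to be recreated to read the chunks again.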
