Code Sample

import numpy as np
import pandas as pd

x = np.random.randn(10,3)

# this works - list of DataFrames without column names
np.ravel([pd.DataFrame(batch.reshape(1,3)) for batch in x])

# this also works - single DataFrame with column names
np.ravel(pd.DataFrame(x[0].reshape(1,3), columns=["x1", "x2", "x3"]))

# this doesn't work - list of DataFrames with column names
np.ravel([pd.DataFrame(batch.reshape(1,3), columns=["x1", "x2", "x3"]) for batch in x])

Problem description

Calling numpy.ravel on a list of DataFrames that have column names raises a KeyError. A list of DataFrames without column names, or a single DataFrame with column names, works fine.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-67-b9b3fcdd02a0> in <module>()
----> 1 np.ravel([pd.DataFrame(batch.reshape(1,3), columns=["x1", "x2", "x3"]) for batch in x])

~/anaconda/envs/python36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in ravel(a, order)
   1572         return asarray(a).ravel(order=order)
   1573     else:
-> 1574         return asanyarray(a).ravel(order=order)
   1575 
   1576 

~/anaconda/envs/python36/lib/python3.6/site-packages/numpy/core/numeric.py in asanyarray(a, dtype, order)
    551 
    552     """
--> 553     return array(a, dtype, copy=False, order=order, subok=True)
    554 
    555 

~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0
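
The last pandas frame in the traceback is `DataFrame.__getitem__` failing on the key `0`, which suggests NumPy is indexing each DataFrame with an integer while building the array. The sketch below illustrates that reading: `df[0]` is a valid column lookup when the columns are the default integers, but not when they are named. It also shows one possible workaround, converting each frame with `to_numpy()` (available since pandas 0.24) before calling `np.ravel`. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random input in the report.
x = np.arange(30.0).reshape(10, 3)
row = x[0].reshape(1, 3)

unnamed = pd.DataFrame(row)                             # columns are 0, 1, 2
named = pd.DataFrame(row, columns=["x1", "x2", "x3"])   # columns are strings

# With default columns, the integer key 0 is a valid column label.
first_column = unnamed[0]

# With named columns, the same lookup raises KeyError, matching the traceback.
try:
    named[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Possible workaround: hand NumPy plain arrays instead of DataFrames.
frames = [
    pd.DataFrame(batch.reshape(1, 3), columns=["x1", "x2", "x3"])
    for batch in x
]
flat = np.ravel([df.to_numpy() for df in frames])
```

Converting first sidesteps the question of how NumPy coerces a list of DataFrames, so it should behave the same regardless of column names.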

Expected Output

No error; the call should return the flattened values, as the first two calls do.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 18.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 3.3.2
pip: 19.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.4
feather: None
matplotlib: 3.0.1
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Comment From: gfyoung

cc @jreback - do we actually support such compatibility with numpy functions?

Comment From: mroeschke

This seems to work on master. I suppose it could use a test.

In [9]: np.ravel([pd.DataFrame(batch.reshape(1,3), columns=["x1", "x2", "x3"]) for batch in x])
Out[9]:
array([ 0.17742111,  0.87640526,  1.4257005 ,  0.05889509,  3.10787888,
       -0.312276  , -0.93394565,  2.2401423 ,  0.65604378,  0.03366395,
        0.59905057,  1.43496667, -1.39746196, -0.28585731, -1.84474429,
       -1.3148849 , -0.71611566, -0.57859721, -0.87735003, -0.28434854,
       -0.54719655,  1.45308157,  1.04201968, -0.0631709 , -0.38514428,
       -1.14143786, -0.66231235, -0.21273731,  0.99846192,  1.27139819])
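
A regression check along these lines might look like the following minimal sketch. The function name is hypothetical, the input is fixed rather than random so the expected values are known, and it assumes a NumPy/pandas combination on which the call succeeds. It is not necessarily the test that ends up in the pandas suite.

```python
import numpy as np
import pandas as pd


def check_ravel_list_of_named_frames():
    # Hypothetical helper name; fixed input keeps the check deterministic.
    arr = np.arange(6.0).reshape(2, 3)
    frames = [
        pd.DataFrame(batch.reshape(1, 3), columns=["x1", "x2", "x3"])
        for batch in arr
    ]
    # The call from the report: ravel a list of DataFrames with column names.
    result = np.ravel(frames)
    # The flattened result should match the flattened source array.
    np.testing.assert_array_equal(result, arr.ravel())
    return result


result = check_ravel_list_of_named_frames()
```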

Comment From: usersblock

Running it on my end gives the exact same error. The version I have is listed below. If there is no PR for this, may I take it as my first issue? Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-507b49754716> in <module>
      3 
      4 x = np.random.randn(10,3)
----> 5 np.ravel([pd.DataFrame(batch.reshape(1,3), columns=["x1", "x2", "x3"]) for batch in x])

~\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in ravel(a, order)
   1685         return asarray(a).ravel(order=order)
   1686     else:
-> 1687         return asanyarray(a).ravel(order=order)
   1688 
   1689 

~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asanyarray(a, dtype, order)
    589 
    590     """
--> 591     return array(a, dtype, copy=False, order=order, subok=True)
    592 
    593 

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2978             if self.columns.nlevels > 1:
   2979                 return self._getitem_multilevel(key)
-> 2980             indexer = self.columns.get_loc(key)
   2981             if is_integer(indexer):
   2982                 indexer = [indexer]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

INSTALLED VERSION

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.5
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1

Comment From: jreback

@usersblock you are using a really old version

would take a PR for tests

Comment From: usersblock

Updated version and opened PR

Comment From: usersblock

@jreback The compatibility issue still exists on any version that uses NumPy 1.18.5. On the pipeline, that would be py38_macos and py38_np18. The other two don't have issues.