Code Sample
import numpy as np
import pandas as pd
x = np.random.randn(10,3)
# this works - list of DataFrames without column names
np.ravel([pd.DataFrame(batch.reshape(1,3)) for batch in x])
# this also works - single DataFrame with column names
np.ravel(pd.DataFrame(x[0].reshape(1,3), columns=["x1", "x2", "x3"]))
# this doesn't work - list of DataFrames with column names
np.ravel([pd.DataFrame(batch.reshape(1,3), columns=["x1", "x2", "x3"]) for batch in x])
Problem description
Calling numpy.ravel on a list of DataFrames that have column names raises a KeyError. A single DataFrame with column names, or a list of DataFrames without column names, works as expected.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-67-b9b3fcdd02a0> in <module>()
----> 1 np.ravel([pd.DataFrame(batch.reshape(1,3), columns=["x1", "x2", "x3"]) for batch in x])
~/anaconda/envs/python36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in ravel(a, order)
1572 return asarray(a).ravel(order=order)
1573 else:
-> 1574 return asanyarray(a).ravel(order=order)
1575
1576
~/anaconda/envs/python36/lib/python3.6/site-packages/numpy/core/numeric.py in asanyarray(a, dtype, order)
551
552 """
--> 553 return array(a, dtype, copy=False, order=order, subok=True)
554
555
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2925 if self.columns.nlevels > 1:
2926 return self._getitem_multilevel(key)
-> 2927 indexer = self.columns.get_loc(key)
2928 if is_integer(indexer):
2929 indexer = [indexer]
~/anaconda/envs/python36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2657 return self._engine.get_loc(key)
2658 except KeyError:
-> 2659 return self._engine.get_loc(self._maybe_cast_indexer(key))
2660 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2661 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
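The traceback shows the KeyError coming from DataFrame.__getitem__ being called with key 0 while NumPy builds the array. Default integer column labels contain 0, which would explain why the unnamed case works. One possible workaround (a sketch, not something proposed in this thread) is to convert each DataFrame to a plain ndarray before calling np.ravel, so NumPy never indexes into a DataFrame. The variable names `frames` and `flat` are illustrative.

```python
import numpy as np
import pandas as pd

x = np.random.randn(10, 3)
frames = [pd.DataFrame(batch.reshape(1, 3), columns=["x1", "x2", "x3"]) for batch in x]

# Convert each frame to a plain ndarray first so NumPy never indexes a DataFrame.
# DataFrame.to_numpy() requires pandas 0.24 or later; on older versions use .values.
flat = np.ravel([df.to_numpy() for df in frames])

print(flat.shape)  # (30,)
```

This sidesteps the error rather than fixing it, so it does not answer whether a list of DataFrames should be accepted directly.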
Expected Output
No error; the call should return a flat array, as in the two working cases above.
Output of pd.show_versions()
Comment From: gfyoung
cc @jreback - do we actually support such compatibility with numpy functions?
Comment From: mroeschke
This seems to work on master. I suppose it could use a test.
In [9]: np.ravel([pd.DataFrame(batch.reshape(1,3), columns=["x1", "x2", "x3"]) for batch in x])
Out[9]:
array([ 0.17742111, 0.87640526, 1.4257005 , 0.05889509, 3.10787888,
-0.312276 , -0.93394565, 2.2401423 , 0.65604378, 0.03366395,
0.59905057, 1.43496667, -1.39746196, -0.28585731, -1.84474429,
-1.3148849 , -0.71611566, -0.57859721, -0.87735003, -0.28434854,
-0.54719655, 1.45308157, 1.04201968, -0.0631709 , -0.38514428,
-1.14143786, -0.66231235, -0.21273731, 0.99846192, 1.27139819])
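A regression test for this could look roughly like the sketch below. It is not the test from any actual PR: the function name is hypothetical, and it uses a fixed input instead of random data so the expected values are known. It is only expected to pass on NumPy and pandas versions where the call above succeeds.

```python
import numpy as np
import pandas as pd


def test_np_ravel_list_of_named_frames():
    # Hypothetical test name; fixed input so the expected values are known.
    arr = np.arange(6, dtype="float64").reshape(2, 3)
    frames = [
        pd.DataFrame(batch.reshape(1, 3), columns=["x1", "x2", "x3"])
        for batch in arr
    ]
    result = np.ravel(frames)
    expected = np.arange(6, dtype="float64")
    np.testing.assert_array_equal(result, expected)


test_np_ravel_list_of_named_frames()
```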
Comment From: usersblock
Running it on my end gives the exact same error. The version I have is listed below. If there is no PR for this, may I take it as my first issue? Error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-1-507b49754716> in <module>
3
4 x = np.random.randn(10,3)
----> 5 np.ravel([pd.DataFrame(batch.reshape(1,3), columns=["x1", "x2", "x3"]) for batch in x])
~\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in ravel(a, order)
1685 return asarray(a).ravel(order=order)
1686 else:
-> 1687 return asanyarray(a).ravel(order=order)
1688
1689
~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asanyarray(a, dtype, order)
589
590 """
--> 591 return array(a, dtype, copy=False, order=order, subok=True)
592
593
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2978 if self.columns.nlevels > 1:
2979 return self._getitem_multilevel(key)
-> 2980 indexer = self.columns.get_loc(key)
2981 if is_integer(indexer):
2982 indexer = [indexer]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.5
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1
Comment From: jreback
@usersblock you are using a really old version.
We would take a PR for tests.
Comment From: usersblock
Updated version and opened PR
Comment From: usersblock
@jreback The compatibility issue still exists on any build that uses NumPy 1.18.5. On the pipeline that would be py38_macos and py38_np18. The other two don't have issues.
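Since the behavior appears to depend on the installed NumPy version, a small check like the following can report whether a given environment is affected. This is a sketch only: the helper name `ravel_named_frames_works` is made up for illustration, and it catches only the KeyError reported in this thread.

```python
import numpy as np
import pandas as pd


def ravel_named_frames_works():
    """Return True if np.ravel accepts a list of DataFrames with named columns."""
    x = np.random.randn(10, 3)
    frames = [pd.DataFrame(b.reshape(1, 3), columns=["x1", "x2", "x3"]) for b in x]
    try:
        np.ravel(frames)
    except KeyError:
        # The failure mode reported in this thread.
        return False
    return True


print("numpy", np.__version__, "pandas", pd.__version__, "works:", ravel_named_frames_works())
```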