Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import pyarrow as pa
data = {
"foo": ["A", "B", "C"],
"prop": [0.5, 0.8, 0.7]
}
df = pd.DataFrame(data)
df["prop"].astype('float64[pyarrow]')
Issue Description
Cannot change a column type to float64[pyarrow]. The error I get is:
NameError: name 'pa' is not defined
Expected Behavior
I'd expect the column to be converted properly:
0 0.5
1 0.8
2 0.7
Name: prop, dtype: double[pyarrow]
Installed Versions
Comment From: emmansh
Seems related to #57928.
Comment From: rhshadrach
Thanks for the report. I cannot reproduce this on main or on 2.2.x. Just prior to the astype line, can you try running:
import sys
print([e for e in sys.modules if "pyarrow" in e])
What output do you get?
Comment From: emmansh
@rhshadrach:
import sys
print([e for e in sys.modules if "pyarrow" in e])
## ['pandas.compat.pyarrow', 'pyarrow._generated_version', 'pyarrow.util', 'pyarrow.lib', 'pyarrow.ipc', 'pyarrow.types', 'pyarrow']
It's also worth noting that I can reproduce this issue on two different machines (locally, as above, and in an AWS SageMaker notebook).
More info: I set up my virtual env this way:
conda create -n "foo" python=3.12
source activate base
source activate foo
conda install jupyterlab pandas seaborn matplotlib pyarrow
Working inside the foo virtual env results in the issue reported in my original post.
However, if I set up the virtual env in the following way, there is no issue and everything works as expected:
conda create -n "bar" python=3.12
source activate base
source activate bar
conda install jupyterlab pandas=2.2.2 seaborn matplotlib pyarrow=17.0.0
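Since the only difference between the two environments is the pinned versions, it may help to print what each one actually resolved to. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `environment_report` are made up for illustration and are not part of pandas or pyarrow.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("pandas", "pyarrow")) -> dict:
    """Collect the interpreter path, Python version, and package versions."""
    report = {
        "python": sys.version.split()[0],
        # The executable path shows which env the notebook kernel really uses.
        "executable": sys.executable,
    }
    for name in packages:
        report[name] = installed_version(name)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this inside both foo and bar and comparing the output should show whether the unpinned install picked different pandas or pyarrow versions. `pd.show_versions()` gives a fuller report and is what the "Installed Versions" section of the issue template expects.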
Comment From: emmansh
Hey @rhshadrach, just checking in. Have you tried replicating the bug using the virtual envs I showed in my last comment?
Comment From: rhshadrach
@emmansh - no, and I don't have the bandwidth to work on this issue any further.
Comment From: zogzog
It happens only when Pyarrow is not installed.
Comment From: emmansh
@zogzog, as far as I understand, the pandas code I posted originally should run out of the box. No?
Comment From: zogzog
> @zogzog, as far as I understand, the pandas code I posted originally should run out of the box. No?
That's the most troubling aspect of your report (you can "import pyarrow as pa", so yes, it should!).
My own experience: a few days ago I wanted to try pandas Series with pyarrow, I hadn't installed pyarrow yet, and I got the NameError, which makes sense in that case.
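For the "pyarrow not installed" case, a check before the conversion can turn the confusing NameError into a clearer message. This is only a sketch: `module_available` and `require_module` are hypothetical helper names, not pandas API, and it does not explain the original report, where `import pyarrow as pa` succeeds.

```python
from importlib.util import find_spec


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found for import."""
    return find_spec(name) is not None


def require_module(name: str) -> None:
    """Raise a descriptive ImportError if `name` is missing from this environment."""
    if not module_available(name):
        raise ImportError(
            f"{name!r} is not installed in this environment; "
            "install it and restart the interpreter before retrying."
        )


# Hypothetical usage before the conversion from the report:
#   require_module("pyarrow")
#   df["prop"].astype("float64[pyarrow]")
```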
Comment From: Win7GM
> @rhshadrach:
> import sys
> print([e for e in sys.modules if "pyarrow" in e])
> ['pandas.compat.pyarrow', 'pyarrow._generated_version', 'pyarrow.util', 'pyarrow.lib', 'pyarrow.ipc', 'pyarrow.types', 'pyarrow']
> It's also worth noting that this issue is replicated on two different machines (locally as above; and AWS Sagemaker notebook).
> More info: I set up my virtual env this way:
> conda create -n "foo" python=3.12
> source activate base
> source activate foo
> conda install jupyterlab pandas seaborn matplotlib pyarrow
> Working inside the foo virtual env results in the issue reported in my original post.
> However, if I set up the virtual env in the following way, there is no issue and everything works as expected:
> conda create -n "bar" python=3.12
> source activate base
> source activate bar
> conda install jupyterlab pandas=2.2.2 seaborn matplotlib pyarrow=17.0.0
I tried to use pyarrow functionality and forgot to install pyarrow at first, so I did get the error NameError: name 'pa' is not defined.
However, after I installed pyarrow I still got the same error, even though the output of
import sys
print([e for e in sys.modules if "pyarrow" in e])
was pretty much the same as yours.
I was working in a Jupyter notebook, and restarting the kernel made it work. I'm guessing it was some odd import issue left over from cells that had already run.
I don't know if this is your case, though.
Comment From: Anurag-Varma
@emmansh Can you uninstall pyarrow and then reinstall the latest version?
I had a similar pyarrow version issue, and that fixed it for me.