Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pyarrow as pa

data = {
    "foo": ["A", "B", "C"],
    "prop": [0.5, 0.8, 0.7]
}

df = pd.DataFrame(data)

df["prop"].astype('float64[pyarrow]')

Issue Description

Cannot change a column type to float64[pyarrow]. The error I get is:

NameError: name 'pa' is not defined

Expected Behavior

I'd expect the column to be converted properly:

0    0.5
1    0.8
2    0.7
Name: prop, dtype: double[pyarrow]

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.12.8
python-bits : 64
OS : Darwin
OS-release : 24.1.0
Version : Darwin Kernel Version 24.1.0: Thu Oct 10 21:02:45 PDT 2024; root:xnu-11215.41.3~2/RELEASE_ARM64_T8112
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 24.2
Cython : None
sphinx : None
IPython : 8.27.0
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.4.2
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : None
matplotlib : 3.9.2
numba : None
numexpr : 2.10.1
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 17.0.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Comment From: emmansh

Seems related to #57928.

Comment From: rhshadrach

Thanks for the report. I cannot reproduce this on main or 2.2.x. Just prior to the astype line, can you try running:

import sys
print([e for e in sys.modules if "pyarrow" in e])

What is the output that you get?

Comment From: emmansh

@rhshadrach –

import sys
print([e for e in sys.modules if "pyarrow" in e])

## ['pandas.compat.pyarrow', 'pyarrow._generated_version', 'pyarrow.util', 'pyarrow.lib', 'pyarrow.ipc', 'pyarrow.types', 'pyarrow']

It's also worth noting that this issue reproduces on two different machines (locally, as above, and in an AWS SageMaker notebook).

More info: I set up my virtual env this way:

conda create -n "foo" python=3.12

source activate base
source activate foo

conda install jupyterlab pandas seaborn matplotlib pyarrow

👆🏻 Working inside the foo virtual env results in the issue reported in my original post.

However, if I set up the virtual env in the following way, there is no issue and everything works as expected:

conda create -n "bar" python=3.12

source activate base
source activate bar

conda install jupyterlab pandas=2.2.2 seaborn matplotlib pyarrow=17.0.0

Comment From: emmansh

Hey @rhshadrach – just checking in. Have you tried replicating the bug using the virtual envs I showed in my last comment?

Comment From: rhshadrach

@emmansh - no, and I don't have the bandwidth to work on this issue any further.

Comment From: zogzog

It happens only when Pyarrow is not installed.

Comment From: emmansh

@zogzog – as far as I understand the pandas code I posted originally should run out of the box. No?

Comment From: zogzog

> @zogzog – as far as I understand the pandas code I posted originally should run out of the box. No?

That's the most troubling aspect of your report: you can run "import pyarrow as pa", so yes, it should!

My own experience: a few days ago I wanted to try pandas Series with pyarrow, I hadn't installed pyarrow yet, and I got the NameError, which makes sense in that situation.
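To illustrate why a missing pyarrow could surface as a NameError rather than an ImportError, here is a minimal sketch. It assumes pandas imports pyarrow behind a guard when it is first loaded, which is my reading and not something confirmed in this thread. The package name `pyarrow_not_installed_example` and the function `make_array` are made up for the illustration.

```python
# Guarded import: if the package is missing, the name `pa` is never bound.
# `pyarrow_not_installed_example` is a hypothetical package that does not exist.
try:
    import pyarrow_not_installed_example as pa
except ImportError:
    pass


def make_array(values):
    # Looks up the global `pa` only when called, so the failure shows up here,
    # long after the import was silently skipped.
    return pa.array(values)


try:
    make_array([0.5, 0.8, 0.7])
except NameError as exc:
    message = str(exc)
    print(message)
```

If this is what happens, the guard runs only once per interpreter session, so installing the package afterwards would not bind the name until Python is restarted.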

Comment From: Win7GM

@rhshadrach –

import sys
print([e for e in sys.modules if "pyarrow" in e])

['pandas.compat.pyarrow', 'pyarrow._generated_version', 'pyarrow.util', 'pyarrow.lib', 'pyarrow.ipc', 'pyarrow.types', 'pyarrow']

It's also worth noting that this issue reproduces on two different machines (locally, as above, and in an AWS SageMaker notebook).

More info: I set up my virtual env this way:

conda create -n "foo" python=3.12

source activate base
source activate foo

conda install jupyterlab pandas seaborn matplotlib pyarrow

👆🏻 Working inside the foo virtual env results in the issue reported in my original post.

However, if I set up the virtual env in the following way, there is no issue and everything works as expected:

conda create -n "bar" python=3.12

source activate base
source activate bar

conda install jupyterlab pandas=2.2.2 seaborn matplotlib pyarrow=17.0.0

I tried to use pyarrow functionality and forgot to install pyarrow at first, so I got the error NameError: name 'pa' is not defined

However, after I installed pyarrow I still got the same error, even though the output of

import sys
print([e for e in sys.modules if "pyarrow" in e])

was pretty much the same as yours.

I was working in a Jupyter notebook, and restarting the kernel made it work. I'm guessing it was stale import state left over from cells that had already run.

I don't know if this is your case, though.
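For anyone who wants to check for this kind of stale session, here is a rough sketch using only the standard library. The function name `pyarrow_import_state` and the dictionary keys are my own; it only reports what the current interpreter can see and does not inspect pandas internals.

```python
import importlib.util
import sys


def pyarrow_import_state():
    """Report whether pyarrow is findable on disk and what this session has loaded."""
    spec = importlib.util.find_spec("pyarrow")
    return {
        # True if the current environment can locate a pyarrow package.
        "installed_on_disk": spec is not None,
        # Path the package would be loaded from, useful for spotting the wrong env.
        "location": spec.origin if spec is not None else None,
        # Whether this interpreter session has already imported each package.
        "pyarrow_imported_in_session": "pyarrow" in sys.modules,
        "pandas_imported_in_session": "pandas" in sys.modules,
    }


state = pyarrow_import_state()
print(state)
```

If pyarrow is found on disk but the NameError persists, restarting the kernel before re-running the cells seems worth trying, since that is what worked for me.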

Comment From: Anurag-Varma

@emmansh Can you uninstall pyarrow and then reinstall the latest version?

I had a similar pyarrow version issue, and the above fixed it for me.
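Before and after reinstalling, it may help to confirm which versions the active environment actually resolves. A small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print what the current environment reports for the two packages in question.
for name in ("pandas", "pyarrow"):
    print(name, installed_version(name))
```

This reads installed package metadata rather than importing the packages, so it reflects what is on disk even if the running session is stale.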