Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

Put this in `tmp.pyx`:

# In Cython code - any use of `_libs.khash` will trigger this
from pandas._libs.khash cimport kh_int64_t

Then run `cython tmp.pyx`. That will result in:

Error compiling Cython file:
------------------------------------------------------------
...
    bint kh_exist_strbox(kh_strbox_t*, khiter_t) nogil

    khuint_t kh_needed_n_buckets(khuint_t element_n) nogil


include "khash_for_primitive_helper.pxi"
^
------------------------------------------------------------

/home/rgommers/mambaforge/envs/pandas-dev/lib/python3.8/site-packages/pandas/_libs/khash.pxd:129:0: 'khash_for_primitive_helper.pxi' not found

Issue Description

I found this when following up on https://github.com/pandas-dev/pandas/pull/49115#issuecomment-1452662214:

Cython.Compiler.Errors.InternalError: Internal compiler error: 'khash_for_primitive_helper.pxi' not found

There are a couple of related issues that interact here:

  1. pandas is shipping lots of files in wheels that should not be there. In particular, .pxd and .pyx files in pandas/_libs.
  2. Use of absolute cimports, which should probably be relative.
  3. Use of include "<name>.pxi" in .pxd files. This should be replaced by shared declarations in a common .pxd file (see the warning in http://docs.cython.org/en/latest/src/userguide/language_basics.html#the-include-statement-and-include-files).

For (1), if you download any pandas 1.5.3 wheel, you'll see in pandas/_libs:

khash.pxd
khash_for_primitive_helper.pxi.in

Notably, khash.pxd contains include "khash_for_primitive_helper.pxi", and that file is not present (only the .pxi.in template is). So the wheel ships a broken private .pxd, which is then picked up during the build in gh-49115 because of the absolute from pandas._libs.khash cimport ... statements inside pandas itself.

That particular issue probably shows up in the Meson build but not in the setup.py-based build, because the latter generates the .pxi file in-place rather than in the build dir. However, as my reproducer above shows, this is a bit of a house of cards, because the absolute from pandas._libs cimports are actually broken.

Expected Behavior

The expected behavior is that the .pxd files aren't shipped, so anyone trying to access private .pxd files will get a clear exception. This will be fixed automatically when the Meson build is merged. However, that still leaves potential issues in any environments that already have pandas installed.

My suggestion is to:

  • Use relative cimports for accessing anything within pandas (needs testing, because Cython's cimport mechanism is very fragile all around).

  • Get rid of the .pxi.in and replace it with the recommended .pxd method.
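As a first step toward relative cimports, it may help to list where absolute ones occur. This is a rough sketch under assumptions: the function name `find_absolute_cimports` is hypothetical, and the regex only covers the two plain forms `from pandas.x cimport y` and `cimport pandas.x` at the start of a line, so anything more exotic would be missed.

```python
import re

# Matches "from pandas.x.y cimport z" and "cimport pandas.x.y" at line start.
# Commented-out lines are skipped because "#" precedes the keyword.
ABSOLUTE_CIMPORT_RE = re.compile(
    r"^\s*(?:from\s+(pandas(?:\.\w+)*)\s+cimport\b"
    r"|cimport\s+(pandas(?:\.\w+)*))",
    re.MULTILINE,
)


def find_absolute_cimports(source):
    """Return the dotted pandas module names that `source` cimports absolutely."""
    return [
        match.group(1) or match.group(2)
        for match in ABSOLUTE_CIMPORT_RE.finditer(source)
    ]
```

Running this over the text of each .pyx and .pxd file in pandas/_libs would give a list of candidates to convert; whether each one can actually be made relative still needs the testing mentioned above.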

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.8.16.final.0
python-bits : 64
OS : Linux
OS-release : 6.2.1-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Sun, 26 Feb 2023 03:39:23 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.3
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 67.4.0
pip : 23.0.1
Cython : 0.29.33
pytest : 7.2.1
hypothesis : 6.68.2
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.8
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : 1.3.6
brotli :
fastparquet : 2023.2.0
fsspec : 2023.1.0
gcsfs : 2023.1.0
matplotlib : 3.6.3
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : 1.2.1
pyxlsb : 1.0.10
s3fs : 2023.1.0
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.4
tables : 3.7.0
tabulate : 0.9.0
xarray : 2023.1.0
xlrd : 2.0.1
xlwt : None
zstandard : 0.19.0
tzdata : None