Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
pandas.read_csv("", sep=r"\s+", engine="pyarrow")

Issue Description

This fails with the following error:

ValueError: the 'pyarrow' engine does not support regex separators
(separators > 1 char and different from '\s+' are interpreted as regex)

Expected Behavior

I'm not sure whether the pyarrow engine is meant to support \s+. If it is, this call should not fail. If it is not, I believe the error message should be changed: as written, it implies that \s+ is not interpreted as a regex, and therefore that pyarrow should accept it.

Update: I looked at the main branch, and it seems that pyarrow does not support \s+, so changing the error message should be enough.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 478d340667831908b5b4bf09a2787a11a14560c9
python           : 3.11.0.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.15.0-69-generic
Version          : #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.0.0
numpy            : 1.24.2
pytz             : 2023.2
dateutil         : 2.8.2
setuptools       : 67.6.0
pip              : 23.0.1
Cython           : None
pytest           : 7.2.2
hypothesis       : None
sphinx           : 6.1.3
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.12.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : 
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : None
pyqt5            : None

Comment From: lithomas1

Thanks for the report. This is probably due to erroneously sharing that line of code with the C parser.

If you're interested in making a PR, we can probably add a check for pyarrow inside the elif block here: https://github.com/pandas-dev/pandas/blob/e9e034badbd2e36379958b93b611bf666c7ab360/pandas/io/parsers/readers.py#L1514

Otherwise, I'll take care of this, hopefully within a week or so.
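
For illustration, the kind of engine-specific check suggested above might look roughly like the sketch below. This is a standalone toy, not the pandas source: the function name `separator_error`, the constant `WHITESPACE_SEP`, and the wording of the pyarrow message are all hypothetical. It assumes that the python engine accepts regex separators, that the C engine special-cases \s+, and that pyarrow accepts neither.

```python
from typing import Optional

# r"\s+" is the one multi-character separator the C engine is assumed to accept.
WHITESPACE_SEP = r"\s+"


def separator_error(sep: Optional[str], engine: str) -> Optional[str]:
    """Return an error message if `engine` cannot handle `sep`, else None.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Single-character (or unset) separators are not regexes.
    if sep is None or len(sep) <= 1:
        return None
    # Assumption: the python engine handles regex separators.
    if engine == "python":
        return None
    # Assumption: the C engine treats \s+ as "split on whitespace".
    if engine == "c" and sep == WHITESPACE_SEP:
        return None
    # pyarrow gets its own message, so the \s+ exception is not implied.
    if engine == "pyarrow":
        return (
            "the 'pyarrow' engine does not support separators longer than "
            "one character, including '\\s+'"
        )
    return (
        f"the '{engine}' engine does not support regex separators "
        "(separators > 1 char and different from '\\s+' are interpreted as regex)"
    )


if __name__ == "__main__":
    print(separator_error(r"\s+", "pyarrow"))
    print(separator_error("::", "c"))
```

Splitting out the pyarrow branch keeps the existing message for the C engine, where the \s+ exception really applies, and avoids suggesting that exception to pyarrow users.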