Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

def test_pandas_parquet_writer_bug():
        import pandas as pd
        from fs import open_fs
        df = pd.DataFrame(data={"A": [0, 1], "B": [1, 0]})
        with open_fs('osfs://./').open('.test.parquet', 'wb') as f:
            print(f.name)
            df.to_parquet(f)

Issue Description

Running the example above with Pandas 1.5.0 results in the exception below, but works fine in 1.4.3.

  File ".venv\lib\site-packages\pandas\io\parquet.py", line 202, in write
    self.api.parquet.write_table(
  File ".venv\lib\site-packages\pyarrow\parquet.py", line 1970, in write_table
    with ParquetWriter(
  File ".venv\lib\site-packages\pyarrow\parquet.py", line 655, in __init__
    self.writer = _parquet.ParquetWriter(
  File "pyarrow\_parquet.pyx", line 1387, in pyarrow._parquet.ParquetWriter.__cinit__
  File "pyarrow\io.pxi", line 1579, in pyarrow.lib.get_writer
TypeError: Unable to read from object of type: <class 'bytes'>

This seems to be because of this new block added to io.parquet.py:

 if (
        isinstance(path_or_handle, io.BufferedWriter)
        and hasattr(path_or_handle, "name")
        and isinstance(path_or_handle.name, (str, bytes))
    ):
    path_or_handle =  path_or_handle.name

Which assumes that path_or_handle.name will always be of type str, but is actually bytes in this example.

A solution that works for me is to simply append this below the block:

if isinstance(path_or_handle, bytes):
    path_or_handle = path_or_handle.decode("utf-8")

Alternatively, just passing thr BufferedWriter through to the subsequent self.api.parquet.write_to_dataset call works as well, but I'm not sure of the implications of reverting to this old behaviour (which I why I haven't submitted a pull request, but happy to do so if someone can confirm).

Cheers, Sam.

Expected Behavior

No exception is thrown

Installed Versions

INSTALLED VERSIONS ------------------ commit : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763 python : 3.9.13.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19042 machine : AMD64 processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_Australia.1252 pandas : 1.5.0 numpy : 1.23.3 pytz : 2022.4 dateutil : 2.8.2 setuptools : 59.8.0 pip : 22.2.2 Cython : None pytest : 6.2.5 hypothesis : None sphinx : 4.5.0 blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : 2.9.3 jinja2 : 3.1.2 IPython : None pandas_datareader: None bs4 : 4.11.1 bottleneck : None brotli : None fastparquet : None fsspec : 2022.8.2 gcsfs : None matplotlib : 3.6.0 numba : 0.56.2 numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : 5.0.0 pyreadstat : None pyxlsb : None s3fs : 2022.8.2 scipy : 1.9.1 snappy : None sqlalchemy : 1.4.41 tables : None tabulate : 0.8.10 xarray : None xlrd : None xlwt : None zstandard : None tzdata : None

Comment From: phofl

commit a15ca9755f8e43f9208771f72794e7dd01af92ff
Author: Chandrasekaran Anirudh Bhardwaj <anirudhsekar96@gmail.com>
Date:   Sat Jan 22 16:21:22 2022 -0800

    closes #44914 by changing the path object to string if it is of io.BufferWriter (#45480)

Comment From: phofl

cc @Anirudhsekar96

Comment From: twoertwein

Do we really want to allow paths in bytes? I think users will then also expect to use bytes paths for all other IO functions (including some which accept the bytes content - not a bytes path)

Comment From: phofl

No strong opinion, could deprecate, but silent behavior change is not great

Comment From: SamRWest

A BufferedWriter with a name property of bytes seems to be valid - eg: open(b'./test.parquet', 'rb') works fine. And if PyFileSystem2 handles are of this format, and the previous pandas version supported them, it'd be good to continue supporting this.

Comment From: Anirudhsekar96

@SamRWest Thanks for checking! Since the docs mentioned str like path hence I did not check for bytes like object with name attribute in my commit.

The docs indicate that FS like object is also an expected input hence a PyFileSystem2 handle as input would also be expected behavior, as @phofl mentioned if past versions supported a PyFileSystem2 handle then we should not allow silent behavior change and allow for bytes like object with name attribute as the path.

Comment From: MarcoGorelli

Hey @SamRWest

Thanks for having reported this - there's now a release candidate for pandas 2.0.0, which you can install with

mamba install -c conda-forge/label/pandas_rc pandas==2.0.0rc0

or

python -m pip install --upgrade --pre pandas==2.0.0rc0

If you try it out and report bugs, then we can fix them before the final 2.0.0 release