A disproportionate share of our CI failures occur in the s3 tests, e.g. test_with_s3_url. Moreover, we have 2 xfailed tests with strict=False: test_write_s3_parquet_fails and test_write_s3_csv_fails, xref #39155, where AFAICT we have no idea how to fix them.

This suggests we might not be the right team to maintain this feature. Is there a better home for this?

(The other big pain point CI-wise is the test_user_agent tests which cause all the timeouts)

Comment From: twoertwein

Pandas uses fsspec to support s3 and all other kinds of protocols. Instead of testing s3 we could use any(?) other fsspec protocol that seems more reliable.
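One non-network option along these lines is fsspec's built-in in-memory filesystem, which pandas can route to through a `memory://` URL. A minimal sketch (assuming fsspec is installed; the filename is arbitrary):

```python
import pandas as pd

# Exercise pandas' fsspec integration without any network or moto:
# "memory://" is fsspec's built-in in-memory filesystem.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_csv("memory://frame.csv", index=False)

result = pd.read_csv("memory://frame.csv")
assert result.equals(df)
```

The same pandas code paths that dispatch s3 URLs to s3fs dispatch this URL to fsspec's MemoryFileSystem, so it covers the fsspec plumbing without the flakiness of a mocked network service.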

Comment From: jreback

this sounds like a great idea

IIRC a lot of these tests predate our use of fsspec, so we can certainly use a non-network protocol

Comment From: mroeschke

I think the specific reason those s3 tests fail is that the s3_resource fixture populates and deletes files from "s3" (moto) after each test, and in a multi-process CI environment this leads to flaky race condition errors. Running these tests in a single-process environment would probably fix them: https://github.com/pandas-dev/pandas/issues/44584

That being said, +1 to just using fsspec instead of s3fs & gcsfs

Comment From: twoertwein

If #44584 doesn't work out:

Migrating frequently failing s3 tests to other fsspec protocols sounds like a good idea to me. I think we shouldn't remove all s3/gcsfs tests:

  • There seems to be a bit of s3-specific code: https://github.com/pandas-dev/pandas/blob/e08ffd4a5109c41a38cd89fb6965485538c241aa/pandas/io/common.py#L360
  • And it might be good to keep at least four s3 tests: read_csv/to_csv, each with and without compression, to make sure that text/binary mode is correctly inferred.
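The compressed/uncompressed round trip above can be sketched protocol-agnostically; here against fsspec's in-memory filesystem (assumes fsspec is installed; filenames and the compression choices are illustrative, and the real tests would run the same loop against a `s3://` moto bucket):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Round-trip with and without compression: the uncompressed case goes
# through a text handle, the gzip case through a binary handle, so both
# text and binary mode inference get exercised.
for compression in (None, "gzip"):
    path = f"memory://roundtrip-{compression}.csv"
    df.to_csv(path, index=False, compression=compression)
    out = pd.read_csv(path, compression=compression)
    assert out.equals(df)
```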

Comment From: jbrockmendel

No momentum for spin-off, closing.