A disproportionate number of our CI failures are in s3 tests, e.g. test_with_s3_url. Moreover we have 2 xfailed tests with strict=False: test_write_s3_parquet_fails and test_write_s3_csv_fails (xref #39155), where AFAICT we have no idea how to fix them.
This suggests we might not be the right team to maintain this feature. Is there a better home for this?
(The other big pain point CI-wise is the test_user_agent tests which cause all the timeouts)
Comment From: twoertwein
Pandas uses fsspec to support s3 and all other kinds of protocols. Instead of testing s3 we could use any(?) other fsspec protocol that seems more reliable.
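A minimal sketch of what that could look like, using fsspec's built-in in-memory `memory://` protocol (assuming fsspec is installed; the file name is illustrative):

```python
# Sketch: round-trip a DataFrame through fsspec's in-memory filesystem
# instead of s3. "memory://" is a built-in fsspec protocol, so this
# exercises the same pandas/fsspec code path without any network.
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_csv("memory://test.csv", index=False)
result = pd.read_csv("memory://test.csv")
assert result.equals(df)
```

Since no network or moto server is involved, there is nothing to race on between CI workers.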
Comment From: jreback
this sounds like a great idea
IIRC we had a lot of these tests prior to using fsspec, so we can certainly use a non-network protocol
Comment From: mroeschke
I think the specific reason why those s3 tests fail is that the s3_resource fixture populates and deletes files from "s3" (moto) after each test, and in a multi-process CI environment this leads to flaky race conditions. Running these tests in a single-process environment would probably fix them: https://github.com/pandas-dev/pandas/issues/44584
That being said, +1 to just using fsspec instead of s3fs & gcsfs
Comment From: twoertwein
If #44584 doesn't work out: migrating frequently failing s3 tests to other fsspec protocols sounds like a good idea to me. I think we shouldn't remove all s3/gcsfs tests:
- There seems to be a bit of s3-specific code: https://github.com/pandas-dev/pandas/blob/e08ffd4a5109c41a38cd89fb6965485538c241aa/pandas/io/common.py#L360
- And it might be good to keep at least four s3 tests: read_csv/to_csv, both with and without compression, to make sure that text/binary mode is correctly inferred.
Comment From: jbrockmendel
No momentum for spin-off, closing.