xref #43611, #43643

While investigating the Azure timeout issues, I found that the deadlock appeared to occur in parser code, which makes pyarrow a plausible culprit. Tests with unusual input seem to trigger it, for example some of the parse_dates tests. A specific reproducer is the test:

pandas/tests/io/parser/common/test_ints.py::test_outside_int64_uint64_range

I can't reproduce this on current pyarrow, but Azure uses 0.17.0, and with that version I can reproduce a deadlock on macOS just by running pandas/tests/io/parser/common/test_ints.py::test_outside_int64_uint64_range. It doesn't happen consistently, but when it does, the process hangs to the point that it needs a SIGKILL to stop, which explains why pytest-timeout didn't catch it.
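
Since the hang is intermittent and an in-process timeout did not fire, one way to hunt for it is to run the test repeatedly in a child process and kill that child from the outside. The following is only a rough sketch: the helper names `run_with_hard_timeout` and `find_hang`, and the attempt count and timeout in the usage comment, are made up for illustration and are not part of pandas or pytest.

```python
import subprocess
import sys


def run_with_hard_timeout(cmd, timeout_s):
    """Run cmd in a child process and kill it if it exceeds timeout_s.

    Returns the child's exit code, or None if it had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX, so a stuck child cannot ignore it
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


def find_hang(cmd, attempts, timeout_s):
    """Run cmd up to `attempts` times.

    Returns the 1-based index of the first attempt that hung, or None.
    """
    for attempt in range(1, attempts + 1):
        if run_with_hard_timeout(cmd, timeout_s) is None:
            return attempt
    return None


# Hypothetical usage (attempt count and timeout are arbitrary choices):
# test_id = "pandas/tests/io/parser/common/test_ints.py::test_outside_int64_uint64_range"
# find_hang([sys.executable, "-m", "pytest", test_id], attempts=20, timeout_s=120)
```

Killing from a separate process sidesteps the case where the hung interpreter can no longer run its own timeout handler.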

cc @lithomas1 if any thoughts here

Comment From: jorisvandenbossche

Could we increase the minimum required version of pyarrow for the CSV reading functionality?
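
To illustrate the idea, a version gate could look roughly like the sketch below. It is not pandas' actual optional-dependency check; `parse_version` and `require_min_version` are hypothetical helpers, and the minimum version is left as a parameter because the first working pyarrow release has not been pinned down.

```python
def parse_version(text):
    """Turn a string like '1.0.1' into (1, 0, 1).

    Parsing stops at the first component without leading digits,
    so a suffix such as 'dev0' is ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def require_min_version(name, installed, minimum):
    """Raise ImportError if `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"{name} >= {minimum} is required for this feature "
            f"(found {installed})"
        )
```

The CSV reading path would call `require_min_version("pyarrow", pyarrow.__version__, minimum)` before dispatching to the pyarrow engine, so users on an affected version get an error instead of a hang.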

It might also be good to make a reproducible example to report to Arrow. Although I suppose it is fixed now (given it only happens on older pyarrow versions), it can still be useful to add it as a test over there.

Comment From: mzeitlin11

Makes sense - from initial testing I know the issue is present at least for pyarrow <= 1.0.0. I'll try to find the minimum working version and report to pyarrow if some of the previously failing cases don't look tested (but someone is welcome to beat me to it :)

Comment From: lithomas1

I'll try to test this some more. It is also possible that pyarrow is getting stuck because of the TextIOWrapper that we use on our side to force pyarrow to read StringIO objects, which would be a bug on our side.
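
One way to test that hypothesis would be to take the wrapper out of the picture and hand pyarrow a plain bytes buffer instead. The sketch below uses only the standard library; `text_buffer_to_bytes` is a hypothetical helper, not the wrapper pandas actually uses, and it copies the whole buffer into memory, which is fine for small test inputs only.

```python
import io


def text_buffer_to_bytes(buf, encoding="utf-8"):
    """Return a BytesIO holding the remaining contents of a text buffer.

    This eagerly encodes everything left in `buf`, so there is no lazy
    wrapper layer between the text source and the bytes consumer.
    """
    return io.BytesIO(buf.read().encode(encoding))


# Example: a StringIO like the ones used in the parser tests
raw = text_buffer_to_bytes(io.StringIO("a,b\n1,2\n"))
```

If the deadlock still reproduces when the resulting `BytesIO` is passed to pyarrow's CSV reader, the wrapper is probably not the cause.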

Comment From: jbrockmendel

@lithomas1 @mzeitlin11 did this ever get sorted out?

Comment From: mroeschke

Now that our minimum version is 6.0, I believe we shouldn't hit this issue anymore. IIRC I was experiencing this with pyarrow 2.0 and had skipped those versions in the CI due to the deadlock.

Closing since we haven't seen this in a while, but we can reopen if it shows up again.