Hi,

Both locally and on CI I encounter flaky tests relatively frequently. Most of the time I can see that they're flaky and ignore them, but it happens often enough that I spend (and waste) time identifying whether those failures are caused by my changes.

Probably not a complete list, but here's what I noticed lately:

ReactiveElasticsearchRepositoriesAutoConfigurationTests > doesNotTriggerDefaultRepositoryDetectionIfCustomized()
ReactiveElasticsearchRepositoriesAutoConfigurationTests > testDefaultRepositoryConfiguration()
DataCassandraTestIntegrationTests > didNotInjectExampleService()
Jetty10ServletWebServerFactoryTests > whenServerIsShuttingDownGracefullyThenResponseToRequestOnIdleConnectionWillHaveAConnectionCloseHeader()
CouchbaseAutoConfigurationIntegrationTests > defaultConfiguration()

Subjectively, the JDK 15 pipeline is a bit flakier, but that might be a false lead.

Anyhow - I wonder if we can do anything about those. I remember that you already did an awesome job of increasing timeouts here and there and tweaked the Testcontainers startup attempts, but I think we're past the Testcontainers stage in most of the cases mentioned above.
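
(For context, the startup-attempts knob mentioned above is exposed by Testcontainers as GenericContainer.withStartupAttempts. A purely illustrative sketch - the image tag, attempt count, and class name are assumptions, not what the build actually uses:)

import org.testcontainers.containers.GenericContainer;
import org.testcontainers.utility.DockerImageName;

class StartupAttemptsSketch {

    // Hypothetical: retry container startup up to three times before failing the test
    static final GenericContainer<?> redis = new GenericContainer<>(
            DockerImageName.parse("redis:4.0.6")).withStartupAttempts(3);
}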

Cheers, Christoph

Comment From: dreis2211

Maybe a stupid question, but is it possible to "grep" over the failed build scans on ge.spring.io to get a more complete list of flaky tests?

Comment From: wilkinsona

There is indeed and it's really useful. Here are all the test failures over the last 7 days sorted with flaky tests first.

I thought I'd stabilized the Cassandra tests with a timeout increase, but one failed again today with a 10s timeout.
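
(Purely for illustration - the actual mechanism may be a driver-level timeout rather than a polling wait - the kind of change involved looks roughly like this Awaitility sketch; the class name and query are assumptions:)

import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

import java.time.Duration;

import com.datastax.oss.driver.api.core.CqlSession;

class CassandraTimeoutSketch {

    // Hypothetical: bump the flaky 10s wait to 30s before asserting on the session
    void waitForCassandra(CqlSession session) {
        await().atMost(Duration.ofSeconds(30)).untilAsserted(() -> assertThat(
                session.execute("SELECT release_version FROM system.local").one()).isNotNull());
    }
}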

Comment From: dreis2211

Oh, that is lovely - I was hoping that Gradle Enterprise had such a feature. Thanks for sharing that view, @wilkinsona.

From a gut feeling - and this might be wrong - most of the failures are related to some sort of timeout, right? I wonder if the parallelism - as much as it helps - puts more pressure on the system overall and leads to more timeouts. Given that you did an amazing job of tweaking the task caches, is this maybe something to play around with?

Comment From: wilkinsona

I think another common theme among the flaky tests is that many of them use Docker. Of the five listed above, four of them use Docker and I think parallelism could be part of the cause.

When I was working on the build migration, allowing Gradle to create one worker per core made things really unstable with many Docker-related failures. One worker per two cores seems to work well on our development machines at least. My MacBook Pro has 16 cores so I have the following in ~/.gradle/gradle.properties:

org.gradle.workers.max=8

We configure the max workers to 4 on CI as those machines have, IIRC, 8 cores. We could try tuning this down, but I'd prefer not to slow everything down to avoid a problem that's at least somewhat Docker-specific. I'm tempted to go through another round of timeout increases and see how it goes.
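
(For completeness, the CI setting is the same property; it could also be passed as Gradle's --max-workers command-line flag if we experiment with lower values:)

# current CI setting; could be lowered experimentally
org.gradle.workers.max=4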

Comment From: dreis2211

The Docker theme reminded me of something. I wonder if it would help to use the newer versions of the respective container images as well.

I saw that for almost every image there are newer (patch) versions available. (There are also newer major and minor versions available here and there, but that might be too aggressive)

Image      Current  Latest
Cassandra  3.11.2   3.11.10
Mongo      4.0.10   4.0.23
Redis      4.0.6    4.0.14

Neo4j and Couchbase should already be on the latest patch versions.
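
If we go ahead, the change should amount to bumping the image tags wherever the containers are declared. A minimal sketch with Testcontainers, using the patch versions from the table above (the class and field names are just for illustration):

import org.testcontainers.containers.CassandraContainer;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.MongoDBContainer;
import org.testcontainers.utility.DockerImageName;

class UpgradedImagesSketch {

    // Hypothetical declarations pinning the newer patch tags
    static final CassandraContainer<?> cassandra = new CassandraContainer<>(
            DockerImageName.parse("cassandra:3.11.10"));

    static final MongoDBContainer mongo = new MongoDBContainer(
            DockerImageName.parse("mongo:4.0.23"));

    static final GenericContainer<?> redis = new GenericContainer<>(
            DockerImageName.parse("redis:4.0.14")).withExposedPorts(6379);
}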

Let me know if I should give this a test.

Comment From: wilkinsona

This is probably a better test failures link. It adds the CI tag so it filters out failures on our development machines where things may be failing as we're iterating on a new feature.

Comment From: wilkinsona

Yes please, @dreis2211. Upgrading those 3 sounds like a good idea to me.

Comment From: dreis2211

I also noticed that the libraries in spring-boot-parent apparently haven't had a Bomr run lately. There is a Testcontainers update to 1.15.2. Let me know if I should create a PR for the update or if you want to run Bomr.

Comment From: wilkinsona

I'll run Bomr on all three maintained branches.

Comment From: wilkinsona

I've made a couple of changes today related to flaky tests:

  • https://github.com/spring-projects/spring-boot/issues/25518
  • https://github.com/spring-projects/spring-boot/issues/25520

Comment From: wilkinsona

Things seem to have settled down quite a bit recently, so I'll close this one now. We can take another look in the future if we start noticing a rise in flakiness.

Comment From: snicoll

CouchbaseAutoConfigurationIntegrationTests is flaky again. I've seen it fail several times in the recent past. Reopening to look at it again.

Comment From: snicoll

@daschl suggests that we enable debug logging for com.couchbase. That'll help identify why the bucket isn't ready.
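
(In a Spring Boot test that can be as simple as raising the logger level through a test property; a sketch - the class name, test body, and nested configuration here are assumptions, not the real test:)

import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Configuration;

// Hypothetical: turn on Couchbase SDK debug logging for this test class only
@SpringBootTest(properties = "logging.level.com.couchbase=debug")
class CouchbaseDebugLoggingSketch {

    @Test
    void contextLoads() {
        // The real assertions would go here; with the property above, the SDK
        // logs enough detail to show why the bucket is (or isn't) ready.
    }

    @Configuration
    static class Config {
    }
}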

Comment From: snicoll

@daschl also suggested that upgrading to the latest Couchbase driver could help. I haven't seen a single flaky test since the upgrade, so I am going to close this one again.