After updating https://github.com/spring-projects/spring-lifecycle-smoke-tests to run tests against Spring Boot 3.4.x, I noticed that framework:webflux-undertow:checkpointRestoreAppTest is broken with Boot 3.4.x while still green with Boot 3.3.x, even though both use the same Undertow version. It fails with the following error:

Error (criu/libnetlink.c:54): -95 reported by netlink: Operation not supported
Error (criu/net.c:3744): Unable to create a veth pair: -95

While discussing what could have caused this with @snicoll, he mentioned that Spring Boot 3.4.x enables graceful shutdown by default, so I tried server.shutdown=immediate and found that it fixes the test.

Could the Spring Boot team see if we could avoid this regression and keep WebFlux + Undertow CRaC support working out of the box? I suspect that when graceful shutdown is enabled, it has not finished when the JVM checkpoint is invoked, leaving the socket in a bad state, hence the error above.

Comment From: wilkinsona

This doesn't look like a regression to me as it also fails (although perhaps differently) with Boot 3.3.x when graceful shutdown is enabled:

> Task :framework:webflux-undertow:checkpointRestoreAppTest FAILED

WebfluxApplicationTests > stringResponseBody(WebTestClient) STANDARD_OUT
    09:43:18.371 [Test worker] ERROR org.springframework.test.web.reactive.server.ExchangeResult -- Request details for assertion failure:

    > GET http://localhost:38021
    > accept-encoding: [gzip]
    > user-agent: [ReactorNetty/1.1.25]
    > host: [localhost:38021]
    > accept: [*/*]
    > WebTestClient-Request-Id: [1]

    No content

    < 503 SERVICE_UNAVAILABLE Service Unavailable
    < Connection: [keep-alive]
    < Content-Length: [0]
    < Date: [Fri, 03 Jan 2025 09:43:18 GMT]

    0 bytes of content (unknown content-type).


WebfluxApplicationTests > stringResponseBody(WebTestClient) FAILED
    java.lang.AssertionError at WebfluxApplicationTests.java:18

WebfluxApplicationTests > resourceInStatic(WebTestClient) STANDARD_OUT
    09:43:18.401 [Test worker] ERROR org.springframework.test.web.reactive.server.ExchangeResult -- Request details for assertion failure:

    > GET http://localhost:38021/foo.html
    > accept-encoding: [gzip]
    > user-agent: [ReactorNetty/1.1.25]
    > host: [localhost:38021]
    > accept: [*/*]
    > WebTestClient-Request-Id: [2]

    No content

    < 503 SERVICE_UNAVAILABLE Service Unavailable
    < Connection: [keep-alive]
    < Content-Length: [0]
    < Date: [Fri, 03 Jan 2025 09:43:18 GMT]

    0 bytes of content (unknown content-type).

Comment From: wilkinsona

With Boot 3.4.1, I'm seeing the same behavior as Boot 3.3.x when graceful shutdown is enabled. The checkpoint works, the app starts successfully upon restore, and then rejects requests with a 503. This happens because Undertow's GracefulShutdownHandler is only single-use. Once it has been shut down (as happens when taking the checkpoint) the shutdown bit is set in its state field. The bit isn't cleared upon restore so the handler still believes that Undertow has been shut down. There's no API to clear it so we may have to resort to reflection if this is something that we want to support. Alternatively, it might be possible to ignore the handler somehow when taking a checkpoint so that it isn't shut down.
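To illustrate the single-use behavior described above, here is a minimal, self-contained model. It is not Undertow's actual code: the class name, the bit layout, and the restart() method are hypothetical and only mimic the idea of a shutdown bit in a state field that nothing ever clears.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a single-use graceful shutdown gate.
// This is NOT Undertow's GracefulShutdownHandler; it only mimics the
// "shutdown bit in a state field" behavior described in the comment above.
class SingleUseShutdownGate {

    // Assumed layout: the highest bit of the state marks "shut down".
    private static final long SHUTDOWN_BIT = 1L << 63;

    private final AtomicLong state = new AtomicLong();

    // Returns the status a request would receive: 200 while the gate is
    // open, 503 once the shutdown bit has been set.
    int handleRequest() {
        if ((state.get() & SHUTDOWN_BIT) != 0) {
            return 503;
        }
        return 200;
    }

    // Sets the shutdown bit, as happens when the checkpoint is taken.
    void shutdown() {
        state.updateAndGet(current -> current | SHUTDOWN_BIT);
    }

    // Models the restore: the server is started again, but the state
    // field is deliberately left untouched because nothing resets it.
    void restart() {
        // intentionally empty: the shutdown bit stays set
    }
}
```

In this model, calling shutdown() and then restart() leaves handleRequest() returning 503, which matches the responses in the test output above. Making the gate reusable would mean clearing the bit on restart, which is what the missing API (or reflection) would have to do.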

Comment From: sdeleuze

For the automatic checkpoint/restore at startup use case where -Dspring.context.checkpoint=onRefresh is set, graceful shutdown is IMO not needed (for any webserver) since no request is expected to have been received, so if you can disable it (for Undertow or all servers) for that use case specifically, that would make sense. Spring Boot can leverage DefaultLifecycleProcessor#CHECKPOINT_PROPERTY_NAME and DefaultLifecycleProcessor#ON_REFRESH_VALUE.
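A rough sketch of that idea, assuming the decision can be reduced to the value of the spring.context.checkpoint property. The class and method names are hypothetical and this is not how Spring Boot is implemented. The two constants are copied as string literals to keep the example self-contained; real code would reference the DefaultLifecycleProcessor constants instead.

```java
// Hypothetical helper, not part of Spring Boot: chooses the effective
// shutdown mode based on the checkpoint property.
class CheckpointShutdownPolicy {

    // Literal copies of the values behind
    // DefaultLifecycleProcessor#CHECKPOINT_PROPERTY_NAME and
    // DefaultLifecycleProcessor#ON_REFRESH_VALUE.
    static final String CHECKPOINT_PROPERTY_NAME = "spring.context.checkpoint";
    static final String ON_REFRESH_VALUE = "onRefresh";

    // True when an automatic checkpoint at startup has been requested.
    static boolean isCheckpointOnRefresh(String checkpointProperty) {
        return ON_REFRESH_VALUE.equalsIgnoreCase(checkpointProperty);
    }

    // Falls back to "immediate" for checkpoint-on-refresh, since no
    // request is expected to have been received at that point;
    // otherwise keeps whatever shutdown mode was configured.
    static String effectiveShutdown(String configuredShutdown, String checkpointProperty) {
        if (isCheckpointOnRefresh(checkpointProperty)) {
            return "immediate";
        }
        return configuredShutdown;
    }
}
```

For example, effectiveShutdown("graceful", System.getProperty(CheckpointShutdownPolicy.CHECKPOINT_PROPERTY_NAME)) would yield "immediate" when the JVM is started with -Dspring.context.checkpoint=onRefresh, and "graceful" otherwise.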

For the on-demand checkpoint/restore of a running application, I think graceful shutdown makes more sense. Maybe I could create a related GracefulShutdownHandler feature request on the Undertow bug tracker, and for now we just document in https://github.com/spring-projects/spring-lifecycle-smoke-tests that people using Undertow + CRaC + on-demand checkpoint/restore should disable graceful shutdown?