Affects: Spring Boot 3.3.0, but I think every version supporting CRaC is affected
Consider the following simple application:
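import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.scheduling.annotation.EnableScheduling
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RestController

@SpringBootApplication
@EnableScheduling
class MyApp

fun main(args: Array<String>) {
    runApplication<MyApp>(*args)
}

@RestController
class SchedulingController {
    val data = AtomicInteger(0)

    // Prints an increasing counter once per second.
    @Scheduled(timeUnit = TimeUnit.SECONDS, fixedRate = 1L)
    fun increment() {
        println(data.incrementAndGet())
    }

    @GetMapping("/")
    fun data() = data.get()
}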
My steps are as follows:
- Build the application with
./gradlew build
- Build the image (docker build -t last_edit_pre .) from the following Dockerfile:
FROM bellsoft/liberica-runtime-container:jdk-crac-slim
ADD build/libs/last_edit-0.0.1-SNAPSHOT.jar /app/app.jar
WORKDIR /app
ENTRYPOINT java -XX:CRaCCheckpointTo=/app/checkpoint -jar /app/app.jar
- Run it with
docker run --privileged -p 8081:8080 -it --name last_edit_pre last_edit_pre:latest
and wait for some time (for example, until the counter reaches 10)
- Create a snapshot with
docker exec -it last_edit_pre jcmd 129 JDK.checkpoint
- Commit the snapshot to a new image
docker commit last_edit_pre last_edit_post
- Run the newly-created image like this
docker run -it --rm --entrypoint java last_edit_post:latest -XX:CRaCRestoreFrom=/app/checkpoint
Here I observe an interesting behavior: the counter very quickly catches up from the checkpoint moment to the current time. The later I restore from the snapshot, the more iterations it runs through in quick succession.
This is potentially dangerous: if the scheduled operation is CPU-intensive or performs a risky operation, the burst of invocations can crash the application in all sorts of ways.
I do realize that sometimes this behavior might be required; in that case it should probably be controlled by an application property.
Comment From: sdeleuze
@asm0dey So please find below our findings and proposal.
First, be aware that only the x86 variant of bellsoft/liberica-runtime-container:jdk-crac-slim is available, so on my Mac M2 I used a modified version of https://github.com/sdeleuze/spring-boot-crac-demo to reproduce.
Second, the behavior you report is only visible in the on-demand checkpoint/restore of a running application mode, not in the automatic checkpoint/restore at startup mode.
Third, if we take a step back, the behavior we see kind of makes sense given that fixedRate is described as "execute the annotated method with a fixed period between invocations", with the first invocations being performed before the checkpoint. Interestingly, fixedDelay works without this side effect if you want a CRaC restoration to behave like just a faster startup, as its definition is "execute the annotated method with a fixed period between the end of the last invocation and the start of the next". Notice also that cron works as you would expect here, since cron expressions are calculated after every task execution as well.
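To make the difference concrete, here is a minimal sketch in plain Kotlin. It is a simplified model of the two definitions quoted above, not Spring's actual scheduler; the function names and the timings (last run at 10 s, restore at 70 s) are made up for illustration.

```kotlin
// Simplified model, NOT Spring's implementation. All times are in seconds.

/**
 * Fixed rate: runs are planned on a fixed grid (lastPlannedRun + period,
 * lastPlannedRun + 2 * period, ...), so every slot that elapsed while the
 * process was not running is still owed once it comes back.
 */
fun fixedRateRunsDue(lastPlannedRun: Long, period: Long, now: Long): Int {
    if (now <= lastPlannedRun) return 0
    return ((now - lastPlannedRun) / period).toInt()
}

/**
 * Fixed delay: the next run is planned only after the previous one ends,
 * so at most one run can be pending, however long the pause was.
 */
fun fixedDelayRunsDue(lastRunEnd: Long, delay: Long, now: Long): Int =
    if (now - lastRunEnd >= delay) 1 else 0

fun main() {
    val lastRunAt = 10L  // hypothetical: last run just before the checkpoint
    val restoreAt = 70L  // hypothetical: restore happens one minute later
    println("fixedRate backlog: ${fixedRateRunsDue(lastRunAt, 1L, restoreAt)}")   // prints "fixedRate backlog: 60"
    println("fixedDelay backlog: ${fixedDelayRunsDue(lastRunAt, 1L, restoreAt)}") // prints "fixedDelay backlog: 1"
}
```

Under this model the fixed-rate backlog grows with the time between checkpoint and restore, which matches the "the later I restore, the more iterations" observation, while the fixed-delay backlog never exceeds one run.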
As you mention yourself, sometimes the current behavior might be required, sometimes not, so I don't think we should change the default behavior. And since fixedDelay and cron work as expected with CRaC if you want a restoration to behave like just a faster startup, I would suggest turning this issue into a documentation one: add a scheduling section to the Spring CRaC refdoc that warns about this side effect of on-demand checkpoint when fixedRate is used, and recommends using fixedDelay or cron instead for that use case. Would that be ok from your POV?
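For reference, a sketch of what the fixedDelay variant of the controller from the report might look like. It only swaps the fixedRate attribute for fixedDelay and assumes the rest of the application (MyApp with @EnableScheduling) stays as posted; it has not been run against a CRaC checkpoint here.

```kotlin
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RestController

@RestController
class SchedulingController {
    val data = AtomicInteger(0)

    // fixedDelay counts from the end of the previous invocation, so a restore
    // should not replay the periods that elapsed while the process was
    // checkpointed (assumption based on the definition quoted above).
    @Scheduled(timeUnit = TimeUnit.SECONDS, fixedDelay = 1L)
    fun increment() {
        println(data.incrementAndGet())
    }

    @GetMapping("/")
    fun data() = data.get()
}
```

A cron-based variant would use the cron attribute of @Scheduled instead, for example cron = "* * * * * *" to fire every second.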
Comment From: asm0dey
@sdeleuze thank you for looking into it! Now that you have explained the intricacies of the behavior, it makes perfect sense! And now that I understand the behavior, I totally agree that it's just a matter of documentation.