When using JVM Checkpoint/Restore with Spring Boot, I can see log messages like:
2023-08-23T06:53:41.292Z INFO 405 --- [ main] o.s.c.support.DefaultLifecycleProcessor : Restarting Spring-managed lifecycle beans after JVM restore
2023-08-23T06:53:41.294Z INFO 405 --- [ main] o.s.c.support.DefaultLifecycleProcessor : Spring-managed lifecycle restart completed in 2 ms
2023-08-23T06:53:41.340Z INFO 405 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path ''
2023-08-23T06:53:41.353Z INFO 405 --- [ main] o.s.s.petclinic.PetClinicApplication : Started PetClinicApplication in 330.329 seconds (process running for 330.592)
2 observations: - The very long delay seems to be computed leveraging the application startup during the checkpoint as a basis, so the delay printed can be huge since the application can be restored much later - Spring Framework already prints a CRaC related metric which is the time taken by the application context to restore
I guess we see this behavior due to the fact that CRaC support in Spring Framework is stopping and restarting the application context lifecycle and the related listener is still active.
I am wondering if it could be possible to either: - Provide a log message with the delay to restore the application (we may not have a relevant information for the process, but maybe for the JVM part we have) - If we can't find a way to print a relevant metric, do not display this confusing log message by making sure it is only printed one time at startup.
The first proposal would be my preferred option as it would allow use to compare cold startup and restoration time.
Comment From: wilkinsona
I guess we see this behavior due to the fact that CRaC support in Spring Framework is stopping and restarting the application context lifecycle and the related listener is still active.
I don't think that's the cause as there's no listener involved here. The code's all in SpringApplication.run:
https://github.com/spring-projects/spring-boot/blob/3a99066eebe3d9c29f3d153402ca68815496609b/spring-boot-project/spring-boot/src/main/java/org/springframework/boot/SpringApplication.java#L299-L317
I believe what's happening is:
startTimeis set toSystem.nanoTime()- The context is refreshed which causes the checkpoint to be taken
- The checkpoint is restored some time later and processing continues
timeTakenToStartupis determined based on the originalstartTime- The startup info is logged
To behave differently, I think we'd need SpringApplication to be become aware of CRaC and to somehow determine that this was a restore rather than a normal startup.
Comment From: sdeleuze
This is not yet exposed via the org.crac API, but shortly this CRaCMXBean API should be available and could allow:
- To know if there as been a restoration by testing if CRaCMXBean.getRestoreTime() == -1 (protected by a classpath check)
- Get an honest estimation of the time to restore the application, and display that value in Spring Boot instead of the startup time when the application is restored.
If that's considered as a good fit by the Boot team, I can comment here as soon as it is available via the CRaC dependency.
Comment From: sdeleuze
This API, documented here, is now available as part of org.crac:crac:1.4.0 which Spring Framework now leverages so Spring Boot should be able to leverage it.
We can potentially add related checks in https://github.com/spring-projects/spring-checkpoint-restore-smoke-tests once there is a branch of Boot that leverages it.