Affects: 6.1.10
JDK: zulu21.34.19-ca-crac-jdk21.0.3-linux_x64
Running on a Linux VM
I am trying to use CRaC with a Spring Boot app. I have run into several issues so far, including logback appenders causing jdk.crac.impl.CheckpointOpenFileException upon checkpoint creation (https://github.com/spring-projects/spring-boot/issues/38548) and the Eureka Discovery Client leaving a connection open because it fetches the registry before the checkpoint. I was able to work around those issues and got checkpointing to work.
Now I am stuck on the restore. As you can see in the log below, the restore code tries to load my Spring Boot JAR as a class, and of course it can't find it. I don't understand why it does that.
I've also attached the CRIU dump and restore logs below. They seem fine to me, but I might be wrong.
Spring Boot Log:
24476: Error (criu/tty.c:843): tty: Can't set tty params on 0x26, trying to skip...: Inappropriate ioctl for device
2024-07-17T12:18:51.537Z INFO 24476 --- [app] [ main] o.s.c.support.DefaultLifecycleProcessor : Restarting Spring-managed lifecycle beans after JVM restore
2024-07-17T12:18:51.694Z INFO 24476 --- [app] [ main] o.s.c.support.DefaultLifecycleProcessor : Spring-managed lifecycle restart completed (restored JVM running for 693 ms)
2024-07-17T12:18:51.762Z WARN 24476 --- [app] [ main] ConfigServletWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.context.ApplicationContextException: Failed to restore CRaC checkpoint on refresh
2024-07-17T12:18:51.855Z INFO 24476 --- [app] [ main] com.netflix.discovery.DiscoveryClient : Shutting down DiscoveryClient ...
2024-07-17T12:18:54.864Z INFO 24476 --- [app] [ main] com.netflix.discovery.DiscoveryClient : Unregistering ...
2024-07-17T12:18:55.297Z INFO 24476 --- [app] [ main] com.netflix.discovery.DiscoveryClient : DiscoveryClient_app/<eureka-host>:app:8702 - deregister status: 404
2024-07-17T12:18:55.301Z INFO 24476 --- [app] [ main] com.netflix.discovery.DiscoveryClient : Completed shut down of DiscoveryClient
2024-07-17T12:18:55.322Z INFO 24476 --- [app] [ main] o.apache.catalina.core.StandardService : Stopping service [Tomcat]
2024-07-17T12:18:55.373Z INFO 24476 --- [app] [ main] .s.b.a.l.ConditionEvaluationReportLogger :
Error starting ApplicationContext. To display the condition evaluation report re-run your application with 'debug' enabled.
2024-07-17T12:18:55.441Z ERROR 24476 --- [app] [ main] o.s.boot.SpringApplication : Application run failed
org.springframework.context.ApplicationContextException: Failed to restore CRaC checkpoint on refresh
at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:539) ~[spring-context-6.1.10.jar!/:6.1.10]
at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:194) ~[spring-context-6.1.10.jar!/:6.1.10]
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:981) ~[spring-context-6.1.10.jar!/:6.1.10]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:627) ~[spring-context-6.1.10.jar!/:6.1.10]
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:146) ~[spring-boot-3.3.1.jar!/:3.3.1]
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:754) ~[spring-boot-3.3.1.jar!/:3.3.1]
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:456) ~[spring-boot-3.3.1.jar!/:3.3.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:335) ~[spring-boot-3.3.1.jar!/:3.3.1]
at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:149) ~[spring-boot-3.3.1.jar!/:3.3.1]
at de.app.platform.aggregate.CustomApplicationBuilder.run(CustomApplicationBuilder.java:36) ~[app-platform-aggregate-18.0.0-b002eaed.jar!/:18.0.0-b002eaed]
at de.app.MyApplication.main(MyApplication.java:10) ~[!/:18.0.0-b002eaed]
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]
at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:91) ~[app-18.0.0-b002eaed.jar:18.0.0-b002eaed]
at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:53) ~[app-18.0.0-b002eaed.jar:18.0.0-b002eaed]
at org.springframework.boot.loader.launch.JarLauncher.main(JarLauncher.java:58) ~[app-18.0.0-b002eaed.jar:18.0.0-b002eaed]
Caused by: org.crac.RestoreException: null
at org.crac.Core$Compat.checkpointRestore(Core.java:150) ~[crac-1.4.0.jar!/:na]
at org.crac.Core.checkpointRestore(Core.java:237) ~[crac-1.4.0.jar!/:na]
at org.springframework.context.support.DefaultLifecycleProcessor$CracDelegate.checkpointRestore(DefaultLifecycleProcessor.java:530) ~[spring-context-6.1.10.jar!/:6.1.10]
... 15 common frames omitted
Suppressed: java.security.PrivilegedActionException: null
at java.base/java.security.AccessController.doPrivileged(AccessController.java:575) ~[na:na]
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:230) ~[na:na]
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:294) ~[na:na]
at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:273) ~[na:na]
at jdk.crac/jdk.crac.Core.checkpointRestore(Core.java:72) ~[jdk.crac:na]
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[na:na]
at org.crac.Core$Compat.checkpointRestore(Core.java:141) ~[crac-1.4.0.jar!/:na]
... 17 common frames omitted
Caused by: java.lang.ClassNotFoundException: /<path-to-jar>/app-18.0.0-b002eaed.jar
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:534)
at java.base/java.lang.Class.forName(Class.java:513)
at java.base/jdk.internal.crac.mirror.Core$2.run(Core.java:233)
at java.base/jdk.internal.crac.mirror.Core$2.run(Core.java:230)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:571)
... 24 common frames omitted
CRIU Logs: dump4.log restore.log
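For readers following the trace: the innermost cause is Class.forName being handed the JAR path as if it were a class name. The following is a minimal Java sketch of just that symptom, using only the JDK. The class name ForNameDemo, the helper tryLoad, and the /tmp path are hypothetical and not taken from the report. The sketch does not involve CRaC and does not explain why the restore passes the path.

```java
// Minimal sketch (hypothetical names): shows that a file path is not a valid
// class name, so Class.forName fails with ClassNotFoundException, which is
// the same exception type as the innermost cause in the trace above.
final class ForNameDemo {

    // Tries to load `name` via the system class loader without initializing it.
    // Returns "loaded" on success, or a short description of the failure.
    static String tryLoad(String name) {
        try {
            Class.forName(name, false, ClassLoader.getSystemClassLoader());
            return "loaded";
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A JAR path (hypothetical location) is treated as a class name and is not found.
        System.out.println(tryLoad("/tmp/app-18.0.0-b002eaed.jar"));
        // A real, fully qualified class name loads normally.
        System.out.println(tryLoad("java.lang.String"));
    }
}
```

Why the path ends up there is not established in this thread. One thing that might be worth checking (an assumption, not confirmed here) is whether the restore command passes anything besides -XX:CRaCRestoreFrom, such as -jar with the JAR path, since extra arguments on restore may be interpreted as a new main class.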
Comment From: sdeleuze
If you are using containers, be aware that configuring capabilities may be required; see https://github.com/sdeleuze/spring-boot-crac-demo/blob/main/restore.sh for an example. You may also want to ensure that the path to app-18.0.0-b002eaed.jar does not change (which could happen with volumes, etc.).
Is app-18.0.0-b002eaed.jar the executable JAR of your Spring Boot app?
Comment From: shmyer
I am not in a container environment. I am on a Linux VM on a VMware host. Could capabilities still be an issue here? I am currently on a 4.12 Linux kernel, which does not have the CHECKPOINT_RESTORE capability yet. It seems that on older Linux kernels, SYS_ADMIN is the capability required for checkpoint/restore. I am using a non-root user.
However, as far as I understand, CRIU still runs as root, because I had to ask our sysadmins to apply the following from https://docs.azul.com/core/crac/crac-debugging#failures-in-native-checkpoint-or-restore:
sudo chown root:root /path/to/criu
sudo chmod u+s /path/to/criu
Without that, the restore didn't get past the CRIU part. According to my restore.log, though, the CRIU part of the restore now seems to work.
Yes, the file's location is the same during checkpoint creation and during restore. And yes, this JAR file is the executable JAR of my Spring Boot app.
Comment From: sdeleuze
This looks more like a JDK/CRaC-level issue, so I am not sure what we can do about it on the Framework side. Do you agree?
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Comment From: shmyer
I guess you're right. In the end I've decided to abandon my plans to use CRaC. It doesn't seem mature enough to me.