Motivation
URLClassLoader shows awful performance when loading Classes/Resources which doesn't exist . URLClassPath
takes linear time to find a resource and it will traverse all over the classpath even if the resource doesn't exist. This will obviously slow down the startup , which is even worse when running from an executable fat jar or if the application's class path is a long list.
public Resource getResource(String name, boolean check) {
if (DEBUG) {
System.err.println("URLClassPath.getResource(\"" + name + "\")");
}
Loader loader;
for (int i = 0; (loader = getLoader(i)) != null; i++) {
Resource res = loader.getResource(name, check);
if (res != null) {
return res;
}
}
return null;
}
Scenarios which may lead to this issue are as follows.
1.Find a class repeatedly which apparently doesn't exist
- java.beans.Introspector.findCustomizerClass
private static Class<?> findCustomizerClass(Class<?> type) {
String name = type.getName() + "Customizer";
try {
type = ClassFinder.findClass(name, type.getClassLoader());
// Each customizer should inherit java.awt.Component and implement java.beans.Customizer
// according to the section 9.3 of JavaBeans specification
if (Component.class.isAssignableFrom(type) && Customizer.class.isAssignableFrom(type)) {
return type;
}
}
catch (Exception exception) {
// ignore any exceptions
}
return null;
}
related issue: https://github.com/spring-projects/spring-framework/issues/13653
Here are the top 10 calls of loadClass
in the startup of my application
Calls ClassName
480 java.lang.ObjectCustomizer
249 org.springframework.beans.factory.AwareCustomizer
99 org.springframework.beans.factory.InitializingBeanCustomizer
93 org.springframework.core.OrderedCustomizer
87 org.springframework.context.ApplicationContextAwareCustomizer
61 org.springframework.beans.factory.BeanFactoryAwareCustomizer
49 org.springframework.context.annotation.ConfigurationClassEnhancer$EnhancedConfigurationCustomizer
41 org.springframework.context.ApplicationListenerCustomizer
31 org.springframework.web.context.ServletContextAwareCustomizer
29 org.springframework.beans.factory.BeanNameAwareCustomizer
- org.apache.logging.log4j.core.lookup.Interpolator
if (Constants.IS_WEB_APP) {
try {
strLookupMap.put(LOOKUP_KEY_WEB,
Loader.newCheckedInstanceOf("org.apache.logging.log4j.web.WebLookup", StrLookup.class));
} catch (final Exception ignored) {
handleError(LOOKUP_KEY_WEB, ignored);
}
} else {
LOGGER.debug("Not in a ServletContext environment, thus not loading WebLookup plugin.");
}
try {
strLookupMap.put(LOOKUP_KEY_DOCKER,
Loader.newCheckedInstanceOf("org.apache.logging.log4j.docker.DockerLookup", StrLookup.class));
} catch (final Exception ignored) {
handleError(LOOKUP_KEY_DOCKER, ignored);
}
try {
strLookupMap.put(LOOKUP_KEY_SPRING,
Loader.newCheckedInstanceOf("org.apache.logging.log4j.spring.boot.SpringLookup", StrLookup.class));
} catch (final Exception ignored) {
handleError(LOOKUP_KEY_SPRING, ignored);
}
try {
strLookupMap.put(LOOKUP_KEY_KUBERNETES,
Loader.newCheckedInstanceOf("org.apache.logging.log4j.kubernetes.KubernetesLookup", StrLookup.class));
} catch (final Exception | NoClassDefFoundError error) {
handleError(LOOKUP_KEY_KUBERNETES, error);
}
Here are the top 4 calls of loadClass
in the startup of my application
Calls ClassName
225 org.apache.logging.log4j.docker.DockerLookup
225 org.apache.logging.log4j.kubernetes.KubernetesLookup
225 org.apache.logging.log4j.spring.cloud.config.client.SpringLookup
225 org.apache.logging.log4j.web.WebLookup
- Many other techniques or frameworks which use loadClass to check class's existence For example,org.springframework.http.converter.json.Jackson2ObjectMapperBuilder
private void registerWellKnownModulesIfAvailable(MultiValueMap<Object, Module> modulesToRegister) {
Class kotlinModuleClass;
Module kotlinModule;
try {
kotlinModuleClass = ClassUtils.forName("com.fasterxml.jackson.datatype.jdk8.Jdk8Module", this.moduleClassLoader);
kotlinModule = (Module)BeanUtils.instantiateClass(kotlinModuleClass);
modulesToRegister.set(kotlinModule.getTypeId(), kotlinModule);
} catch (ClassNotFoundException var6) {
}
2.Find resource repeatedly which doesn't exist
Here are the top 4 calls of findResource/findResources
in the startup of my application
Calls Resource
106 META-INF/services/org.apache.logging.log4j.core.util.WatchEventService
92 log4j2.system.properties
64 META-INF/services/javax.xml.parsers.DocumentBuilderFactory
29 META-INF/spring.components
28 META-INF/services/javax.xml.transform.TransformerFactory
18 java/lang/com.class
18 Throwable.class
18 Object.class
18 Class.class
17 String.class
8 META-INF/services/javax.xml.datatype.DatatypeFactory
5 config/application.yml
5 config/application.yaml
5 config/application.xml
5 application.yml
5 application.yaml
5 application.xml
3.Find resource repeatedly
For example
Calls Resource
13 org/springframework/boot/autoconfigure/dao/PersistenceExceptionTranslationAutoConfiguration.class
13 org/springframework/boot/actuate/autoconfigure/health/HealthEndpointConfiguration.class
12 org/springframework/orm/ibatis/support/SqlMapClientDaoSupport.class
12 org/springframework/dao/support/DaoSupport.class
12 org/springframework/core/io/ByteArrayResource.class
12 org/springframework/core/io/AbstractResource.class
12 com/github/javaparser/ast/visitor/VoidVisitorAdapter.class
8 config/application.properties
7 application.properties
5 config/application-local.properties
3 org/apache/http/client/version.properties
4.Reflection repeatedly triggers sun.reflect.GeneratedMethodAccessor<N>
class loading in runtime
All the cases listed above happen at startup time, while there are cases in runtime. This will make runtime performance worst. Here is a case which triggers sun.reflect.GeneratedMethodAccessor<N>
class loading : https://blog.actorsfit.com/a?ID=00001-e5b50c75-c8a5-4cdb-9b0c-5558fb985a60
at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
- waiting to lock <0x0000000088104b60> (a java.lang.Object)
at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:94)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.logging.log4j.core.impl.ThrowableProxy.loadClass(ThrowableProxy.java:539)
at org.apache.logging.log4j.core.impl.ThrowableProxy.toExtendedStackTrace(ThrowableProxy.java:660)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:137)
at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:94)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.logging.log4j.core.impl.ThrowableProxy.loadClass(ThrowableProxy.java:539)
at org.apache.logging.log4j.core.impl.ThrowableProxy.toExtendedStackTrace(ThrowableProxy.java:660)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:137)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:121)
at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:555)
at org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:61)
We generally solve these issues mentioned above case by case and solutions are normally tricky. Is there any final solution to solve this fundamentally?
Approach
This commit trying to fundamentally solve this issue by adding a ClassLoaderCache to LaunchedURLClassLoader .
- Cache
ClassNotFoundException
and fast throw it for the next time - Cache
getResource
result directly - Cache
getResources
NOT-EXIST result - Set system property
loader.cache.enable=true
to enable cache
Benefit
Here are the startup performance of our applications after this enhancement, which shows a 50% acceleration on average App|No Cache | Cache | Improvement --|-- | -- | -- app1|75s | 39s | 48% app2|83s | 32s | 61% app3|152s | 87s | 43% app4|318s | 137s | 57%
Enviroment:2C4G , OpenJDK 1.8.0_172, springboot 2.7.0
Comment From: wilkinsona
This looks very promising. Thank you, @zjulbj. Have you done any comparisons of heap usage with and without the cache?
Comment From: zjulbj
This looks very promising. Thank you, @zjulbj. Have you done any comparisons of heap usage with and without the cache?
Hi wilkinsona. Thanks for your reply. Talking about heap usage , there are two ways to control it
- All caches are instances of fixed size LRU cache, and cache size can be configured by loader.cache.size
(default 3000)
protected <K, V> Map<K, V> createCache(int maxSize) {
return Collections.synchronizedMap(new LinkedHashMap<K, V>(maxSize, 0.75f, true) {
@Override
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() >= maxSize;
}
});
}
- All caches are cleared after the startup, which is invoked by
org.springframework.boot.ClearCachesApplicationListener
/**
* Clear URL caches.
*/
public void clearCache() {
this.loaderCache.clearCache();
if (this.exploded) {
return;
}
for (URL url : getURLs()) {
try {
URLConnection connection = url.openConnection();
if (connection instanceof JarURLConnection) {
clearCache(connection);
}
}
catch (IOException ex) {
// Ignore
}
}
}
Here is a comparison of memory usage, and I think it's acceptable without cache
HEAP-MEMORY-USAGE init : 268435456(256.0 MiB)
[memory in bytes] used : 1181133080(1.1 GiB)
committed : 1799880704(1.7 GiB)
max : 3817865216(3.6 GiB)
NO-HEAP-MEMORY-USAGE init : 2555904(2.4 MiB)
[memory in bytes] used : 171428168(163.5 MiB)
committed : 176160768(168.0 MiB)
max : -1(-1 B)
PENDING-FINALIZE-COUNT 0
with cache
HEAP-MEMORY-USAGE init : 268435456(256.0 MiB)
[memory in bytes] used : 1036303928(988.3 MiB)
committed : 1913126912(1.8 GiB)
max : 3817865216(3.6 GiB)
NO-HEAP-MEMORY-USAGE init : 2555904(2.4 MiB)
[memory in bytes] used : 171065096(163.1 MiB)
committed : 176160768(168.0 MiB)
max : -1(-1 B)
PENDING-FINALIZE-COUNT 0
Comment From: mhalbritter
Thanks for the heap comparisons! I'm going to bring this up on our next team call.
Comment From: mhalbritter
Hey, thanks for your contribution and your work. We want to run more benchmarks on it and want to explore other paths (like indexing the classpath etc.) and don't want to commit yet on the cache solution. I'll put it on hold until we have more information / ideas.
Comment From: zjulbj
Hey, thanks for your contribution and your work. We want to run more benchmarks on it and want to explore other paths (like indexing the classpath etc.) and don't want to commit yet on the cache solution. I'll put it on hold until we have more information / ideas.
Many thanks for your reply. If I understand correctly,you want to explore other paths to solve this issue in a more widely view? That is also what I am going to do next. My rough idea is generating a index file mapping class/resource/packages to jar urls in the compile time, a little bit like INDEX.LIST,which I believe can imporve the time complexity to O(1). I not sure whether this is what 'indexing the classpath' meant with,and whether am I on a proper path
Comment From: mhalbritter
Yes, that was meant with "indexing the classpath" :)
Comment From: zjulbj
Yes, that was meant with "indexing the classpath" :)
OK,thanks a lot. It seems similar issues has been discussed long time ago :) http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/035009.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/035010.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/035011.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/035012.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/035013.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-September/035192.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-September/035210.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-September/035290.html
Comment From: HzjNeverStop
@mhalbritter Is there any progress in "indexing the classpath"?will it be realize in 2.7.x or only support in 3.x?
Comment From: wilkinsona
This will be 3.0.x at the earliest, but more likely later in the 3.x cycle. I have updated the milestone accordingly.
Comment From: wuyupengwoaini
@wilkinsona @mhalbritter Is there any progress in "indexing the classpath"?I really want to know how this is going, because it seems to really help us shorten the startup time of the application. Because some of our big applications have thousands of dependencies
Comment From: wilkinsona
Not at the moment, unfortunately. We have other, higher-priority work at the moment. Perhaps you could build the changes in this PR and share some before and after numbers for your apps?
Comment From: wilkinsona
Those interested in this PR, it would be really useful to know if you've tried running your app using class data sharing (CDS), particularly with Java 21. We've seen some very promising results with Spring Boot apps and would like to know if you have a similar experience. You can learn more in this blog post from @sdeleuze that includes some information about a script that unpacks a Spring Boot executable jar into a format that's as CDS-friendly as possible.
Comment From: mmoayyed
I wrote about my experience with CAS here.
I did not try the script that does the unpacking, but in a nutshell, what I saw (with a WAR file) was:
jar -xf build/libs/cas.war
cd build/libs
java org.springframework.boot.loader.launch.JarLauncher
...
...
...
<Started CasWebApplication in 7.693 seconds (process running for 9.285)>
Then:
jar -xf build/libs/cas.war
cd build/libs
java -cp "WEB-INF/classes:WEB-INF/lib/*" org.apereo.cas.web.CasWebApplication
...
...
...
<Started CasWebApplication in 6.873 seconds (process running for 7.469)>
and then with CDS:
java -XX:+AutoCreateSharedArchive -XX:SharedArchiveFile=cas.jsa \
org.springframework.boot.loader.launch.JarLauncher
...
...
...
<Started CasWebApplication in 6.766 seconds (process running for 7.582)>
Not that much difference, but I can see what I might have missed with the script.
Comment From: sdeleuze
Please check the guidelines printed when using the unpack-executable-jar.sh
script.
It is expected they you get no significant gain when using the JarLauncher
.
To start the application from that directory:
$ java -jar run-app.jar
TIP: To improve startup performance, you can create a CDS Archive with a single training run:
$ java -XX:ArchiveClassesAtExit=application.jsa -Dspring.context.exit=onRefresh -jar run-app.jar
Then use the generated cache:
$ java -XX:SharedArchiveFile=application.jsa -jar run-app.jar
See https://docs.spring.io/spring-framework/reference/integration/class-data-sharing.html for more details.
Comment From: mmoayyed
Thanks! I made small modifications to the script so it can run against a WAR file (replaced BOOT-INF
with WEB-INF
). With that in mind, I am not sure if this is still decent feedback and a good test, but here we go:
unpack-executable-jar.sh -d tmp build/libs/cas.war
Then:
$ java -jar tmp/run-app.jar
.
.
- <Started CasWebApplication in 6.637 seconds (process running for 8.178)>
Then:
java -XX:ArchiveClassesAtExit=application.jsa -Dspring.context.exit=onRefresh -jar tmp/run-app.jar
java -XX:SharedArchiveFile=application.jsa -jar tmp/run-app.jar
.
.
- <Started CasWebApplication in 6.284 seconds (process running for 6.693)>
Comment From: sdeleuze
I had a deeper look, this CAS application seems to perform a lot of processing at startup. To put things in perspective, on my powerful MacBook Pro M2, the CAS application takes 5.285 s to start (after disabling key generation at startup) while Petclinic JDBC takes 1.448 s to start. CDS is improving the loading of the classes, it can't optimize custom application processing at startup.
If we analyze the data points more in details (unpacked variant is done with the unpack-executable-jar.sh script which replicate #38276 behavior):
JARexe: Started CasWebApplication in 7.854 seconds (process running for 8.709) Unpacked: Started CasWebApplication in 4.778 seconds (process running for 5.285) Unpacked with CDS:Started CasWebApplication in 4.36 seconds (process running for 4.712)
JARexe: Started PetClinicApplication in 1.586 seconds (process running for 1.842) Unpacked: Started PetClinicApplication in 1.299 seconds (process running for 1.448) Unpacked with CDS: Started PetClinicApplication in 0.859 seconds (process running for 0.978) Unpacked with CDS + AOT: Started PetClinicApplication in 0.678 seconds (process running for 0.81) Project Leyden + AOT: even more
So CDS seems to optimize pretty consistently the loading of the classes on the 2 samples (0,573 s reduction versus 0,47 s). This half of a second on a powerful machine turns to be multiple seconds when deployed on a cheap Cloud server.
I think the layout used by the script and proposed by #38276 is immune even without CDS to the performance issue discussed in this PR, since it does not use LaunchedURLClassloader
. So using both system classloader + CDS provide a significant performance boost.
AOT could be used to optimize even more the Spring Boot configuration processing of this application, but I was not able to make it work with CAS due to some double bean registration (feel free to raise a bug on https://github.com/spring-projects/spring-framework if you think there is something to fix on Spring Framework side).
Cache Data Store (Project Leyden premain) will be able to optimize with a wider scope than just class loading (optionally combined with AOT). It will for example include JVM AOT cache and maybe some classpath check caches. For the rest, that will be to the application itself to make careful use of the processing steps happening before startup.
Comment From: wilkinsona
Thanks very much, @sdeleuze.
Comment From: mmoayyed
Thank you very much @sdeleuze.
(At the moment AOT processing of the CAS application is blocked due to a bug in Spring Data. I will revisit this once the fix is out, but even at the time I was able to build native images, startup time of the native image was still around 2~3 seconds)
I hope this isn't hijacking the thread here, but does the team have any recommendations on how to analyze the startup data and performance beyond the likes of VisualVM, etc? One suspicion I have (at least in the case of this CAS app) is that given it's a Spring Cloud app and uses @RefreshScope
, I wonder if all those proxied beans are slowing down the startup. I don't suppose PetClinicApplication uses Spring Cloud, or put differently, have you also done CDS analysis of Spring Cloud apps? Is that not in scope of this exercise?
Comment From: sdeleuze
@mmoayyed You can probably use FlightRecorderApplicationStartup. @RefreshScope
can indeed have an impact but the 3 seconds in native tends to show this is more application specific. Now let's maybe indeed focus again on the original purpose of that PR ;-)
Comment From: mmoayyed
Sure thing, thank you very much!
Comment From: qixiaobo
this issue may fix https://github.com/spring-projects/spring-framework/issues/13410
Comment From: qsLI
Thanks for your pull request. With this cache, our app's start time has decreased by 50%.
Comment From: qsLI
The AppCDS technology relies on generating an archive file every time the JVM exits. How is this issue addressed during the initial startup? Moreover, applications are iterative, with different code each time, necessitating the regeneration of the archive file.
AppCDS for custom classloader has a more complex loading process: the JVM uniquely identifies a class using a (classloader, className) tuple.
-
For the BootClassloader and AppClassloader, they are unique in every run, thus maintaining a consistent identity across multiple runs.
-
For custom classloaders, apart from their type, there is no obvious unique identifier. Consequently, AppCDS cannot locate the required InstanceKlass entry in the shared archive during the class loading phase using just the classloader object and type.
To conclude, CDS is not the key to solve the startup time problem
Comment From: sdeleuze
@qsLI CDS indeed does not apply to custom classloaders, but I am wondering if you take the perspective of running your Spring Boot application directly from the executable JAR on production.
The AppCDS technology relies on generating an archive file every time the JVM exits. How is this issue addressed during the initial startup?
It is kind of addressed via a more integrated feature that automatically perform the training run without starting fully your application, see https://docs.spring.io/spring-boot/how-to/class-data-sharing.html for more details. There are still side effects like early database interaction but that could evolve and we have documented how to avoid that easily.
With this cache, our app's start time has decreased by 50%.
Could you please confirm this is compared to the startup time of an executable JAR. Can you measure against an application extracted with what is described in https://docs.spring.io/spring-boot/reference/packaging/efficient.html?
Comment From: qsLI
@sdeleuze First of all, thank you for your detailed response.
Due to historical reasons, we have been using the Spring Boot executable JAR mode for deployment, so most class loaders are LaunchedURLClassLoader, which makes it impossible to use the CDS feature. Switching to the extracted JAR mode should resolve this issue.
Additionally, our project is filled with various lazy implementations, which may require actual traffic to trigger the corresponding class loading. Generating the CDS file during the compile phase might not fully resolve the issues we are encountering.
The 50% performance improvement data is indeed based on the executable JAR. We will try the extracted mode when we have the time.