Description I am experiencing a very strange problem, thats not easy to reproduce. However I was able to capture a Thread Dump via JProfile.

With the combination of : - Spring Boot 3-RC1 (also latest Snapshot from 7th November) - Jaeger Tracing inside Docker or Kubernetes

  • Prometheus
  • Zipkin
  • Data JPA

The Application will occasionally deadlock during bootstrap. Just after the initialization of "ServletWebServerApplicationContext" Removing one of the Dependencies will remedy the situation. The stacktrace below shows that the problem seems to occur when prometheus triggers the autoconfig of the tracer. However this only happens when combined with JPA. Full Stacktrace is attached as well as the gradle build file.

Unfortunately I was able to create a downsized application that is able to reproduce the error. So that is all the information i can provide for now

Question I also noticed, that the rebuild Micrometer Tracing behaves where differently from the old sleuth implementation. The traces include a lot of low level information (e.g. Spring Security Filter Chain). And I was also not able to get complete spans across multiple applications, just seperate ones. Will there be a fix and/or updated documentation with the Release of 3.0 GA ?

Thanks in advance

Thread main:
  at org.springframework.util.function.SingletonSupplier.get() (line: 94)
  at org.springframework.util.function.SingletonSupplier.obtain() (line: 115)
  at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.currentSpan() (line: 85)
  at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.isSampled() (line: 80)
  at io.prometheus.client.exemplars.DefaultExemplarSampler.doSample(double, io.prometheus.client.exemplars.Exemplar) (line: 46)
  at io.prometheus.client.exemplars.DefaultExemplarSampler.sample(double, io.prometheus.client.exemplars.Exemplar) (line: 34)

thread-trace.zip build.gradle.zip

Comment From: wilkinsona

Thanks for the report. For a deadlock to occur, two or more threads much be involved. Looking at the thread dump, there only appears to be a single thread involved. There's also no information about the locks that are held which unfortunately makes it impossible to see what's going on. Can you please try reproducing the problem and, when it occurs, capture a thread dump using kill -3 and share the complete output?

Comment From: wilkinsona

Question I also noticed, that the rebuild Micrometer Tracing behaves where differently from the old sleuth implementation.

Spring Boot's issue tracker isn't really the right place for this. The Micrometer Slack would be better.

Comment From: goafabric

@wilkinsona will try to reproduce and deliver the kill -3 result thank you

Comment From: goafabric

does not seem to be reproduceable on my faster macbook pro have to pickup my macbook air later and reverify there .... looks like a race condition

Comment From: goafabric

Attached the kill -3 output. Imo mostly tells the same story .. blocked inside SingletTonSupplier of SpanContextsSupplier .get() Which has a call to synchronize .. But I am not the expert here ...

Same code, runs with Spring Boot 2.x .. taking into account that a lot of transient libraries of Spring of Course have changed.

kill-dump.txt

Comment From: wilkinsona

Thank you, @goafabric. This dump is much more useful as it shows the deadlock:

Found one Java-level deadlock:
=============================
"main":
  waiting to lock monitor 0x0000600001bb35a0 (object 0x0000000702e01130, a org.springframework.util.function.SingletonSupplier),
  which is held by "Notification Thread"

"Notification Thread":
  waiting to lock monitor 0x0000600001ba63c0 (object 0x0000000700801208, a java.util.concurrent.ConcurrentHashMap),
  which is held by "main"

Java stack information for the threads listed above:
===================================================
"main":
    at org.springframework.util.function.SingletonSupplier.get(SingletonSupplier.java:94)
    - waiting to lock <0x0000000702e01130> (a org.springframework.util.function.SingletonSupplier)
    at org.springframework.util.function.SingletonSupplier.obtain(SingletonSupplier.java:115)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.currentSpan(ExemplarsAutoConfiguration.java:85)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.isSampled(ExemplarsAutoConfiguration.java:80)
    at io.prometheus.client.exemplars.DefaultExemplarSampler.doSample(DefaultExemplarSampler.java:46)
    at io.prometheus.client.exemplars.DefaultExemplarSampler.sample(DefaultExemplarSampler.java:34)
    at io.micrometer.prometheus.PrometheusCounter.updateExemplar(PrometheusCounter.java:79)
    at io.micrometer.prometheus.PrometheusCounter.increment(PrometheusCounter.java:58)
    at io.micrometer.core.instrument.Counter.increment(Counter.java:38)
    at io.micrometer.core.instrument.binder.logging.MetricsTurboFilter.decide(LogbackMetrics.java:199)
    at ch.qos.logback.classic.spi.TurboFilterList.getTurboFilterChainDecision(TurboFilterList.java:49)
    at ch.qos.logback.classic.LoggerContext.getTurboFilterChainDecision_0_3OrMore(LoggerContext.java:251)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:375)
    at ch.qos.logback.classic.Logger.info(Logger.java:584)
    at org.flywaydb.core.internal.logging.slf4j.Slf4jLog.info(Slf4jLog.java:37)
    at org.flywaydb.core.internal.logging.multi.MultiLogger.info(MultiLogger.java:52)
    at org.flywaydb.core.internal.logging.EvolvingLog.info(EvolvingLog.java:58)
    at org.flywaydb.core.internal.license.VersionPrinter.printVersionOnly(VersionPrinter.java:56)
    at org.flywaydb.core.internal.license.VersionPrinter.printVersion(VersionPrinter.java:49)
    at org.flywaydb.core.FlywayExecutor.execute(FlywayExecutor.java:121)
    at org.flywaydb.core.Flyway.migrate(Flyway.java:128)
    at org.springframework.boot.autoconfigure.flyway.FlywayMigrationInitializer.afterPropertiesSet(FlywayMigrationInitializer.java:66)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1799)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1749)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:599)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:521)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:326)
    at org.springframework.beans.factory.support.AbstractBeanFactory$$Lambda$280/0x0000000800db79c8.getObject(Unknown Source)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
    - locked <0x0000000700801208> (a java.util.concurrent.ConcurrentHashMap)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:324)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:200)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:313)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:200)
    at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1131)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:906)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:584)
    - locked <0x000000070089bae8> (a java.lang.Object)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:146)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:730)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:432)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1302)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1291)
    at org.goafabric.personservice.Application.main(Application.java:19)
"Notification Thread":
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.getSingletonFactoryBeanForTypeCheck(AbstractAutowireCapableBeanFactory.java:972)
    - waiting to lock <0x0000000700801208> (a java.util.concurrent.ConcurrentHashMap)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.getTypeForFactoryBean(AbstractAutowireCapableBeanFactory.java:891)
    at org.springframework.beans.factory.support.AbstractBeanFactory.isTypeMatch(AbstractBeanFactory.java:619)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doGetBeanNamesForType(DefaultListableBeanFactory.java:572)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.getBeanNamesForType(DefaultListableBeanFactory.java:531)
    at org.springframework.beans.factory.BeanFactoryUtils.beanNamesForTypeIncludingAncestors(BeanFactoryUtils.java:260)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:1539)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1338)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory$DependencyObjectProvider.getObject(DefaultListableBeanFactory.java:1972)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier$$Lambda$733/0x0000000801137ac0.get(Unknown Source)
    at org.springframework.util.function.SingletonSupplier.get(SingletonSupplier.java:97)
    - locked <0x0000000702e01130> (a org.springframework.util.function.SingletonSupplier)
    at org.springframework.util.function.SingletonSupplier.obtain(SingletonSupplier.java:115)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.currentSpan(ExemplarsAutoConfiguration.java:85)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.isSampled(ExemplarsAutoConfiguration.java:80)
    at io.prometheus.client.exemplars.DefaultExemplarSampler.doSample(DefaultExemplarSampler.java:46)
    at io.prometheus.client.exemplars.DefaultExemplarSampler.sample(DefaultExemplarSampler.java:34)
    at io.micrometer.prometheus.PrometheusCounter.updateExemplar(PrometheusCounter.java:79)
    at io.micrometer.prometheus.PrometheusCounter.increment(PrometheusCounter.java:58)
    at io.micrometer.core.instrument.binder.jvm.JvmGcMetrics$GcMetricsNotificationListener.countPoolSizeDelta(JvmGcMetrics.java:223)
    at io.micrometer.core.instrument.binder.jvm.JvmGcMetrics$GcMetricsNotificationListener.handleNotification(JvmGcMetrics.java:191)
    at sun.management.NotificationEmitterSupport.sendNotification(java.management@17.0.2/NotificationEmitterSupport.java:155)
    at com.sun.management.internal.GarbageCollectorExtImpl.createGCNotification(jdk.management@17.0.2/GarbageCollectorExtImpl.java:115)

Found 1 deadlock.

This deadlock is the same as reported in https://github.com/spring-projects/spring-framework/issues/23501. I'm not sure what, if anything, we can do about it in Boot.

Comment From: jonatan-ivanov

I think one possible solution to this would be start using the Tracer for the sampling decision later in the app lifecycle.

Just an example: what do you think about using the Tracer after the app started, i.e.: listening to ApplicationStartedEvent (or something else) and call the Tracer only after that event is received (but definitely before the app would have a chance to accept traffic)?

Since Exemplars are sampled and in our case using them means correlating logs/spans etc. I think saying that sampling will start only after the application reached a certain point in its lifecycle is ok. Technically, this what we do as of today too but in order to solve this issue, we could start sampling even a later point of the lifecycle of the app.

Comment From: philwebb

Deferring binding is something we've considered before in #30636

Comment From: goafabric

thank you all for looking for a solution

Comment From: wilkinsona

Here's a minimal reproducer that uses Thread.sleep() to widen the timing windows that have to overlap for the deadlock to occur:

package com.example.demo;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.tracing.Tracer;

@SpringBootApplication
public class Gh33070Application {

    private static final Log logger = LogFactory.getLog(Gh33070Application.class);

    public static void main(String[] args) {
        new Thread(() -> {
            while (true) {
                logger.warn("GC notification");
                try {
                    Thread.sleep(500);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();
        SpringApplication.run(Gh33070Application.class, "--logging.level.root=warn");
    }

    @Bean
    InitializingBean initializingBean(MeterRegistry meterRegistry) {
        return () -> {
            try {
                Thread.sleep(2500);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
            logger.warn("Bean initialization");
        };
    }

    @Bean
    Tracer tracer(MeterRegistry meterRegistry) {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return Tracer.NOOP;
    }

}
plugins {
    id 'org.springframework.boot' version '3.0.0-SNAPSHOT'
    id 'io.spring.dependency-management' version '1.1.0'
    id 'java'
}

group = 'com.example'
version = '0.0.1-SNAPSHOT'
sourceCompatibility = '17'

repositories {
    mavenCentral()
    maven { url 'https://repo.spring.io/milestone' }
    maven { url 'https://repo.spring.io/snapshot' }
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'io.micrometer:micrometer-tracing'
    runtimeOnly 'io.micrometer:micrometer-registry-prometheus'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
}

tasks.named('test') {
    useJUnitPlatform()
}

A thread dump shows the same deadlock as we saw above:

Found one Java-level deadlock:
=============================
"main":
  waiting to lock monitor 0x00006000029dc680 (object 0x000000061ec53238, a org.springframework.util.function.SingletonSupplier),
  which is held by "Thread-0"

"Thread-0":
  waiting to lock monitor 0x00006000029e8dd0 (object 0x000000061e153a18, a java.util.concurrent.ConcurrentHashMap),
  which is held by "main"

Java stack information for the threads listed above:
===================================================
"main":
    at org.springframework.util.function.SingletonSupplier.get(SingletonSupplier.java:94)
    - waiting to lock <0x000000061ec53238> (a org.springframework.util.function.SingletonSupplier)
    at org.springframework.util.function.SingletonSupplier.obtain(SingletonSupplier.java:115)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.currentSpan(ExemplarsAutoConfiguration.java:91)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.isSampled(ExemplarsAutoConfiguration.java:82)
    at io.prometheus.client.exemplars.DefaultExemplarSampler.doSample(DefaultExemplarSampler.java:46)
    at io.prometheus.client.exemplars.DefaultExemplarSampler.sample(DefaultExemplarSampler.java:34)
    at io.micrometer.prometheus.PrometheusCounter.updateExemplar(PrometheusCounter.java:79)
    at io.micrometer.prometheus.PrometheusCounter.increment(PrometheusCounter.java:58)
    at io.micrometer.core.instrument.Counter.increment(Counter.java:38)
    at io.micrometer.core.instrument.binder.logging.MetricsTurboFilter.decide(LogbackMetrics.java:196)
    at ch.qos.logback.classic.spi.TurboFilterList.getTurboFilterChainDecision(TurboFilterList.java:49)
    at ch.qos.logback.classic.LoggerContext.getTurboFilterChainDecision_0_3OrMore(LoggerContext.java:251)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:375)
    at ch.qos.logback.classic.Logger.log(Logger.java:780)
    at org.apache.commons.logging.LogAdapter$Slf4jLocationAwareLog.warn(LogAdapter.java:437)
    at com.example.demo.Gh33070Application.lambda$1(Gh33070Application.java:40)
    at com.example.demo.Gh33070Application$$Lambda$453/0x0000000800e50470.afterPropertiesSet(Unknown Source)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1799)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1749)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:599)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:521)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:326)
    at org.springframework.beans.factory.support.AbstractBeanFactory$$Lambda$213/0x0000000800d43730.getObject(Unknown Source)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
    - locked <0x000000061e153a18> (a java.util.concurrent.ConcurrentHashMap)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:324)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:200)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:961)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:916)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:584)
    - locked <0x000000061e092780> (a java.lang.Object)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:730)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:432)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1302)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1291)
    at com.example.demo.Gh33070Application.main(Gh33070Application.java:29)
"Thread-0":
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:217)
    - waiting to lock <0x000000061e153a18> (a java.util.concurrent.ConcurrentHashMap)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:324)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:200)
    at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:254)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1405)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory$DependencyObjectProvider.getObject(DefaultListableBeanFactory.java:2002)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier$$Lambda$329/0x0000000800de24c0.get(Unknown Source)
    at org.springframework.util.function.SingletonSupplier.get(SingletonSupplier.java:97)
    - locked <0x000000061ec53238> (a org.springframework.util.function.SingletonSupplier)
    at org.springframework.util.function.SingletonSupplier.obtain(SingletonSupplier.java:115)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.currentSpan(ExemplarsAutoConfiguration.java:91)
    at org.springframework.boot.actuate.autoconfigure.tracing.exemplars.ExemplarsAutoConfiguration$LazyTracingSpanContextSupplier.isSampled(ExemplarsAutoConfiguration.java:82)
    at io.prometheus.client.exemplars.DefaultExemplarSampler.doSample(DefaultExemplarSampler.java:46)
    at io.prometheus.client.exemplars.DefaultExemplarSampler.sample(DefaultExemplarSampler.java:34)
    at io.micrometer.prometheus.PrometheusCounter.updateExemplar(PrometheusCounter.java:79)
    at io.micrometer.prometheus.PrometheusCounter.increment(PrometheusCounter.java:58)
    at io.micrometer.core.instrument.Counter.increment(Counter.java:38)
    at io.micrometer.core.instrument.binder.logging.MetricsTurboFilter.decide(LogbackMetrics.java:196)
    at ch.qos.logback.classic.spi.TurboFilterList.getTurboFilterChainDecision(TurboFilterList.java:49)
    at ch.qos.logback.classic.LoggerContext.getTurboFilterChainDecision_0_3OrMore(LoggerContext.java:251)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:375)
    at ch.qos.logback.classic.Logger.log(Logger.java:780)
    at org.apache.commons.logging.LogAdapter$Slf4jLocationAwareLog.warn(LogAdapter.java:437)
    at com.example.demo.Gh33070Application.lambda$0(Gh33070Application.java:21)
    at com.example.demo.Gh33070Application$$Lambda$4/0x0000000800c16898.run(Unknown Source)
    at java.lang.Thread.run(java.base@17.0.4.1/Thread.java:833)

Found 1 deadlock.