Spring Asynchronous EntityManagerFactory bootstrapping to complete on context refresh completion [SPR-17334]

Andy Wilkinson opened SPR-17334 and commented

When a LocalEntityManagerFactory bean is configured with a bootstrap executor, there's a race between application context refresh and native entity manager factory bootstrapping. This makes things awkward for any logic that wants to run once bootstrapping has completed, for example something that relies on Hibernate DDL processing.

I've attached a small example that hopefully illustrates the problem. When imported into your IDE and run, EmfBootstrappingRaceApplication should throw an exception when executing the SELECT query using JdbcTemplate. The failure will not occur if factory.setBootstrapExecutor(executor); is commented out. The failure will occur intermittently if the artificial delay introduced by the async executor is removed.
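The race can be modeled without Spring or Hibernate. The following is a minimal plain-Java sketch, not the attached example: all names are hypothetical. A single-thread executor stands in for the bootstrap executor, a sleep stands in for the artificial delay, and an AtomicBoolean stands in for "Hibernate DDL has created the schema". If the end of "refresh" does not wait for the bootstrap Future, a JdbcTemplate-style query sees an uninitialized database. If it does wait, the state is predictable.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class EmfBootstrapRaceModel {

    // Hypothetical stand-in for "Hibernate DDL processing has created the tables".
    private final AtomicBoolean schemaCreated = new AtomicBoolean(false);

    // Models context refresh followed by a non-JPA query. Returns true if the
    // query would find the schema in place once refresh has finished.
    public boolean refreshThenQuery(boolean waitForBootstrap) {
        ExecutorService bootstrapExecutor = Executors.newSingleThreadExecutor();
        try {
            // Asynchronous bootstrap: an artificial delay, then the DDL step.
            Future<?> bootstrap = bootstrapExecutor.submit(() -> {
                try {
                    Thread.sleep(300);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                schemaCreated.set(true);
            });
            if (waitForBootstrap) {
                // Block at the end of "refresh" until bootstrapping has completed.
                bootstrap.get();
            }
            // "Refresh" is over; non-JPA code now queries the database.
            return schemaCreated.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for bootstrap", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Bootstrap failed", e);
        } finally {
            bootstrapExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without waiting: " + new EmfBootstrapRaceModel().refreshThenQuery(false));
        System.out.println("with waiting:    " + new EmfBootstrapRaceModel().refreshThenQuery(true));
    }
}
```

Removing the sleep does not remove the race. It only makes the outcome timing-dependent, which matches the intermittent failure described above.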

Affects: 4.3.19, 5.0.9, 5.1 GA

Reference URL: https://github.com/spring-projects/spring-boot/issues/14658

Attachments: - emf-bootstrapping-race.zip (49.57 kB)

Comment From: spring-projects-issues

Juergen Hoeller commented

This is by design: the application and its web endpoints can already be up and listening while the persistence provider is still bootstrapping, blocking only once a request comes in that actually needs to access the persistence provider.

That said, I can see the issue with post-bootstrapping logic here. I guess we could try to attach a callback to the async bootstrap thread there, or we could have a mode of bootstrapping where we effectively wait and block at the end of the refresh phase.
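A completion callback of the kind suggested here can be modeled with plain java.util.concurrent types. This is a hypothetical sketch, not a Spring API. CompletableFuture.supplyAsync stands in for the bootstrap running on the bootstrap executor, and thenAccept stands in for the callback. The callback is guaranteed to run after bootstrapping has finished, but nothing makes refresh wait for it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BootstrapCallbackModel {

    public List<String> run() {
        // Thread-safe log of what happened, in the order it was recorded.
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ExecutorService bootstrapExecutor = Executors.newSingleThreadExecutor();
        try {
            // Asynchronous bootstrap producing a hypothetical native EMF handle.
            CompletableFuture<String> nativeEmf = CompletableFuture.supplyAsync(() -> {
                events.add("bootstrap: DDL executed");
                return "nativeEmf";
            }, bootstrapExecutor);

            // Hypothetical completion callback: runs only after bootstrapping has
            // finished, e.g. to execute user-provided data scripts.
            CompletableFuture<Void> afterBootstrap = nativeEmf.thenAccept(
                    emf -> events.add("callback: data scripts run against " + emf));

            // Refresh does not wait, so this entry may land before or after the two above.
            events.add("refresh: completed");

            // Only so this demo can return the complete log; join() also makes
            // the callback's writes visible to this thread.
            afterBootstrap.join();
            return new ArrayList<>(events);
        } finally {
            bootstrapExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        new BootstrapCallbackModel().run().forEach(System.out::println);
    }
}
```

A callback on its own therefore still lets refresh complete before the database has reached a predictable state.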

Comment From: spring-projects-issues

Andy Wilkinson commented

Ah, I see. And that will of course work fine if you only access the database via the entity manager. It remains problematic for applications that aren't purely using JPA and are using JdbcTemplate, jOOQ, or whatever as well.

The area of Boot that led me to investigate this was DataSource initialisation. Users can provide data scripts that populate the database once EntityManagerFactory bootstrapping has completed. We're currently detecting the completion of bootstrapping by decorating the JpaVendorAdapter and (ab)using the postProcessEntityManagerFactory callback. This works, but doesn't feel particularly clean. An official callback for the completion of bootstrapping would be useful.

As things stand, even with our (ab)use of the postProcessEntityManagerFactory callback, we still get into the state where refresh completes with the database in an unknown state. I think there'd definitely be benefit to an option that blocks right at the end of refresh until bootstrapping has completed. This should allow anything not going through JPA to access the database once it has reached a predictable state.

Comment From: jhoeller

Revisiting this ticket in the backlog, I'm wondering what the current state of affairs is in Boot there.

Would a dedicated completion callback for EntityManagerFactory bootstrapping still be useful?

Should we consider introducing an option to wait for EntityManagerFactory completion at the end of context refresh, e.g. through joining the underlying Future on ContextRefreshedEvent? FWIW this can be externally accomplished already: through a custom ContextRefreshedEvent listener that casts the injected EntityManagerFactory handle to EntityManagerFactoryInfo and calls getNativeEntityManagerFactory() on it. This might be more natural to provide at Boot level than Framework level.
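The external workaround amounts to joining the bootstrap before any post-initialization task runs. In a real application this would be a ContextRefreshedEvent listener that casts the injected EntityManagerFactory to EntityManagerFactoryInfo and calls getNativeEntityManagerFactory(). The plain-Java sketch below only models that idea. EmfHandle, getNative() and the listener list are hypothetical stand-ins, not Spring types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class RefreshJoinModel {

    // Hypothetical stand-in for the injected EntityManagerFactory proxy: like
    // getNativeEntityManagerFactory(), getNative() blocks until bootstrapping is done.
    static final class EmfHandle {
        private final CompletableFuture<String> bootstrap;

        EmfHandle(CompletableFuture<String> bootstrap) {
            this.bootstrap = bootstrap;
        }

        String getNative() {
            return bootstrap.join();
        }
    }

    // Returns whether the post-initialization task saw the schema in place.
    public boolean refresh() {
        AtomicBoolean schemaCreated = new AtomicBoolean(false);
        AtomicBoolean seenByPostInitTask = new AtomicBoolean(false);
        ExecutorService bootstrapExecutor = Executors.newSingleThreadExecutor();
        try {
            EmfHandle emf = new EmfHandle(CompletableFuture.supplyAsync(() -> {
                sleepQuietly(200);          // artificial bootstrap delay
                schemaCreated.set(true);    // DDL done
                return "nativeEmf";
            }, bootstrapExecutor));

            // Listeners for a hypothetical "context refreshed" event, run in registration order.
            List<Runnable> onRefreshed = new ArrayList<>();
            onRefreshed.add(() -> emf.getNative());   // join the bootstrap first
            onRefreshed.add(() -> seenByPostInitTask.set(schemaCreated.get()));

            // End of refresh: publish the event.
            onRefreshed.forEach(Runnable::run);
            return seenByPostInitTask.get();
        } finally {
            bootstrapExecutor.shutdown();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(new RefreshJoinModel().refresh());
    }
}
```

The sketch relies on the joining listener running before the post-initialization task. With real Spring listeners, that ordering would have to be made explicit (for example via Ordered or @Order). This is one argument for providing the guarantee at framework or Boot level rather than leaving it to each application.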

In any case, I'd like to close this ticket for good in the 6.1 timeframe.

Comment From: jhoeller

Following up on our background initialization option for the core container (#13410) with its strict rules, it seems sensible to enforce completion of asynchronous EMF initialization before context refresh completion (just like we do for @Bean(bootstrap=BACKGROUND)). While the original motivation was to let the EMF keep initializing until the first request tries to use it, the lack of predictability is indeed an issue. Also, most applications have some kind of startup-time database operations anyway, which effectively (but not reliably) forces EMF initialization to have completed by then. We should align this for better predictability.

Unless a compelling case is made, I do not intend to let users opt out of this, in order to enforce predictability for JPA/Hibernate initialization specifically. The behavior will therefore simply be stricter in 6.2, always completing in time for context refresh completion. This enables us to rely on ContextRefreshedEvent for post-initialization tasks that can reliably assume the database to have been initialized (as a solution for #26153, effectively turning it into a documentation task).