Right now DataSourceInitializer runs regardless of the fact that we've auto-configured a DataSource or a custom one is available. If more than one is available, this currently throws a low-level exception because we expect a primary.

We have plans to change the last bit in 2.0 but I think it is confusing that we apply something datasource related on something that we haven't auto-configured. After all things defined in spring.datasource (in DataSourceProperties) should only be taken into account if we configure the DataSource.

The side effect is that if we do this explicitly, we have the guarantee the DataSource is initialized and ready to be used once we expose it in the context. Right now we use a BPP for that and it causes hard to track issues such as #9394

The purpose of this issue is to:

Create a public pojo-based API that initializes a DataSource. It should be super easy to grab an instance of such object based on DataSourceProperties
Replace the auto-configuration to create that object and migrate the database. We have DataSourceConfiguration#createDataSource that has all that we need to share that logic. We'd also need to update EmbeddedDataSourceConfiguration so it would be nice if we could easily migrate a DataSource based on a DataSource and a DataSourceProperties instances

If someone is willing to initialize a custom DataSource then they'll have to do the same thing but at least it will be explicit.

Comment From: snicoll

I did a bit of hacking on this which makes me think that the original statement has to be revisited a bit.

Public API

The current code is highly dependant of DataSourceProperties and breaking that link basically means that we would have to duplicate the concept. If a user name and a password is provided for the upgrade, we recreate a datasource based on the settings defined in DataSourceProperties. So to be a public API, we should accept a hard link between the DataSource and the DataSourceProperties instances.

This should be mitigated with the fact that our doc invites users to create their custom DataSource using a custom DataSourceProperties object. There isn't a clean way to validate that link though.

Also, the invocation of the migration is purely programmatic so any advanced feature where the intialization should happen only once Hibernate has started will have to be managed by the user somehow.

Event

The DataSourceInitializedEvent should probably go away as it is JPA specific and is not very useful outside the internal implementation.

Ordering

Right now, the initializer does two separate things:

Create the schema: has to happen before Hibernate starts (in case you've set DDL validation for instance)
Initialize the schema: has to happen after in case you're using Hibernate to create the schema.

While our implementation is considered a simple solution (compared to our Liquibase/Flyway support), it's actually much harder to implement. The first link can be easily done by adding a ref to the entity manager (The Liquibase and Flyway auto-configs already do that).The second link is harder though since the same bean already required the opposite order to be able to create the schema.

Technicall we should wait for Flyway, Liquibase or Hibernate DDL auto to complete to initialize the schema. And that's where the event we have may have more sense.

I am wondering if SmartInitializingSingleton couldn't be a cheap solution for this.

Comment From: snicoll

1115 seems to be in contradiction with my initial ordering analysis. So it looks like everything should happen as soon as the `DataSource` is available. I am trying now to understand why `DataSourceInitializedPublisher` does what it does.

Comment From: snicoll

More investigation on the code (this one is definitely tricky!)

The following cases are supported:

JPA (ddl auto off) + init enabled: everything runs as soon as the DataSource is available because there is at least one schema script
JPA (ddl auto on) + init enabled: the script is empty (since we want hibernate to do that for us). The @PostConstruct of the initializer doesn't do anything and rely on "something else" to trigger the event. (we have a BPP that sends an event as soon as it sees an initialized EntityManagerFactory)
No JPA + init enabled: everything runs as soon as the DataSource is available

This allowed me to discover a weird case: if you don't have JPA and only an init script (no schema) because something custom is going to create the schema, the initializer doesn't run.

It would probably be easier if we looked at things a bit differently:

Something must create the schema, it's either the initializer or hibernate
The schema must be initialized if necessary once the schema has been created

Perhaps a SchemaCreatedEvent would clarify the intend but I also think that calling things explicitly makes it easier to follow. Sending an event if nobody is listening for something that must happen is a bit weird so perhaps we should retrieve the invoker in the BPP and invoke it instead? This may create a package tangle though.

Comment From: snicoll

We now have better test coverage for those cases that I've discovered which should facilitate the refactoring.

Having discussed with @jhoeller we won't be able to get rid of the BPP dance. But we can get rid of the event and make the intent more clear. Also, for users that do not require things to be created super early rolling their own version should be pretty easy.

Comment From: snicoll

The BPP dance is such that we're considering running the initializer if one DataSource is available. It doesn't matter if Spring Boot auto-configured it or if a user provided one but as long as there is one candidate we should initialize it.

To make that intend more clear, we need to move the initialization-related properties out of DataSourceProperties with a DataSourceInitializationProperties. Perhaps with a dedicated auto-configuration that runs after DataSourceAutoConfiguration.

Comment From: snicoll

Unfortunately it is not that easy, we do have support for custom username/password for both the creation of the schema and its initialization. When that feature is used, it is assume that we're going to create the exact same DataSource as the one that the auto-configuration would have created.

I confirm we need to configure a single DataSource if present. I've tried to make it only applicable in case of auto-configuration and @JdbcTest blows up as it's creating the DataSource itself before DataSourceAutoConfiguration gets a chance to kick in (in short: such datasource is not initialized).

Comment From: snicoll

Let's make sure to validate #9394 works now. There is a sample attached to the issue that still breaks and the only way to make it work is to stop with the BPP dance (which isn't going to happen so we may want to go the documentation route for that case).

Comment From: zachmarshall

I updated the test repo for #9394 with minor changes for spring-boot 2: https://github.com/zachmarshall/spring-datasource-init-bug/tree/spring-boot-2. As you said, it still fails. Please let me know if there's something you'd like for me to validate as working.

Comment From: rvit34

Please, see my comment https://github.com/spring-projects/spring-boot/issues/9394#issuecomment-398812520 Am I correct that now there is no way to fix it?

Comment From: wilkinsona

This has been superseded by #25756 and #25323.

SpringBoot Clarify the scope of DataSourceInitializer

Public API

Event

Ordering

1115 seems to be in contradiction with my initial ordering analysis. So it looks like everything should happen as soon as the DataSource is available. I am trying now to understand why DataSourceInitializedPublisher does what it does.

1115 seems to be in contradiction with my initial ordering analysis. So it looks like everything should happen as soon as the `DataSource` is available. I am trying now to understand why `DataSourceInitializedPublisher` does what it does.