First, a disclaimer: I'm a DBA and don't program directly with Spring Boot.

Background

We ran into an issue with Vault user creation in Oracle because of connection pool timeout synchronization. In a single app this is likely not an issue, but we run a Spring Boot shop that requires Vault to be used for limited-time credentials. This means that Vault creates a new database user for each connection in the connection pool. Each of these users has a limited lifespan and is removed when it expires (after 2 hours in our case). So at application start, every connection request creates a new user until each connection in the pool has its own user; how many users that is depends on the pool size. This is typically not a problem, as applications/microservices don't usually all start in sync.

============== Issue ==============

We had an issue that required a database node to be restarted. This dropped a large portion of the connections for a large portion of our services. When those connections were re-established, they all had similar or identical timeouts. When those users all timed out, we saw an "aftershock" incident from a flood of user creations. This was exacerbated by large connection pools that wanted to recreate all the affected connections, each of which would create a new user. These user creations caused high contention in the database when it tried to grant permissions to the new users.
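To make the aftershock concrete, here is a minimal Python sketch. The workload is made up for illustration (200 connections re-established evenly over the 120 seconds after a restart), and `expirations_per_minute` is a hypothetical helper, not part of Vault, Spring or any pool. Only the 2-hour lifetime comes from the description above.

```python
from collections import Counter

TTL_SECONDS = 2 * 60 * 60  # 2-hour credential lifetime, as described above


def expirations_per_minute(created_at_seconds, ttl_seconds=TTL_SECONDS):
    """Count how many credentials expire in each minute after the restart."""
    return Counter((t + ttl_seconds) // 60 for t in created_at_seconds)


# Hypothetical workload: 200 connections re-established evenly over the
# 120 seconds that follow a node restart (both numbers are made up).
reconnects = [(i * 120) // 200 for i in range(200)]

buckets = expirations_per_minute(reconnects)
for minute, count in sorted(buckets.items()):
    print(f"minute {minute}: {count} credentials expire")
```

With a fixed lifetime, the whole reconnect burst reappears two hours later: all 200 expirations fall into two adjacent minutes.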

For a real-world example, let's look at automobile traffic in a city. DB connections are the cars on the road. Intersections are connection timeouts. When you are traveling and come to a stop light, all the "cars" stop and are held while the issue is resolved. Once the cars are allowed to start moving again, they all cross that first "intersection" together. There is no way to mitigate this, but it is expected behavior, since everyone wants to get going again. As those cars travel on and come to the next "intersection", they are still clumped together. The more "intersections" they go through, the more the "cars" spread out.

============== Ask ==============

I would like to have a connection pool "traffic cop" that can artificially balance connection timeouts to diversify the expiration times. It could be as simple as this: if the number of unused connections (u) in the connection pool (c) is greater than a number or percent of available connections (p), then the connection timeout (t) becomes the default timeout (o) divided by (u modulo 1/d) + 1, where d is your desired timeout diversity.

f(u, c, d, o):
    p = c*d
    if (u <= p):
        return o
    else:
        return o/((u%(1/d))+1)

================ Example ================

So if you have a connection pool of size 40 (c), a 120 minute timeout (o), and a connection diversity (d) of 25%, your equation would look like this:

c = 40
o = 120
d = 0.25

p = 40*0.25 = 10
if (u > 10) -> t = 120/((u%4)+1), otherwise t = 120

f(u) = t

f(40) = 120
f(39) = 30
f(38) = 40
f(37) = 60
f(36) = 120
...
f(10) = 120
f(9) = 120
...
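The rule and the example parameters can be checked with a short Python sketch. The function and parameter names are hypothetical. It makes two assumptions: the default timeout applies whenever u is at or below p (following the "greater than" wording of the ask), and 1/d is rounded to a whole number so the modulo is well defined.

```python
def connection_timeout(unused, pool_size, diversity, default_timeout):
    """Timeout (in minutes) for the next connection under the proposed rule.

    unused          -- u, unused connections in the pool
    pool_size       -- c, total size of the connection pool
    diversity       -- d, desired timeout diversity as a fraction (0.25 = 25%)
    default_timeout -- o, the default timeout
    """
    threshold = pool_size * diversity  # p = c * d
    if unused <= threshold:
        # Assumption: at or below the threshold, keep the default timeout.
        return default_timeout
    # Assumption: 1/d is treated as a whole number (4 for d = 0.25).
    buckets = round(1 / diversity)
    return default_timeout / ((unused % buckets) + 1)


# Example parameters from above: c = 40, o = 120, d = 0.25.
for u in (40, 39, 38, 37, 36, 10, 9):
    print(f"f({u}) = {connection_timeout(u, 40, 0.25, 120)}")
```

Rounding 1/d keeps the divisor an integer between 1 and 1/d, so the shortest timeout this sketch can hand out is o*d (30 minutes with these values).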

In this example, the only way for more than 25% of the connection timeouts to line up at once is if the application always uses exactly 23 or 27 connections for a 150-minute window from start. This can be mitigated by making sure your connection pool limit is either a prime number or sized so that it is used more than once within your timeout window.

Comment From: wilkinsona

Thanks for taking the time to share your experience, @Tylerlhess.

Spring Boot’s default connection pool is a third-party project called HikariCP. AFAIK, it doesn’t provide the required level of control over connection timeout for us to implement something like you have suggested. Even if it did, it feels to me like something that would be better implemented as part of the pool, rather than as an external add-on.

I’ll cc @brettwooldridge here who is the primary maintainer of Hikari. He may have some suggestions or be interested in the enhancement idea.

Comment From: Tylerlhess

I found the solution to my specific problem; well, half of it. While the connections in the pool all timing out at the same time did cause the issue, the fact that spring-vault was creating new creds after each failure was the true monster. I found the setting that stops that from happening and think it should resolve most of the issues going forward.