Current Behavior
This issue was first opened on r2dbc-postgresql, some discussion can be seen there. https://github.com/pgjdbc/r2dbc-postgresql/issues/459
Then it was migrated to r2dbc-pool https://github.com/r2dbc/r2dbc-pool/issues/140
The discussion lead to possible problems on spring implementation.
We are facing some issues related to stuck connections in the pool. After some investigation we could identify the problem happening in our search flow. Our frontend has a combobox with some kind of "debounce" which cancel resquests while user keep typing and sends only one request to our server. When these requests gets cancelled right after the validation query have runned establishing the connection as healthy, and before the real SQL query starts the connections never gets released. This is a very specific moment on the chain and its hard to reproduce.
We have found this post, that seems to be related but the suggested answer doesn't really helps, and the posted code sample does not leads to reproduce our problem. https://stackoverflow.com/questions/68407202/r2dbc-pool-connection-not-released-after-cancel
We could reproduce the behavior by calling any endpoint thats fetchs data from database and using a breakpoint after the connection gets established and before the query starts, then we force the request's cancel and release the breakpoint in our application, this way the connection always gets stuck.
The last query runned by the connection is always the validation query, in our case a simple "SELECT 1":
new ConnectionPool(
ConnectionPoolConfiguration.builder(connectionFactory)
.maxIdleTime(Duration.ofMillis(Long.parseLong(idleTimeout)))
.initialSize(Integer.parseInt(initialSize))
.maxSize(Integer.parseInt(maxSize))
.acquireRetry(3)
.maxLifeTime(Duration.ofMillis(Long.parseLong(maxLifetime)))
.validationQuery("SELECT 1")
.build())
The breakpoint was placed on the class DefaultFetchSpec from org.springframework.r2dbc.core, inside the public Flux all() method, but this is not a consistent way to reproduce it.
Another way we was able to reproduce it was by "stressing" the application and simulating several consecutives "network failures" forcing the cancel of the request, the log file produced by this approach can de found here https://github.com/jfrossetto/connection-stuck/blob/stress-cancel-pool088/log_connectionStuck2.txt
Analyzing the logs seems that when the connection gets "stuck" the FluxDiscardOnCancel does not show up in the logs All connections are stuck with field query filled with value SELECT 1
pid | datname | usename | application_name | backend_start | query_start | query |
---|---|---|---|---|---|---|
7463 | postgres | postgres | sample | 2021-10-27 11:51:31.626 | 2021-10-27 11:51:34.253 | SELECT 1 |
7464 | postgres | postgres | sample | 2021-10-27 11:51:31.626 | 2021-10-27 11:51:34.490 | SELECT 1 |
7465 | postgres | postgres | sample | 2021-10-27 11:51:34.505 | 2021-10-27 11:51:37.468 | SELECT 1 |
Comment From: snicoll
I can see that you've added a comment on the linked issue where you could reproduce the problem without any Spring dependency. It looks like this issue should have been closed. If I've misunderstood something, please provide an update with a supported version.