Description
A simple demo project using Spring Boot, WebFlux, and Spring Data R2DBC. Under stress testing, some requests hang for a long time until they time out. I'm not sure which part causes the hang, but I ran some experiments at https://github.com/ht-coder-wu/demo. I think:
- there must be a problem with the connection pool or transactions
- requests are suspended for a long time under concurrency, for reasons unknown
Version
- spring-boot-starter-parent 2.7.8
- r2dbc-mysql 0.8.2.RELEASE
- r2dbc-pool 0.9.2.RELEASE
- jdk 8
Reproduction (occasional)
application.properties looks like this:
spring.application.name=demo
spring.r2dbc.url=r2dbcs:mysql://localhost:3306/test
spring.r2dbc.username=root
spring.r2dbc.password=root
spring.r2dbc.pool.enabled=true
server.port=8080
logging.level.org.springframework.r2dbc=info
spring.r2dbc.pool.initial-size=100
spring.r2dbc.pool.max-size=500
I ran four sets of tests. They differ as follows:
- testA: declarative transaction, queries the DB
- testB: declarative transaction, does not query the DB
- testC: no transaction, does not query the DB
- testD: no transaction, queries the DB
I used ab for the concurrency tests:
iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testA
4 requests timed out (I set a 100 s timeout in a WebFilter, which responds with status 400). Without restarting the service, I then stress tested testD:
iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testD
It looks like there is no problem, but some transactions hang in the DB.
If I restart the service after stress testing testA and then test testD directly:
iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testD
Some requests are suspended for a long time under concurrency, for reasons unknown.
I think there must be a problem with the connection pool or transactions. It looks like testA leaves some connections in the connection pool with auto-commit disabled, so testD, which does not run in a transaction, ends up hanging in an open DB transaction.
Restart the service and stress test testB directly:
iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testB
Some requests are suspended for a long time under concurrency, for reasons unknown.
Restart the service and stress test testC directly:
iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testC
I tried this scenario several times and it seems to be fine (no transaction, no DB).
Summary
- testA (transaction, DB) followed by testD: transactions hang in the DB until I restart the service. There must be a problem with the connection pool or transactions (declarative and programmatic).
- testC (no transaction, no DB): seems to be fine.
- testB (transaction, no DB): requests hang until they time out.
- testD (no transaction, DB): requests hang until they time out (occasionally).
I'm not sure what the problem is, but it does exist. I hope to get some help, thanks.
Comment From: QuantumXiecao
@ht-coder-wu Please check the following: while ab is hung, open another terminal and run curl http://localhost:8080/test* (any of A, B, C, D is fine) as a test. I think the curl result should be OK. One possible problem is that the port range in your sysctl config is too small, which would cause ab to hang (you can cat /proc/sys/net/ipv4/ip_local_port_range to check, adjust it to 1024~65535, and run your stress test again).
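To make the port-range check concrete: ab without -k opens a new connection per request, and closed connections linger in TIME_WAIT on the client, so a small ephemeral range can run out during 100000 requests. The sketch below only does the arithmetic on the range. The class name EphemeralPortCheck is hypothetical, and the /proc path is Linux-specific, so it does not exist on macOS.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: reports how many ephemeral ports the load-testing client can use. */
final class EphemeralPortCheck {

    /** Parses a "low high" pair as printed by ip_local_port_range and returns the inclusive port count. */
    static int portCount(String rangeLine) {
        String[] parts = rangeLine.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'low high', got: " + rangeLine);
        }
        int low = Integer.parseInt(parts[0]);
        int high = Integer.parseInt(parts[1]);
        // Valid TCP ports end at 65535, so 65536 is rejected here.
        if (low < 1 || high > 65535 || low > high) {
            throw new IllegalArgumentException("invalid port range: " + rangeLine);
        }
        return high - low + 1;
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get("/proc/sys/net/ipv4/ip_local_port_range"); // Linux only
        if (!Files.isReadable(file)) {
            System.out.println("Cannot read " + file + "; check the equivalent setting for this OS.");
            return;
        }
        String line = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println("ephemeral ports available: " + portCount(line));
    }
}
```

If the reported count is well below the number of requests ab issues within the TIME_WAIT window, the client side is a plausible suspect before the server is.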
Comment From: ht-coder-wu
@QuantumXiecao
Thank you for your suggestion. I had ignored the possibility of the port range, but after I adjusted the range as suggested, ab hung as before.
Indeed, while ab is hung I can run curl as a test and the result is OK, so I think the problem is occasional.
you can see the change:
and the ab results like this:
4 requests hung (100 s) until they timed out, as before. I then ran ab against testD without restarting the service, and the problems were the same as before.
Comment From: QuantumXiecao
@ht-coder-wu Please try ab -n 50000(modify from 100000 to 50000) -c 100 -s 120 http://localhost:8080/testC and show us the results. Many thx!
Comment From: ht-coder-wu
@QuantumXiecao
request 50000's result:
request 100000's result:
I had ignored the influence of the port range at first, so I ran ab against testC with 100000 requests again.
Comparing these results, increasing or decreasing the number of requests affects the final result, and connecting takes more time. What's your opinion?
Then I stress tested (50000 requests) testA and testD without restarting the service, and nothing changed.
By the way, it is hard for us to control the traffic in the production environment. If the root cause is hardware resources, we want the failure to be more visible instead of requests just hanging.
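One way to get a visible failure instead of an indefinite hang is to bound how long a request may wait for a connection. r2dbc-pool has a maxAcquireTime setting for this, which Spring Boot exposes as spring.r2dbc.pool.max-acquire-time as far as I know; whether it changes the behaviour reported here is untested. The sketch below is a blocking toy with a hypothetical class name, not the reactive pool itself, and only illustrates the fail-fast idea.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model only: a bounded resource whose acquisition fails loudly after a maximum wait. */
final class BoundedAcquireDemo {

    private final Semaphore permits;

    BoundedAcquireDemo(int maxSize) {
        this.permits = new Semaphore(maxSize);
    }

    /** Waits at most maxWaitMillis for a free slot, then throws instead of hanging. */
    void acquire(long maxWaitMillis) {
        try {
            if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                        "no connection available within " + maxWaitMillis + " ms");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe the interruption.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    /** Returns a slot so that a waiting caller can proceed. */
    void release() {
        permits.release();
    }
}
```

With a bound like this, an exhausted pool shows up as an error in the logs after a known delay, which is easier to alert on than a request that stays open until the client gives up.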
Comment From: snicoll
@ht-coder-wu I am trying to get to the bottom of this. Do you confirm the problem goes away if that custom filter is not registered anymore?
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Comment From: ht-coder-wu
If you are referring to ContextFilter, I confirm that the problem still exists, because this issue originally occurred in the production environment.
Comment From: snicoll
@ht-coder-wu thanks for following-up but that wasn't my question. I am asking if the problem does not occur if ContextFilter is not registered.
Comment From: ht-coder-wu
Sorry, it's my fault. I forgot to mention that ContextFilter was not registered in my production environment in the first place. To make the issue clearer, I commented out the registration code like this
Then I ran testA and testD.
Some requests timed out, and some transactions hung as before.
Comment From: ht-coder-wu
I'm not sure whether this issue is caused by mixing transactional and non-transactional DB requests in the code. Since transactions are managed by Spring, the session's auto-commit is set to false before a transactional request. Connections are then taken from the connection pool for reuse, which causes DB requests that run without a transaction to be suspended.
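The hypothesis can be written down as a toy model. This is not r2dbc-pool or r2dbc-mysql code: FakeConnection, FakePool, and the cancelled flag (standing in for the 100 s timeout firing before cleanup) are all assumptions made for illustration. It only shows how a connection returned with auto-commit still off would leave a later non-transactional request inside an open transaction.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the hypothesis only; every type here is hypothetical. */
final class AutoCommitLeakDemo {

    /** Stand-in for a pooled database connection. */
    static final class FakeConnection {
        boolean autoCommit = true;
        boolean openTransaction = false;

        /** Runs one statement; with auto-commit off it stays inside an open transaction. */
        void execute(String sql) {
            if (!autoCommit) {
                openTransaction = true;
            }
        }

        void commit() {
            openTransaction = false;
        }
    }

    /** Stand-in for a pool that hands connections back out without resetting their state. */
    static final class FakePool {
        private final Deque<FakeConnection> idle = new ArrayDeque<>();

        FakePool(int size) {
            for (int i = 0; i < size; i++) {
                idle.push(new FakeConnection());
            }
        }

        FakeConnection acquire() {
            return idle.pop();
        }

        void release(FakeConnection connection) {
            idle.push(connection);
        }
    }

    /** Transactional request; cancelled = true models cleanup being skipped (assumption). */
    static void transactionalRequest(FakePool pool, boolean cancelled) {
        FakeConnection connection = pool.acquire();
        connection.autoCommit = false; // begin transaction
        connection.execute("SELECT 1");
        if (!cancelled) {
            connection.commit();
            connection.autoCommit = true; // normal cleanup restores auto-commit
        }
        pool.release(connection);
    }

    /** Non-transactional request; returns true if it ended up inside an open transaction. */
    static boolean plainRequest(FakePool pool) {
        FakeConnection connection = pool.acquire();
        connection.execute("SELECT 1");
        boolean leaked = connection.openTransaction;
        pool.release(connection);
        return leaked;
    }

    public static void main(String[] args) {
        FakePool pool = new FakePool(1);
        transactionalRequest(pool, false);
        System.out.println("after clean transaction, plain request leaked: " + plainRequest(pool));
        transactionalRequest(pool, true);
        System.out.println("after cancelled transaction, plain request leaked: " + plainRequest(pool));
    }
}
```

If the real pool behaves like this model, the hung transactions seen in the DB after testA and testD would follow directly; confirming that requires checking the auto-commit state of connections as they are released.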
Comment From: ht-coder-wu
So it's just because the usage is incorrect, right?
Comment From: sdeleuze
Can you please:
- Provide a docker-compose.yml with the version/configuration of MySQL to make the reproducer usable on my side
- Upgrade to Spring Boot 3.2.2 to ensure this is still reproducible
- Make sure that the code in the reproducer matches what you have in production (for ContextFilter that does not seem to be the case since it is enabled in your repro and not in production per your comments).
- Check if you observe the same issue without transactions
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.