I'm currently trying to write minimal integration tests for gateway code (as an example for a Workshop). This includes booting up a small spring boot context with only the Feign client in it (at the moment) and all features like Hystric Circuit Breakers, Ribbon Load Balancing etc. activated. The "other" service is simulated using WireMock.
The tests work, but sometimes the first request throws an exception:
com.netflix.hystrix.exception.HystrixTimeoutException: null
at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$1$1.run(AbstractCommand.java:1154) [hystrix-core-1.5.12.jar:1.5.12]
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:45) [hystrix-core-1.5.12.jar:1.5.12]
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:41) [hystrix-core-1.5.12.jar:1.5.12]
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable.run(HystrixContextRunnable.java:61) [hystrix-core-1.5.12.jar:1.5.12]
at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$1.tick(AbstractCommand.java:1159) [hystrix-core-1.5.12.jar:1.5.12]
at com.netflix.hystrix.util.HystrixTimer$1.run(HystrixTimer.java:99) [hystrix-core-1.5.12.jar:1.5.12]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_131]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
I could start the tests 10 times and at least one of the executions would fail. When it fails it is always the first executed request.
Here is a small project to reproduce the issue: nt-ca-aqe/ws-cnt-cloud-rest
Specifically these test classes:
/service-foo/src/test/kotlin/com/example/servicefoo/gateways/bar/BarClientIntegrationTest.kt/service-foo/src/test/kotlin/com/example/servicefoo/gateways/bar/AnotherBarClientIntegrationTest.kt
When one of the tests is executed, JUnit will actually execute it 10 times to demonstrate that when it fails, it's always the first requests.
As can be seen in AnotherBarClientIntegrationTest I already tried to increase timeouts anyway I know how to.
Comment From: ryanjbaxter
Can you try eagerly loading the ribbon clients? http://cloud.spring.io/spring-cloud-static/Finchley.RELEASE/single/spring-cloud.html#ribbon-child-context-eager-load
Comment From: slu-it
@ryanjbaxter yes, it works if I set both of these properties: ribbon.eager-load.enabled=true and ribbon.eager-load.clients=service-bar! THX
But is that the intended behaviour? Having the first request time out because the context is not loaded fast enough?
Just out of interest: Which of the timeout properties would I actually influence that particular timeout?
And another question: Why is it only a problem if Feign Hystrix support is enabled (feign.hystrix.enabled=true)? Is that a flawed observation because enabling Hystrix just extends the average context load time enough to timeout more often?
Comment From: slu-it
Since I updated the master branch with the suggested solution, I changed above link to reference the specific commit at which this was still an issue.
Comment From: ryanjbaxter
The ribbon services are not loaded until they are needed.
The reason why this happens when hystrix is happening is because hystrix sets a timeout. You should be able to adjust that timeout by setting hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds.
Comment From: slu-it
@ryanjbaxter ok, thank you for the help!
Comment From: sunjavax
Hi, We are facing the similar issue. Once application started, 1st request is falling into hystrix fallback method within/before the preConfigured timeout: (execution.isolation.thread.timeoutInMilliseconds : 12000). and throwing exception: com.netflix.hystrix.exception.HystrixTimeoutException: null.
But in our logs, we are also seeing success response from the actual requested service after this Fallback.
This is happening only for first request after application starting.
Following are our hystrix configurations hystrix.default.execution.timeoutEnabled: true hystrix.default.threadTimeoutInMilliseconds: 12000 hystrix.default.requestVolumeThreshold: 5 hystrix.default.errorThresholdPercentage: 50 hystrix.default.sleepWindowInMilliseconds: 5000 hystrix.default.rollingStatsTimeInMilliseconds: 10000 hystrix.default.threadpool.default.coreSize: 10
Can you please help us in this.
Comment From: ryanjbaxter
The solution is above. The request may still be made since the thread is not cancelled.
Comment From: sunjavax
@ryanjbaxter yes we already set this property: hystrix.command.HystrixCommandKey.execution.isolation.thread.timeoutInMilliseconds: 12000 and we are not using ribbon clients.
Comment From: ryanjbaxter
Well then you are not facing the same issue as whats described here. Please open a separate issue with a sample that reproduces the problem.
Comment From: yashwanthkalva
We are facing this issue, it is intermittent but more frequent.