Hi all
We have discovered two issues with the selection of the next server for a retry in the RibbonLoadBalancedRetryPolicy (in spring-cloud Dalston.SR2)
1) RibbonLoadBalancedRetryPolicy may use the same server again if it needs to try another server:
If the registerThrowable() method decides that it needs to use another server for the retry (ribbon.MaxAutoRetries is reached), it just asks the LoadBalancer to choose the next server. But there is no guarantee that the LoadBalancer will not return the same server again, because the LoadBalancer uses a global round-robin algorithm to choose the next server and does not know which servers the RetryPolicy has already used.
2) ZonePreferenceServerListFilter prevents retry in another zone
We would like to configure the RibbonLoadBalancedRetryPolicy so that it uses servers from another eureka zone once all available servers in its own zone have been used for retry attempts. This seems to be impossible to handle at the level of the RibbonLoadBalancedRetryPolicy because the ZonePreferenceServerListFilter removes the servers in the remote zones during the process.
Is there a way to configure or extend the RibbonLoadBalancedRetryPolicy so that, when it moves to the next server (ribbon.MaxAutoRetries is reached), it does not pick a server it has already used, as long as not all available servers in all eureka zones have been tried?
Best regards, Matthias
Comment From: ryanjbaxter
Have you tried customizing the load balancing rule and/or implementing your own ServerListFilter? Why would those options not work for you?
Comment From: germm
Many thanks for your answer. We could change the behaviour by writing a custom LoadBalancedRetryPolicy and switching to the ZoneAffinityServerListFilter, but it was quite a lot of code to write. Are there any plans to change the default behavior or to provide more configuration options for the two issues we discovered?
Comment From: ryanjbaxter
I don't really consider them issues/bugs. The fact that the round-robin load balancer rule could potentially use the same server again is perfectly fine given the way you have Ribbon configured. If you want it to stop after it has used all the servers for a given service you can set client.ribbon.MaxAutoRetriesNextServer so that it never tries more times than there are servers. Of course you could also implement your own load balancer rule and keep track of the servers tried and stop once you see the same server again.
ZonePreferenceServerListFilter is behaving as designed. If you want a different behavior you can implement your own ServerListFilter.
You shouldn't have to do anything with LoadBalancedRetryPolicy to accomplish what you are trying to do.
Comment From: germm
Hello Ryan
Many thanks for the explanation. My problem is that I need to track the servers already tried per request. IMHO, the LoadBalancedRetryPolicy is where I can do that. As far as I have seen, the load balancer rule is global and does not have an instance per request.
Let me try to explain the issue with the default RibbonLoadBalancedRetryPolicy with an example:
We have two servers (Server A and Server B) and my ribbon configuration is:
MaxAutoRetries=0
MaxAutoRetriesNextServer=1
With this configuration, the retry should always use a different server than the first try. But we observed the following behaviour when two requests arrive at the same time:
1) Request 1 arrives: Loadbalancer chooses Server A
2) Request 2 arrives: Loadbalancer chooses Server B
3) Request 1 fails: Loadbalancer chooses Server A for the retry! => Request 1 is retried on Server A!
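The sequence above can be reproduced with a small plain-Java simulation. This is only a sketch: SharedRoundRobin and RoundRobinRetryDemo are hypothetical stand-ins, not Ribbon classes, and the only thing they model is a single counter shared by all requests.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a globally shared round-robin rule (not Ribbon's actual class).
class SharedRoundRobin {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    SharedRoundRobin(List<String> servers) {
        this.servers = servers;
    }

    // Every caller advances the same counter, no matter which request is asking.
    String choose() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}

class RoundRobinRetryDemo {
    public static void main(String[] args) {
        SharedRoundRobin lb = new SharedRoundRobin(Arrays.asList("Server A", "Server B"));

        String request1First = lb.choose(); // step 1: request 1, first try
        String request2First = lb.choose(); // step 2: request 2, first try
        String request1Retry = lb.choose(); // step 3: request 1, retry on "next" server

        System.out.println("Request 1 first try: " + request1First);
        System.out.println("Request 2 first try: " + request2First);
        System.out.println("Request 1 retry:     " + request1Retry);
    }
}
```

Because the counter is shared, the interleaved call from request 2 moves it forward, so the retry for request 1 wraps around to the server it started on.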
I hope that this explains the issue better.
Best Regards, Matthias
Comment From: ryanjbaxter
Thanks for the example. I understand what you are trying to do, I am just pointing out that the existing default behavior isn't necessarily a bug. You are welcome to implement your own LoadBalancedRetryPolicy to satisfy your use case. You should be able to provide your own LoadBalancedRetryPolicyFactory bean to create the implementation of LoadBalancedRetryPolicy for your use case.
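As a rough illustration of the per-request tracking discussed here, the sketch below keeps a set of servers already tried for one request and skips them when asking a shared chooser for the next one. TriedServerTracker is a hypothetical helper written in plain Java; it does not implement the LoadBalancedRetryPolicy interface, and the Supplier merely stands in for the load balancer. A custom policy would presumably hold similar state per request.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical per-request helper: create one instance for each request.
class TriedServerTracker {
    private final Set<String> tried = new HashSet<>();
    private final Supplier<String> chooser; // stand-in for the shared load balancer
    private final int maxDraws;             // assumption: set to the number of known servers

    TriedServerTracker(Supplier<String> chooser, int maxDraws) {
        this.chooser = chooser;
        this.maxDraws = maxDraws;
    }

    // Draw from the shared chooser until it returns a server this request has not
    // used yet. Gives up after maxDraws draws so the loop always terminates.
    Optional<String> nextUntried() {
        for (int i = 0; i < maxDraws; i++) {
            String candidate = chooser.get();
            if (tried.add(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

This is best effort only: when other requests draw from the same chooser concurrently, maxDraws draws are not guaranteed to cover every server, so an empty result means "no untried server was found", not "every server was tried".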
Comment From: spencergibb
Closing this due to inactivity. Please re-open if there's more to discuss.
Comment From: cc132
Hi germm, I have the same problem.
I have two services called ArticleApplication (service provider) and UserApplication (service consumer). When I configure the ribbon retry policy, I find that the same ArticleApplication instance is called again. This is inconsistent with the explanation in the official ribbon documentation.