Describe the bug
When spring-retry is used together with Ribbon and the provider has exactly two instances, the consumer always calls the same instance whenever the response status code is 500.
Sample
provider:
  ribbon:
    NFLoadBalancerRuleClassName: com.netflix.loadbalancer.RoundRobinRule
    MaxAutoRetries: 1
    MaxAutoRetriesNextServer: 0
    OkToRetryOnAllOperations: true
    retryableStatusCodes: 500
This only happens when there are exactly two provider instances and every request to the provider returns status code 500.
The relevant code is in org.springframework.cloud.netflix.ribbon.RibbonLoadBalancedRetryPolicy:
public boolean canRetrySameServer(LoadBalancedRetryContext context) {
    return this.sameServerCount < this.lbContext.getRetryHandler().getMaxRetriesOnSameServer() && this.canRetry(context);
}
public boolean canRetryNextServer(LoadBalancedRetryContext context) {
    return this.nextServerCount <= this.lbContext.getRetryHandler().getMaxRetriesOnNextServer() && this.canRetry(context);
}
public void registerThrowable(LoadBalancedRetryContext context, Throwable throwable) {
    if (this.lbContext.getRetryHandler().isCircuitTrippingException(throwable)) {
        this.updateServerInstanceStats(context);
    }
    /*
     * Note: after the last same-server retry, choose() is called here.
     * If the current attempt went to instance1, choose() selects instance2,
     * but with MaxAutoRetriesNextServer: 0 the context is then marked
     * exhausted below, so instance2 is never actually called.
     * The next request therefore goes to instance1 again, and after its
     * retry choose() selects instance2 once more.
     * This repeats, so every request goes to instance1.
     */
    if (!this.canRetrySameServer(context) && this.canRetryNextServer(context)) {
        context.setServiceInstance(this.loadBalanceChooser.choose(this.serviceId));
    }
    if (this.sameServerCount >= this.lbContext.getRetryHandler().getMaxRetriesOnSameServer() && this.canRetry(context)) {
        this.sameServerCount = 0;
        ++this.nextServerCount;
        if (!this.canRetryNextServer(context)) {
            context.setExhaustedOnly();
        }
    } else {
        ++this.sameServerCount;
    }
}
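The sequence described in the comment above can be reproduced with a small standalone simulation. This is only a sketch under stated assumptions, not the actual Spring Cloud or Ribbon classes: `RetryPinningSimulation`, `RoundRobinChooser`, and `simulate` are hypothetical names, every call is assumed to fail with a retryable status, and all operations are assumed retryable (as with OkToRetryOnAllOperations: true). It only mirrors the counter logic of registerThrowable and the `<=` check in canRetryNextServer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RetryPinningSimulation {

    /** Hypothetical stand-in for a round-robin rule: cycles through the instances in order. */
    static final class RoundRobinChooser {
        private final List<String> instances;
        private int next = 0;

        RoundRobinChooser(List<String> instances) {
            this.instances = instances;
        }

        String choose() {
            String chosen = instances.get(next);
            next = (next + 1) % instances.size();
            return chosen;
        }
    }

    /**
     * Sends `requests` requests that always fail with a retryable status and
     * returns how many calls each instance actually received.
     */
    static Map<String, Integer> simulate(List<String> instances, int requests,
                                         int maxSameServer, int maxNextServer) {
        RoundRobinChooser chooser = new RoundRobinChooser(instances);
        Map<String, Integer> calls = new LinkedHashMap<>();
        for (String instance : instances) {
            calls.put(instance, 0);
        }

        for (int r = 0; r < requests; r++) {
            int sameServerCount = 0;
            int nextServerCount = 0;
            boolean exhausted = false;
            // First canRetry() call: no instance yet, so one is chosen.
            String current = chooser.choose();

            // Later canRetry() calls delegate to canRetryNextServer (the <= check).
            while (!exhausted && nextServerCount <= maxNextServer) {
                calls.merge(current, 1, Integer::sum); // the call fails with 500

                // Mirrors the counter logic of registerThrowable.
                boolean canRetrySame = sameServerCount < maxSameServer;
                boolean canRetryNext = nextServerCount <= maxNextServer;
                if (!canRetrySame && canRetryNext) {
                    // Advances the round-robin position even if no retry follows.
                    current = chooser.choose();
                }
                if (sameServerCount >= maxSameServer) {
                    sameServerCount = 0;
                    nextServerCount++;
                    if (nextServerCount > maxNextServer) {
                        exhausted = true; // setExhaustedOnly()
                    }
                } else {
                    sameServerCount++;
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // MaxAutoRetries: 1, MaxAutoRetriesNextServer: 0, two instances, 10 requests
        System.out.println(simulate(Arrays.asList("instance1", "instance2"), 10, 1, 0));
    }
}
```

With two instances this prints `{instance1=20, instance2=0}`: each request makes two calls to instance1, and the extra choose() that selects instance2 is consumed without a call. With three instances the same logic spreads requests evenly, which matches the observation that the problem only shows up with two instances.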
So why is the check in canRetryNextServer this.nextServerCount <= this.lbContext.getRetryHandler().getMaxRetriesOnNextServer() rather than this.nextServerCount < this.lbContext.getRetryHandler().getMaxRetriesOnNextServer()?
Then I found this in spring-cloud-commons, in org.springframework.cloud.client.loadbalancer.InterceptorRetryPolicy:
// After the first attempt, canRetry depends on canRetryNextServer.
public boolean canRetry(RetryContext context) {
    LoadBalancedRetryContext lbContext = (LoadBalancedRetryContext)context;
    if (lbContext.getRetryCount() == 0 && lbContext.getServiceInstance() == null) {
        lbContext.setServiceInstance(this.serviceInstanceChooser.choose(this.serviceName));
        return true;
    } else {
        return this.policy.canRetryNextServer(lbContext);
    }
}
Could it be changed to something like this?
public boolean canRetry(RetryContext context) {
    LoadBalancedRetryContext lbContext = (LoadBalancedRetryContext)context;
    if (lbContext.getRetryCount() == 0 && lbContext.getServiceInstance() == null) {
        lbContext.setServiceInstance(this.serviceInstanceChooser.choose(this.serviceName));
        return true;
    } else if (lbContext.getRetryCount() < lbContext.getRetryHandler().getMaxRetriesOnSameServer()) {
        return true;
    } else {
        // If MaxAutoRetriesNextServer is 0, return false.
        return this.policy.canRetryNextServer(lbContext);
    }
}
Comment From: OlgaMaciaszek
Please provide a minimal, complete, verifiable example that reproduces the issue (can be a separate project GH link or a test added on a branch).
Comment From: twogoods
@OlgaMaciaszek
> Please provide a minimal, complete, verifiable example that reproduces the issue (can be a separate project GH link or a test added on a branch).
Sample project: https://github.com/twogoods/ribbon-retry-sample
Comment From: OlgaMaciaszek
@twogoods I am really sorry for not getting back to you earlier. We have decided to discontinue the Hoxton release train and Ribbon support, which had previously been limited to critical-issue maintenance only (the relevant decisions and schedules were published on our blog), so this issue will not be addressed.