Given: 6 instances of org.springframework.cloud:spring-cloud-netflix-eureka-server:2.2.4.RELEASE (each 8 GB RAM, 4 CPU)
eureka.server.peer-node-connect-timeout-ms=20000
eureka.server.peer-node-read-timeout-ms=20000
When: 7000+ registered instances
Then: Eureka gets stuck while syncing with the other Eureka nodes. The busy-threads graph peaks, Eureka's CPU usage is above 90%, and clients get timeout exceptions on connect. It stays stuck indefinitely.
Exception:
eureka.cluster.ReplicationTaskProcessor It seems to be a socket read timeout exception, it will retry later. if it continues to happen and some eureka node occupied all the cpu time, you should set property 'eureka.server.peer-node-read-timeout-ms' to a bigger value
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
Analysis: https://github.com/spring-cloud/spring-cloud-netflix/blob/v2.2.4.RELEASE/spring-cloud-netflix-eureka-server/src/main/java/org/springframework/cloud/netflix/eureka/server/InstanceRegistry.java
    @Override
    public boolean renew(final String appName, final String serverId,
            boolean isReplication) {
        log("renew " + appName + " serverId " + serverId + ", isReplication {}"
                + isReplication);
        List<Application> applications = getSortedApplications();
        for (Application input : applications) {
            if (input.getName().equals(appName)) {
                InstanceInfo instance = null;
                for (InstanceInfo info : input.getInstances()) {
                    if (info.getId().equals(serverId)) {
                        instance = info;
                        break;
                    }
                }
                publishEvent(new EurekaInstanceRenewedEvent(this, appName, serverId,
                        instance, isReplication));
                break;
            }
        }
        return super.renew(appName, serverId, isReplication);
    }
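The getSortedApplications() call takes a very long time to execute, and renew() makes it on every heartbeat, for replicated renewals as well.
Our temporary workaround: override renew() so that getSortedApplications() is never called. As a side effect, the EurekaInstanceRenewedEvent is no longer published.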
    @Override
    public boolean renew(final String appName, final String serverId,
            boolean isReplication) {
        log("renew " + appName + " serverId " + serverId + ", isReplication {}"
                + isReplication);
        return super.renew(appName, serverId, isReplication);
    }
Do you have a better way? Thank you!
3608
Comment From: kworkbee
Any progress?
Comment From: ashitikov-bld
@OlgaMaciaszek Hi, are you still working on this issue?
Comment From: OlgaMaciaszek
Hi, sorry - we've overlooked this issue. Will take a look now.
Comment From: OlgaMaciaszek
This has already been fixed with https://github.com/spring-cloud/spring-cloud-netflix/commit/8f9797627f9b2854fcd51731b60b08ae538fad20.
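For readers still on an unpatched version who want to keep the renewed event, the general idea of replacing the full copy-sort-scan with a keyed lookup can be sketched as follows. This is a simplified, self-contained model and not necessarily what the linked commit does: `RenewLookupSketch`, `register`, `findByScan`, and `findDirect` are hypothetical names, and the nested map only stands in for the real registry. In Eureka itself the analogous calls would presumably be `getApplication(appName)` followed by `Application.getByInstanceId(serverId)`, but treat that as an assumption to verify against your version.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the registry; none of these types are Eureka classes.
public class RenewLookupSketch {

    // appName -> (instanceId -> payload): a hypothetical model of the registry.
    private final Map<String, Map<String, String>> registry = new HashMap<>();

    public void register(String appName, String instanceId, String payload) {
        registry.computeIfAbsent(appName, k -> new HashMap<>()).put(instanceId, payload);
    }

    // Mirrors the shape of the original renew(): copy and sort every application
    // name, then scan linearly for the one application and the one instance.
    public String findByScan(String appName, String instanceId) {
        List<String> sortedApps = new ArrayList<>(registry.keySet());
        sortedApps.sort(Comparator.naturalOrder());
        for (String app : sortedApps) {
            if (app.equals(appName)) {
                for (Map.Entry<String, String> entry : registry.get(app).entrySet()) {
                    if (entry.getKey().equals(instanceId)) {
                        return entry.getValue();
                    }
                }
                break;
            }
        }
        return null;
    }

    // Keyed lookup: no copy and no sort, so the per-renew cost does not grow
    // with the total number of registered applications.
    public String findDirect(String appName, String instanceId) {
        Map<String, String> instances = registry.get(appName);
        return instances == null ? null : instances.get(instanceId);
    }

    public static void main(String[] args) {
        RenewLookupSketch sketch = new RenewLookupSketch();
        // 700 applications with 10 instances each, roughly the scale reported above.
        for (int app = 0; app < 700; app++) {
            for (int i = 0; i < 10; i++) {
                sketch.register("APP-" + app, "host-" + app + "-" + i, "payload-" + app + "-" + i);
            }
        }
        String viaScan = sketch.findByScan("APP-42", "host-42-3");
        String direct = sketch.findDirect("APP-42", "host-42-3");
        // Both strategies should resolve the same instance.
        System.out.println(viaScan.equals(direct));
    }
}
```

Both methods return the same instance. The difference is that the scan variant repeats the copy and sort on every renew, which is the cost the analysis above points at.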