Given: 6 instances of org.springframework.cloud:spring-cloud-netflix-eureka-server:2.2.4.RELEASE (each 8 GB RAM, 4 CPU)

eureka.server.peer-node-connect-timeout-ms=20000
eureka.server.peer-node-read-timeout-ms=20000

When: 7000+ instances are registered

Then: Eureka gets stuck while syncing with the other Eureka nodes. The busy-threads graph peaks, Eureka's CPU usage stays above 90%, and clients get timeout exceptions on connect. It stays stuck indefinitely.

Exception: eureka.cluster.ReplicationTaskProcessor It seems to be a socket read timeout exception, it will retry later. if it continues to happen and some eureka node occupied all the cpu time, you should set property 'eureka.server.peer-node-read-timeout-ms' to a bigger value com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out

Analysis: https://github.com/spring-cloud/spring-cloud-netflix/blob/v2.2.4.RELEASE/spring-cloud-netflix-eureka-server/src/main/java/org/springframework/cloud/netflix/eureka/server/InstanceRegistry.java

    @Override
    public boolean renew(final String appName, final String serverId,
            boolean isReplication) {
        log("renew " + appName + " serverId " + serverId + ", isReplication {}"
                + isReplication);
        List<Application> applications = getSortedApplications();
        for (Application input : applications) {
            if (input.getName().equals(appName)) {
                InstanceInfo instance = null;
                for (InstanceInfo info : input.getInstances()) {
                    if (info.getId().equals(serverId)) {
                        instance = info;
                        break;
                    }
                }
                publishEvent(new EurekaInstanceRenewedEvent(this, appName, serverId,
                        instance, isReplication));
                break;
            }
        }
        return super.renew(appName, serverId, isReplication);
    }

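The getSortedApplications() method takes a very long time to execute at this registry size, and renew() calls it on every heartbeat, replicated or not, only to find the instance to attach to the EurekaInstanceRenewedEvent. Our temporary workaround is to override renew() so that getSortedApplications() is never called and the EurekaInstanceRenewedEvent is not published: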

    @Override
    public boolean renew(final String appName, final String serverId,
            boolean isReplication) {
        log("renew " + appName + " serverId " + serverId + ", isReplication {}"
                + isReplication);
        return super.renew(appName, serverId, isReplication);
    }
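
One alternative we could imagine, as a sketch only and not the upstream fix: keep publishing the event, but fetch the instance by key instead of scanning a sorted copy of all applications. The block below illustrates the difference with plain Java collections. `RenewLookupSketch`, `Instance`, `findByScan` and `findDirect` are hypothetical names for illustration, not Eureka or Spring Cloud APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a registry; none of these types are Eureka classes.
class RenewLookupSketch {

    // Hypothetical instance record: application name plus instance id.
    static final class Instance {
        final String appName;
        final String id;

        Instance(String appName, String id) {
            this.appName = appName;
            this.id = id;
        }
    }

    // appName -> (instanceId -> instance), a two-level keyed layout.
    private final Map<String, Map<String, Instance>> registry = new ConcurrentHashMap<>();

    void register(String appName, String id) {
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>())
                .put(id, new Instance(appName, id));
    }

    // Scan-based lookup: copy and sort every application name, then search
    // linearly. This has the same shape as the renew() override quoted above.
    Instance findByScan(String appName, String id) {
        List<String> sortedApps = new ArrayList<>(registry.keySet());
        sortedApps.sort(Comparator.naturalOrder());
        for (String app : sortedApps) {
            if (app.equals(appName)) {
                for (Instance info : registry.get(app).values()) {
                    if (info.id.equals(id)) {
                        return info;
                    }
                }
                break;
            }
        }
        return null;
    }

    // Direct lookup: two hash lookups, with no copy or sort per call.
    Instance findDirect(String appName, String id) {
        Map<String, Instance> instances = registry.get(appName);
        return instances == null ? null : instances.get(id);
    }

    public static void main(String[] args) {
        RenewLookupSketch r = new RenewLookupSketch();
        for (int app = 0; app < 700; app++) {
            for (int i = 0; i < 10; i++) {
                r.register("APP-" + app, "host-" + app + "-" + i);
            }
        }
        // Both strategies return the same stored object for a known instance.
        System.out.println(r.findByScan("APP-42", "host-42-3") == r.findDirect("APP-42", "host-42-3"));
        // An unknown instance id yields null rather than an exception.
        System.out.println(r.findDirect("APP-42", "missing") == null);
    }
}
```

In the real override, the equivalent would be a keyed lookup on the registry in place of getSortedApplications(). Eureka's registry does expose keyed lookups such as `getInstanceByAppAndId(appName, id)`, but whether that is the right replacement here is an assumption we have not verified.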

Do you have a better way? Thank you!


Comment From: kworkbee

Any progress?

Comment From: ashitikov-bld

@OlgaMaciaszek Hi, do you still work on this issue?

Comment From: OlgaMaciaszek

Hi, sorry - we've overlooked this issue. Will take a look now.

Comment From: OlgaMaciaszek

This has already been fixed with https://github.com/spring-cloud/spring-cloud-netflix/commit/8f9797627f9b2854fcd51731b60b08ae538fad20.