We use spring-cloud-netflix-eureka-server version 1.4.4.RELEASE. We have 300+ microservices (1500+ instances) in our production environment. We run 4 Eureka instances, which occasionally report a Read timed out exception when replicating data to peer nodes. The affected endpoint is '/eureka/peerreplication/batch'.

Exception: eureka.cluster.ReplicationTaskProcessor: Network level connection to peer 10.54.54.54; retrying after delay. com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out

Analysis: https://github.com/Netflix/eureka/blob/v1.7.2/eureka-core/src/main/java/com/netflix/eureka/resources/PeerReplicationResource.java

    @Path("batch")
    @POST
    public Response batchReplication(ReplicationList replicationList) {
        try {
            ReplicationListResponse batchResponse = new ReplicationListResponse();
            for (ReplicationInstance instanceInfo : replicationList.getReplicationList()) {
                try {
                    batchResponse.addResponse(dispatch(instanceInfo));
                } catch (Exception e) {
                    batchResponse.addResponse(new ReplicationInstanceResponse(Status.INTERNAL_SERVER_ERROR.getStatusCode(), null));
                    logger.error("{} request processing failed for batch item {}/{}",
                            instanceInfo.getAction(), instanceInfo.getAppName(), instanceInfo.getId(), e);
                }
            }
            return Response.ok(batchResponse).build();
        } catch (Throwable e) {
            logger.error("Cannot execute batch Request", e);
            return Response.status(Status.INTERNAL_SERVER_ERROR).build();
        }
    }

https://github.com/spring-cloud/spring-cloud-netflix/blob/v1.4.4.RELEASE/spring-cloud-netflix-eureka-server/src/main/java/org/springframework/cloud/netflix/eureka/server/InstanceRegistry.java

    @Override
    public boolean renew(final String appName, final String serverId,
            boolean isReplication) {
        log("renew " + appName + " serverId " + serverId + ", isReplication {}"
                + isReplication);
        List<Application> applications = getSortedApplications();
        for (Application input : applications) {
            if (input.getName().equals(appName)) {
                InstanceInfo instance = null;
                for (InstanceInfo info : input.getInstances()) {
                    if (info.getId().equals(serverId)) {
                        instance = info;
                        break;
                    }
                }
                publishEvent(new EurekaInstanceRenewedEvent(this, appName, serverId,
                        instance, isReplication));
                break;
            }
        }
        return super.renew(appName, serverId, isReplication);
    }

When a replication batch contains more than 200 instances, the '/eureka/peerreplication/batch' endpoint easily takes more than 200 ms: getSortedApplications() takes about 1 ms per call, and each replicated renewal in the batch calls it once. Our temporary workaround: when isReplication is true, renew() skips getSortedApplications() and does not publish the EurekaInstanceRenewedEvent.

    @Override
    public boolean renew(final String appName, final String serverId,
            boolean isReplication) {
        log("renew " + appName + " serverId " + serverId + ", isReplication {}"
                + isReplication);
        if(!isReplication){
            List<Application> applications = getSortedApplications();
            for (Application input : applications) {
                if (input.getName().equals(appName)) {
                    InstanceInfo instance = null;
                    for (InstanceInfo info : input.getInstances()) {
                        if (info.getId().equals(serverId)) {
                            instance = info;
                            break;
                        }
                    }
                    publishEvent(new EurekaInstanceRenewedEvent(this, appName, serverId,
                            instance, isReplication));
                    break;
                }
            }
        }
        return super.renew(appName, serverId, isReplication);
    }
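
A possible alternative that keeps the event for replicated renewals is to look the instance up by key instead of copying and sorting every application on each call. The sketch below only illustrates that idea with a hypothetical in-memory model: RenewLookupSketch, register, findBySortedScan, findByKey and demo are made-up names, not Eureka or Spring Cloud classes, and it assumes that getSortedApplications() copies and sorts the full application list. Whether the real registry exposes an equivalent keyed lookup needs to be checked against the Eureka version in use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a registry; none of these types are Eureka classes.
final class RenewLookupSketch {

    // appName -> (instanceId -> instance description)
    private final Map<String, Map<String, String>> registry = new HashMap<>();

    void register(String appName, String instanceId, String info) {
        registry.computeIfAbsent(appName, k -> new HashMap<>()).put(instanceId, info);
    }

    // Mirrors the shape of the renew() shown above: copy and sort all
    // application names, then scan for the requested app and instance.
    // The cost of every call grows with the number of applications.
    String findBySortedScan(String appName, String instanceId) {
        List<String> sortedNames = new ArrayList<>(registry.keySet());
        sortedNames.sort(Comparator.naturalOrder());
        for (String name : sortedNames) {
            if (name.equals(appName)) {
                for (Map.Entry<String, String> e : registry.get(name).entrySet()) {
                    if (e.getKey().equals(instanceId)) {
                        return e.getValue();
                    }
                }
                return null; // app found, instance missing
            }
        }
        return null; // app not registered
    }

    // Direct keyed lookup: two hash lookups, independent of registry size.
    String findByKey(String appName, String instanceId) {
        Map<String, String> instances = registry.get(appName);
        return instances == null ? null : instances.get(instanceId);
    }

    // Builds a registry with `apps` applications of `perApp` instances each.
    static RenewLookupSketch demo(int apps, int perApp) {
        RenewLookupSketch r = new RenewLookupSketch();
        for (int a = 0; a < apps; a++) {
            for (int i = 0; i < perApp; i++) {
                String app = "APP-" + a;
                String id = "host-" + a + "-" + i;
                r.register(app, id, app + "/" + id);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        RenewLookupSketch r = demo(300, 5); // roughly the scale described above
        System.out.println(r.findBySortedScan("APP-42", "host-42-3"));
        System.out.println(r.findByKey("APP-42", "host-42-3"));
    }
}
```

Both methods return the same instance; only the per-call cost differs, which matters when one batch request performs hundreds of lookups in a row.
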

Do you have a better way? Thank you!

Comment From: marcingrzejszczak

Can you please check the latest 1.4.7.RELEASE version and see if the problem persists? BTW, the 1.4.x branch will soon no longer be supported, so we suggest that you upgrade to the latest stable release.

Comment From: qinxiongzhou

After checking the code, the 1.4.7.RELEASE and v2.2.0.M1 versions also have this problem.

Comment From: qinxiongzhou

@spencergibb Could you please take a look?

Comment From: spencergibb

Closing this due to inactivity. Please re-open if there's more to discuss.