Hi, in my test environment services are restarted frequently by Mesos/Marathon without unregistration (SIGKILL), so I noticed that the Eureka instance is in self-preservation mode frequently. Eureka has 65 registered services, but numberOfRenewsPerMinThreshold is 368, which seems to be incorrect. I found that numberOfRenewsPerMinThreshold is decreased only via the REST API:
com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#cancel
but when a lease expires in
com.netflix.eureka.registry.AbstractInstanceRegistry#evict(long)
numberOfRenewsPerMinThreshold is not decreased, so numberOfRenewsPerMinThreshold no longer correlates with the real instance count or with expectedNumberOfRenewsPerMin.
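The asymmetry can be sketched with a simplified model (a hypothetical class, not the actual Eureka code; 0.85 and 2 renews/min are the Eureka defaults for the renewal percent threshold and the 30-second heartbeat):

```java
import java.util.HashSet;
import java.util.Set;

// Simplified (hypothetical) model of the accounting asymmetry described above:
// cancel() decrements the renewal threshold, but evict() removes the lease
// without touching it, so the threshold drifts upward across SIGKILL restarts.
public class RegistryModel {
    static final double RENEWAL_PERCENT_THRESHOLD = 0.85; // Eureka default
    static final int RENEWS_PER_INSTANCE_PER_MIN = 2;     // 30s heartbeat interval

    final Set<String> instances = new HashSet<>();
    int expectedNumberOfRenewsPerMin = 0;
    int numberOfRenewsPerMinThreshold = 0;

    void register(String id) {
        instances.add(id);
        expectedNumberOfRenewsPerMin += RENEWS_PER_INSTANCE_PER_MIN;
        recomputeThreshold();
    }

    // Graceful shutdown path (like PeerAwareInstanceRegistryImpl#cancel):
    // updates the threshold.
    void cancel(String id) {
        instances.remove(id);
        expectedNumberOfRenewsPerMin -= RENEWS_PER_INSTANCE_PER_MIN;
        recomputeThreshold();
    }

    // Eviction path (like AbstractInstanceRegistry#evict -> internalCancel):
    // removes the lease but leaves the threshold untouched.
    void evict(String id) {
        instances.remove(id);
    }

    void recomputeThreshold() {
        numberOfRenewsPerMinThreshold =
                (int) (expectedNumberOfRenewsPerMin * RENEWAL_PERCENT_THRESHOLD);
    }

    public static void main(String[] args) {
        RegistryModel r = new RegistryModel();
        for (int i = 0; i < 10; i++) {
            r.register("svc-" + i); // a SIGKILLed instance re-registers...
            r.evict("svc-" + i);    // ...and is later evicted; threshold never drops
        }
        r.register("svc-live");
        System.out.println("instances=" + r.instances.size()
                + " threshold=" + r.numberOfRenewsPerMinThreshold);
        // One live instance, but the threshold reflects all 11 registrations.
    }
}
```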
some greps:
REGISTERED
grep "Registered instance" eureka.log | wc -l
----
286
EXPIRED
grep "Registry: expired lease for" eureka.log | wc -l
----
213
RENEWAL
grep "Current renewal" eureka.log
----
929790926 2017-10-30 10:08:11,991 INFO [ ReplicaAwareInstanceRegistry - RenewalThresholdUpdater ] c.n.e.r.PeerAwareInstanceRegistryImpl | Current renewal threshold is : 368
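A back-of-envelope check of these numbers (assuming the Eureka defaults: a 30-second heartbeat, i.e. 2 renews/min per instance, and a renewal percent threshold of 0.85): a threshold of 368 implies renews from roughly 216 instances, far more than the 65 actually registered, and in the same ballpark as the 213 expired leases that were never subtracted:

```java
// Back-of-envelope check: how many instances does a threshold of 368 imply?
// Assumptions (Eureka defaults): 30s heartbeat => 2 renews/min per instance,
// renewal percent threshold 0.85.
public class ThresholdCheck {
    public static void main(String[] args) {
        int threshold = 368;              // from the log line above
        double renewalPercent = 0.85;
        int renewsPerInstancePerMin = 2;
        double impliedInstances = threshold / renewalPercent / renewsPerInstancePerMin;
        System.out.printf("threshold %d implies ~%.0f instances (only 65 are registered)%n",
                threshold, impliedInstances);
    }
}
```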
Comment From: p-zalejko
Hi, I also ran into the same problem and have been investigating it for a while. Here is what I found:
The "Renews threshold" is decreased if a service shuts down gracefully: before going down, it calls the com.netflix.eureka.resources.InstanceResource#cancelLease endpoint, which executes the PeerAwareInstanceRegistryImpl#cancel method that updates the "Renews threshold" value.
If, for instance, a service was killed by SIGKILL, then "cancelLease" is not called. At some point the "evict" method gets triggered and eventually executes the
org.springframework.cloud.netflix.eureka.server.InstanceRegistry#internalCancel method. And there is the issue: it cancels the instance but does not call the com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#cancel method (which updates the "Renews threshold" value).
In my case, I tested this issue in the following way:
1. For testing purposes, I updated my configuration to get services unregistered quickly:

eureka:
  client:
    registry-fetch-interval-seconds: 15
  server:
    evictionIntervalTimerInMs: 10000
    renewalThresholdUpdateIntervalMs: 15000
  instance:
    lease-renewal-interval-in-seconds: 15
    lease-expiration-duration-in-seconds: 35
    registry-fetch-interval-seconds: 15

2. Launched a Eureka server.
3. Launched 3 services and registered them with the Eureka server.
4. Started killing one of them and restarting it again.
After repeating step 4 many times, I got a big "Renews threshold" value, even greater than "Renews (last min)". If self-preservation mode is enabled, this can lead to problems: since the "Renews threshold" is never decreased, even after launching all services back, the renews they send might not be enough to escape from self-preservation mode once it has been activated.
Comment From: holy12345
@p-zalejko I've been following this discussion; has this problem been solved?
Comment From: p-zalejko
Hi @holy12345. I don't think it is. I tested a milestone version (2.x) and it behaves the same.
Comment From: holy12345
@p-zalejko Hi,
This will not happen if Eureka Server self-preservation is turned off; the value of "Renews threshold" is then correct.
PeerAwareInstanceRegistryImpl#updateRenewalThreshold()
private void updateRenewalThreshold() {
    try {
        Applications apps = eurekaClient.getApplications();
        int count = 0;
        for (Application app : apps.getRegisteredApplications()) {
            for (InstanceInfo instance : app.getInstances()) {
                if (this.isRegisterable(instance)) {
                    ++count;
                }
            }
        }
        synchronized (lock) {
            // Update threshold only if the threshold is greater than the
            // current expected threshold or if the self preservation is disabled.
            if ((count * 2) > (serverConfig.getRenewalPercentThreshold() * numberOfRenewsPerMinThreshold)
                    || (!this.isSelfPreservationModeEnabled())) {
                this.expectedNumberOfRenewsPerMin = count * 2;
                this.numberOfRenewsPerMinThreshold = (int) ((count * 2) * serverConfig.getRenewalPercentThreshold());
            }
        }
        logger.info("Current renewal threshold is : {}", numberOfRenewsPerMinThreshold);
    } catch (Throwable e) {
        logger.error("Cannot update renewal threshold", e);
    }
}
First of all, this scheduled task runs every 15 minutes by default. If self-preservation is off, the "Renews threshold" value is recalculated. If self-preservation is turned on, there is no problem as long as the first if condition is true, but I do not understand how to make this condition hold. Is apps.getRegisteredApplications() supposed to return all registered client information?
Do I understand it correctly? Best wishes.
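A worked example of the guard in updateRenewalThreshold() may help (assumed values: the default renewal percent threshold of 0.85 and the inflated threshold of 368 from the log above):

```java
// Worked example (assumptions: renewal percent threshold 0.85, threshold 368
// from the log above) of the guard in updateRenewalThreshold(): the threshold
// is only recomputed when count * 2 > 0.85 * numberOfRenewsPerMinThreshold,
// so with count == 0 (empty getApplications()) it can never shrink while
// self-preservation is enabled.
public class GuardExample {
    public static void main(String[] args) {
        double renewalPercent = 0.85;
        int numberOfRenewsPerMinThreshold = 368; // inflated value from the log
        int count = 0;                           // eurekaClient.getApplications() empty
        boolean selfPreservationEnabled = true;

        boolean recompute = (count * 2) > (renewalPercent * numberOfRenewsPerMinThreshold)
                || !selfPreservationEnabled;
        System.out.println("recompute=" + recompute); // stays false: threshold frozen
    }
}
```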
Comment From: p-zalejko
Hi @holy12345, I think you are right:
If self-preservation is turned off (server.enableSelfPreservation: false) then we do not have a problem, because we do not have self-preservation ;)
But if it is enabled, I would expect PeerAwareInstanceRegistryImpl#updateRenewalThreshold() to update expectedNumberOfRenewsPerMin periodically.
Unfortunately, in PeerAwareInstanceRegistryImpl#updateRenewalThreshold() I do not know why Applications apps = eurekaClient.getApplications() returns an empty list of apps in my case. As a result, count is 0, so it never gets into the if statement and never updates expectedNumberOfRenewsPerMin. It might be a different issue, but fixing it could also solve the problem we are discussing.
Comment From: holy12345
@p-zalejko First of all, thank you for your reply :) You're right, count is 0 in my test environment too. What I am trying to do next is understand why the count is zero; maybe solving that will also solve the problem you reported. If there is progress, I will inform you first. Best wishes!
Comment From: holy12345
@p-zalejko Hello, do you run a single Eureka server? I think it is not a cluster. If it is not a cluster, eurekaClient.getApplications() actually returns an empty list. Best wishes.
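For reference, a standalone (non-cluster) Eureka server is typically configured not to fetch a registry at all, which would explain the empty eurekaClient.getApplications() result. This is a common standalone configuration, shown here as an assumption about the setup, not something confirmed in this thread:

```yaml
# Typical standalone Eureka server settings (assumed, not confirmed here):
# with fetch-registry disabled, the server's own discovery client never
# fetches a registry, so eurekaClient.getApplications() stays empty.
eureka:
  client:
    register-with-eureka: false
    fetch-registry: false
```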
Comment From: yolenw
I have to follow up on this one: does this mean we are not expected to turn on self-preservation mode for a Eureka server in non-cluster mode? Thanks. @holy12345
Comment From: spencergibb
Closing this due to inactivity. Please re-open if there's more to discuss.
Comment From: Jeffrey-Hassan
@spencergibb This issue seems to persist as of the latest version (1.9.12).
I came across this as well and did some digging in the source. It seems to come down to the difference between a graceful shutdown and a non-graceful one (e.g. the EvictTask removing the instances, rather than a call to cancel a lease).
Ultimately, the AbstractInstanceRegistry.evict method calls "internalCancel", which does not modify the "expectedNumberOfClientsSendingRenews" count. So the "count" of total instances computed by PeerAwareInstanceRegistryImpl.updateRenewalThreshold (by cycling through all apps and their instances and keeping a tally) is one less for the evicted instance, but "expectedNumberOfClientsSendingRenews" does not account for that eviction.
For a graceful shutdown of a service, PeerAwareInstanceRegistryImpl.cancel seems to be called, which decrements "expectedNumberOfClientsSendingRenews" properly.
I would've expected the variable to be decremented regardless of whether it's an eviction from the timer task or a graceful shutdown. The best place seems to be before internalCancel is called in AbstractInstanceRegistry.evict. Thoughts?
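The proposed change can be sketched with a simplified model (a hypothetical class, not the actual Eureka code): the eviction path decrements the expected-renews counter exactly like the cancel path does, so the threshold tracks the live clients again.

```java
// Simplified (hypothetical) model of the fix proposed above: evict()
// decrements the expected-renews counter just like cancel() does, instead
// of leaving it untouched.
public class FixedRegistryModel {
    static final double RENEWAL_PERCENT = 0.85; // Eureka default

    int expectedNumberOfClientsSendingRenews = 0;
    int threshold = 0;

    void register() { expectedNumberOfClientsSendingRenews++; update(); }
    void cancel()   { expectedNumberOfClientsSendingRenews--; update(); }

    // With the proposed change, evict() also decrements the counter
    // before delegating to internalCancel().
    void evict()    { expectedNumberOfClientsSendingRenews--; update(); }

    void update() {
        // 2 renews/min per client at the default 30s heartbeat interval
        threshold = (int) (expectedNumberOfClientsSendingRenews * 2 * RENEWAL_PERCENT);
    }

    public static void main(String[] args) {
        FixedRegistryModel r = new FixedRegistryModel();
        for (int i = 0; i < 10; i++) { r.register(); r.evict(); } // SIGKILL churn
        r.register();
        System.out.println("threshold=" + r.threshold); // tracks the one live client
    }
}
```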
Comment From: troshko111
@Jeffrey-Hassan this has been fixed in v1.9.17.
Comment From: spencergibb
See #3743