The /pause management endpoint stops the application context.
EurekaDiscoveryClientConfiguration sets the InstanceStatus to DOWN in reaction to the context stop event as follows:
this.applicationInfoManager.setInstanceStatus(InstanceStatus.DOWN);
This status change triggers an onDemandeUpdate() on the InstanceInfoReplicator to propagate the new status back to the Eureka registry. That operation starts with a call to discoveryClient.refreshInstanceInfo() followed by discoveryClient.register() (the latter is supposed to update the status on the registry).
Unfortunately, the call to refreshInstanceInfo() overrides the new status with the value returned by the HealthCheckHandler if one is provided:
void refreshInstanceInfo() {
applicationInfoManager.refreshDataCenterInfoIfRequired();
applicationInfoManager.refreshLeaseInfoIfRequired();
InstanceStatus status;
try {
status = getHealthCheckHandler().getStatus(instanceInfo.getStatus());
} catch (Exception e) {
logger.warn("Exception from healthcheckHandler.getStatus, setting status to DOWN", e);
status = InstanceStatus.DOWN;
}
if (null != status) {
applicationInfoManager.setInstanceStatus(status);
}
}
To summarize:
- /pause stops the application context
- EurekaDiscoveryClientConfiguration.stop() is invoked and changes the InstanceStatus to DOWN
- the InstanceInfoReplicator.onDemandeUpdate() method is invoked as listener of the InstanceInfo status
- the HealthCheckHandler is invoked and returns an OK status
- the OK status overrides the DOWN status set because of /pause
- when the register() method is called, the status is back to OK and nothing is updated on the registry
This happens ONLY if Eureka is configured with eureka.client.healthcheck.enabled: true.
Happens on Brixton.SR7
Comment From: ryanjbaxter
Thinking outload here.....
What if the HealthCheckHandler took into account the app has been "paused" and returned DOWN?
Comment From: brenuart
That's basically the workaround I'm busy with atm. Two approaches:
1/ Customize the HealthCheckHandler to take the application context status into account and return DOWN unconditionally if it is stopped.
2/ Add an extra health endpoint reporting the status of the application context.
The latter approach has the extra benefit of of providing a consistent view between the health endpoint and the status reported in Eureka.
Comment From: ryanjbaxter
Would this be something you are willing to submit a PR for?
Comment From: brenuart
Our approach was to modify the EurekaHealthCheckHandler such that it returns a null status if the spring application context is stopped (paused). The status replicator explicitly checks for a null response from the health check handler and simply fallback to the current InstanceStatus as reported by the status manager.
This approach seems to be the the least invasive solution:
- the context is stopped --> no actual health status can be derived (some components may be stopped and unable to safely report their health)
- the decision about the actual status to report to the registry is the responsibility of the Eureka layer. It wouldn't have been the case anymore if the HealthCheckHandler had reported an actual status. Moreover, we would open the door to (potential) different behaviour depending on whether eureka.client.healthcheck.enabled is true or false.
I can submit a PR if all this looks reasonable to you.
Comment From: ryanjbaxter
Yeah lets see the the PR so we can review it, thanks!
Comment From: eacdy
Happens in Camden.SR3.
Comment From: eacdy
Happens in Edgware.RELEASE
Comment From: brenuart
@eacdy Indeed... but things have changed since Dalston. Here is what we introduce in the context when eureka.client.healthcheck.enabled=true:
public class Gh1571EurekaHealthCheckHandler extends EurekaHealthCheckHandler implements Ordered, Lifecycle {
/**
* {@code true} when the context stop sequence has been initiated.
*/
private boolean stopping = false;
public Gh1571EurekaHealthCheckHandler(HealthAggregator healthAggregator) {
super(healthAggregator);
}
@Override
public InstanceStatus getStatus(InstanceStatus instanceStatus) {
// Return nothing if the context is being stopped so the status held by the
// InstanceInfo remains unchanged.
if( this.stopping ) {
return null;
}
else {
return super.getStatus(instanceStatus);
}
}
@Override
public int getOrder() {
return Ordered.HIGHEST_PRECEDENCE; // Before EurekaAutoServiceRegistration!
}
@Override
public void start() {
this.stopping = false;
}
@Override
public void stop() {
this.stopping = true;
}
@Override
public boolean isRunning() {
return true;
}
}
Some words about it:
- wraps the EurekaHealthCheckHandler registered by SpringCloud to return null if the context is stopped or stopping
- registered with a high order priority so the close() method is invoked early and at least before EurekaAutoServiceRegistration (must be in effect when the registration is closed and the eureka replication triggered -> health check handler is consulted at that moment)
- cannot rely on context.isRunning() --> the status is updated after all disposable beans are disposed --> unregistration has already happened
This workaround may not be very clean, but it seems to work until we find a better solution ;-)
@ryanjbaxter If that looks good, I can submit a PR to integrate the approach directly inside the existing EurekaHealthCheckHandler.
Comment From: ryanjbaxter
Please submit the PR
Comment From: rafaelrddc
Good Morning,
I am using eureka in the latest version and the problem still persists, any news?
I can submit a PL with the correction
Comment From: ryanjbaxter
I can submit a PL with the correction
Do you mean PR?
@brenuart has already submitted one.
Comment From: rafaelrddc
Do you mean PR?
That's right.
@ryanjbaxter can you tell me which version of this fix will exit?
Comment From: ryanjbaxter
See the PR. We are waiting for @brenuart to address the comments there.
Comment From: brenuart
Damn.. I completely forgot about that issue :( @ryanjbaxter what comments do you want me to address? For what I remember, we are still using the approach described here with Edgware.SR4.
Comment From: rafaelrddc
@brenuart are you using any palliative solution ???
Comment From: brenuart
Yes, the one described here above. It is added to the context with the following configuration:
/**
* Bug fix: Do not consider Health info if context is stopped
*/
@Configuration
@ConditionalOnProperty(value = "eureka.client.healthcheck.enabled", matchIfMissing = false)
static class EurekaHealthCheckHandlerConfiguration {
@Autowired(required = false)
private HealthAggregator healthAggregator = new OrderedHealthAggregator();
@Bean
public EurekaHealthCheckHandler eurekaHealthCheckHandler() {
return new Gh1571EurekaHealthCheckHandler(healthAggregator);
}
}
Comment From: spencergibb
PRs welcome to add the Gh1571EurekaHealthCheckHandler (with a more sensible name of course)
Comment From: jfougere
Any news on this issue ? Any chance to get the PR fixed ?
Comment From: spencergibb
there has been no PR
Comment From: jfougere
Yeah I saw there was one but not finished. Not sure what was missing. But the issue is still present and does not allow to use the /pause endpoint while using the healthcheck as source of status for the registry.
Would you accept a new PR ?