Describe the bug
As we know, when RefreshScopeRefreshedEvent
is published, service instances perform deregister and register actions.
refer: EurekaDiscoveryClientConfiguration
deregister and register will call ApplicationInfoManager#setInstanceStatus
to modify InstanceStatus
.
protected static class EurekaClientConfigurationRefresher
implements ApplicationListener<RefreshScopeRefreshedEvent> {
// ...
public void onApplicationEvent(RefreshScopeRefreshedEvent event) {
if (eurekaClient != null) {
eurekaClient.getApplications();
}
if (autoRegistration != null) {
// deregister instance
this.autoRegistration.stop();
// register instance
this.autoRegistration.start();
}
}
}
But, the DiscoveryClient will also be modified InstanceStatus
by schedule task.
void refreshInstanceInfo() {
applicationInfoManager.refreshDataCenterInfoIfRequired();
applicationInfoManager.refreshLeaseInfoIfRequired();
InstanceStatus status;
try {
// get instance status
status = getHealthCheckHandler().getStatus(instanceInfo.getStatus());
} catch (Exception e) {
logger.warn("Exception from healthcheckHandler.getStatus, setting status to DOWN", e);
status = InstanceStatus.DOWN;
}
if (null != status) {
// modify instance status
applicationInfoManager.setInstanceStatus(status);
}
}
So, EurekaClientConfigurationRefresher
& DiscoveryClient#refreshInstanceInfo
concurrent execution causes concurrency issues
Step1 deregister instance then modify InstanceStatus
to DOWN
by EurekaClientConfigurationRefresher
// deregister instance
this.autoRegistration.stop();
Step2 get InstanceStatus
from instanceInfo
is DOWN
by DiscoveryClient#refreshInstanceInfo()
// get instance status
status = getHealthCheckHandler().getStatus(instanceInfo.getStatus());
Step3 deregister instance then modify InstanceStatus
to UP
by EurekaClientConfigurationRefresher
// register instance
this.autoRegistration.start();
Step4 modify InstanceStatus
to DOWN
by DiscoveryClient#refreshInstanceInfo()
// modify instance status
applicationInfoManager.setInstanceStatus(status);
If the execution order is as above, the service InstanceStatus
changes to DOWN
.
Comment From: yuhuangbin
It is recommended that the operation of querying the shared variable first and then updating it be locked here, and this problem has not been reproduced after the lock is currently added. @OlgaMaciaszek Can you give a little advice?
void refreshInstanceInfo() {
applicationInfoManager.refreshDataCenterInfoIfRequired();
applicationInfoManager.refreshLeaseInfoIfRequired();
// need lock here
synchronized (applicationInfoManager) {
InstanceStatus status;
try {
status = getHealthCheckHandler().getStatus(instanceInfo.getStatus());
} catch (Exception e) {
logger.warn("Exception from healthcheckHandler.getStatus, setting status to DOWN", e);
status = InstanceStatus.DOWN;
}
if (null != status) {
applicationInfoManager.setInstanceStatus(status);
}
}
}
Comment From: OlgaMaciaszek
Hello, @yuhuangbin, thanks for reporting the issue. Will take a look next week.
Comment From: yuhuangbin
Hello, @OlgaMaciaszek , Is there a conclusion to this issue?
Comment From: OlgaMaciaszek
Looks like a bug.
Comment From: OlgaMaciaszek
This needs to be addressed within Netflix/Eureka DiscoveryClient
. Have submitted a PR with a fix: https://github.com/Netflix/eureka/pull/1566.