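Eureka Lease Expiration
When a service instance expires or crashes, its shutdown() method is never called, so no request is sent to take the instance offline. Eureka therefore implements a dedicated expiration policy to evict such instances.
The entry point is the call registry.openForTraffic(applicationInfoManager, registryCount); made while the Eureka server initializes at startup.
Stepping into that method gives the following code: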
@Override
public void openForTraffic(ApplicationInfoManager applicationInfoManager, int count) {
    // Renewals happen every 30 seconds and for a minute it should be a factor of 2.
    this.expectedNumberOfClientsSendingRenews = count;
    updateRenewsPerMinThreshold();
    logger.info("Got {} instances from neighboring DS node", count);
    logger.info("Renew threshold is: {}", numberOfRenewsPerMinThreshold);
    this.startupTime = System.currentTimeMillis();
    if (count > 0) {
        this.peerInstancesTransferEmptyOnStartup = false;
    }
    DataCenterInfo.Name selfName = applicationInfoManager.getInfo().getDataCenterInfo().getName();
    boolean isAws = Name.Amazon == selfName;
    if (isAws && serverConfig.shouldPrimeAwsReplicaConnections()) {
        logger.info("Priming AWS connections for all replicas..");
        primeAwsReplicas(applicationInfoManager);
    }
    logger.info("Changing status to UP");
    applicationInfoManager.setInstanceStatus(InstanceStatus.UP);
    super.postInit();
}
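expectedNumberOfClientsSendingRenews is the number of registered service instances obtained from the registry (the count argument passed in). It is the input to updateRenewsPerMinThreshold():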
protected void updateRenewsPerMinThreshold() {
    this.numberOfRenewsPerMinThreshold = (int) (this.expectedNumberOfClientsSendingRenews
            * (60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds())
            * serverConfig.getRenewalPercentThreshold());
}
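As a worked instance of the formula above, the following sketch repeats the same arithmetic outside of Eureka. The class name RenewThresholdSketch and its method are hypothetical, and the interval and percentage are passed in as plain arguments (an assumption made for illustration) instead of being read from serverConfig.

```java
// Sketch only: mirrors the arithmetic of updateRenewsPerMinThreshold().
// RenewThresholdSketch and its parameters are illustrative names, not Eureka API.
final class RenewThresholdSketch {

    static int renewsPerMinThreshold(int expectedClients,
                                     int renewalIntervalSeconds,
                                     double renewalPercentThreshold) {
        // Each client renews (60 / interval) times per minute; only a fraction
        // of those renewals is required to stay above the threshold.
        return (int) (expectedClients
                * (60.0 / renewalIntervalSeconds)
                * renewalPercentThreshold);
    }

    public static void main(String[] args) {
        // 20 instances, one renewal every 30 seconds, 85% required
        System.out.println(renewsPerMinThreshold(20, 30, 0.85)); // prints 34
    }
}
```

With 20 instances the server would expect at least 34 renewals per minute under these example values.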
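getExpectedClientRenewalIntervalSeconds defaults to 30 seconds, and getRenewalPercentThreshold is a check factor of 85%. The formula yields the minimum number of renewals (heartbeats) the server expects per minute: with 20 registered instances, 20 * 2 * 0.85 = 34. This value is what the self-preservation mechanism compares against to decide whether to enter self-preservation mode. Once in that mode, instances can no longer be evicted and the eviction code below is skipped. The rest of openForTraffic is not very important. When execution reaches super.postInit();, it does some initialization, as the name suggests. The detailed code follows: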
protected void postInit() {
    renewsLastMin.start();
    if (evictionTaskRef.get() != null) {
        evictionTaskRef.get().cancel();
    }
    evictionTaskRef.set(new EvictionTask());
    evictionTimer.schedule(evictionTaskRef.get(),
            serverConfig.getEvictionIntervalTimerInMs(),
            serverConfig.getEvictionIntervalTimerInMs());
}
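renewsLastMin.start(); schedules a timer task, and the name tells us it tracks the renewals of the last minute. The design is quite neat: it keeps two buckets, one counting the renewals in the current minute and one holding the count from the previous minute. The task runs once a minute, copying the current count into lastBucket and resetting currentBucket to 0, so the previous minute's renewal count is always available.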
private final long sampleInterval; // 1 minute
public synchronized void start() {
    if (!isActive) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    // Zero out the current bucket.
                    // This task is scheduled once per minute.
                    // currentBucket counts the heartbeats of the current minute.
                    // lastBucket keeps the heartbeat count of the previous minute.
                    // On each run, the count just collected is moved into lastBucket.
                    lastBucket.set(currentBucket.getAndSet(0));
                } catch (Throwable e) {
                    logger.error("Cannot reset the Measured Rate", e);
                }
            }
        }, sampleInterval, sampleInterval);
        isActive = true;
    }
}
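The two-bucket idea can be tried in isolation with a small sketch like the one below. It is an assumption-laden simplification: the class TwoBucketRateSketch and its methods are hypothetical names, and the once-per-minute timer is replaced by an explicit rollOver() call so the behavior is deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a two-bucket counter in the spirit of renewsLastMin.
// TwoBucketRateSketch is an illustrative name, not a Eureka class.
final class TwoBucketRateSketch {
    private final AtomicLong currentBucket = new AtomicLong(0);
    private final AtomicLong lastBucket = new AtomicLong(0);

    // Called on every renewal (heartbeat).
    void increment() {
        currentBucket.incrementAndGet();
    }

    // In the real code a timer triggers this once per sample interval.
    void rollOver() {
        lastBucket.set(currentBucket.getAndSet(0));
    }

    // Number of renewals seen during the previous interval.
    long getCount() {
        return lastBucket.get();
    }

    public static void main(String[] args) {
        TwoBucketRateSketch rate = new TwoBucketRateSketch();
        rate.increment();
        rate.increment();
        rate.increment();
        rate.rollOver();                      // end of the first "minute"
        System.out.println(rate.getCount()); // prints 3
    }
}
```

Readers only ever see the completed interval in lastBucket, so a half-filled current minute never makes the renewal rate look artificially low.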
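evictionTaskRef.set(new EvictionTask()); creates a new task, so let's look at what it actually does. "Compensation" means making up for something, so compensationTimeMs is the compensation time in milliseconds. getCompensationTimeMs takes the current time, subtracts the time of the previous run, then subtracts the eviction interval (60s), and checks whether the result is greater than 0. If it is, the task ran late and the excess becomes the compensation time. This value is important because it is also used later when deciding whether an instance's lease has expired and the instance should be taken offline.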
/* visible for testing */ class EvictionTask extends TimerTask {
    private final AtomicLong lastExecutionNanosRef = new AtomicLong(0l);

    @Override
    public void run() {
        try {
            long compensationTimeMs = getCompensationTimeMs();
            logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
            evict(compensationTimeMs);
        } catch (Throwable e) {
            logger.error("Could not run the evict task", e);
        }
    }

    /**
     * compute a compensation time defined as the actual time this task was executed since the prev iteration,
     * vs the configured amount of time for execution. This is useful for cases where changes in time (due to
     * clock skew or gc for example) causes the actual eviction task to execute later than the desired time
     * according to the configured cycle.
     */
    // How much later than expected this run started
    long getCompensationTimeMs() {
        // Get the current time first
        long currNanos = getCurrentTimeNano();
        // The time at which this EvictionTask last ran
        long lastNanos = lastExecutionNanosRef.getAndSet(currNanos);
        if (lastNanos == 0l) {
            return 0l;
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(currNanos - lastNanos);
        long compensationTime = elapsedMs - serverConfig.getEvictionIntervalTimerInMs();
        return compensationTime <= 0l ? 0l : compensationTime;
    }

    long getCurrentTimeNano() { // for testing
        return System.nanoTime();
    }
}
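To make the compensation arithmetic concrete, here is a minimal sketch that isolates it as a pure function. CompensationSketch and its parameter names are hypothetical, and the elapsed time is passed in directly (an assumption for illustration) instead of being measured with System.nanoTime().

```java
// Sketch only: the core arithmetic of getCompensationTimeMs() as a pure function.
// CompensationSketch is an illustrative name, not a Eureka class.
final class CompensationSketch {

    // elapsedMs: time since the previous run; evictionIntervalMs: configured interval.
    static long compensationMs(long elapsedMs, long evictionIntervalMs) {
        long compensation = elapsedMs - evictionIntervalMs;
        // A run that was on time or early needs no compensation.
        return Math.max(0L, compensation);
    }

    public static void main(String[] args) {
        // The task was due after 60s but actually ran after 75s (e.g. a long GC pause).
        System.out.println(compensationMs(75_000L, 60_000L)); // prints 15000
        // The task ran on schedule.
        System.out.println(compensationMs(60_000L, 60_000L)); // prints 0
    }
}
```

In the first case the extra 15 seconds would be added to every lease's allowance, so instances are not penalized for a delay that happened on the server side.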
The compensation time is then passed into the eviction method:
public void evict(long additionalLeaseMs) {
    logger.debug("Running the evict task");
    // Whether actively evicting failed instances is allowed; tied to the self-preservation mechanism
    if (!isLeaseExpirationEnabled()) {
        logger.debug("DS: lease expiration is currently disabled.");
        return;
    }
    // We collect first all expired items, to evict them in random order. For large eviction sets,
    // if we do not that, we might wipe out whole apps before self preservation kicks in. By randomizing it,
    // the impact should be evenly distributed across all applications.
    List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
    for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
        Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
        if (leaseMap != null) {
            for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
                Lease<InstanceInfo> lease = leaseEntry.getValue();
                // Treated as failed once the lease duration plus the compensation time has passed
                if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                    expiredLeases.add(lease);
                }
            }
        }
    }
    // To compensate for GC pauses or drifting local time, we need to use current registry size as a base for
    // triggering self-preservation. Without that we would wipe out full registry.
    // e.g. 20
    int registrySize = (int) getLocalRegistrySize();
    // e.g. 20 * 0.85 = 17
    int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
    // e.g. 20 - 17 = 3
    int evictionLimit = registrySize - registrySizeThreshold;
    int toEvict = Math.min(expiredLeases.size(), evictionLimit);
    if (toEvict > 0) {
        logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);
        Random random = new Random(System.currentTimeMillis());
        // Randomly pick toEvict leases (3 in this example)
        for (int i = 0; i < toEvict; i++) {
            // Pick a random item (Knuth shuffle algorithm)
            int next = i + random.nextInt(expiredLeases.size() - i);
            Collections.swap(expiredLeases, i, next);
            Lease<InstanceInfo> lease = expiredLeases.get(i);
            String appName = lease.getHolder().getAppName();
            String id = lease.getHolder().getId();
            EXPIRED.increment();
            logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
            // Cancel the randomly picked lease; shutdown() ends up calling this same method
            internalCancel(appName, id, false);
        }
    }
}
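isLeaseExpirationEnabled() is mainly about self-preservation. It was covered earlier, so it is not described again here.
Next, the code iterates over all service instances. Each lease carries its renewal time, and the code checks whether the lease has expired; if so, it is added to the list of leases to evict. The rule is: if the current time is greater than the last update time plus the lease duration plus the compensation time, the lease has expired.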
public boolean isExpired(long additionalLeaseMs) {
    // Is the current time later than the last heartbeat time + the lease duration (90s) + the compensation time?
    return (evictionTimestamp > 0 || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
}
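The expiry rule is easier to experiment with when the clock is an explicit argument. The sketch below assumes exactly that: LeaseExpirySketch and all of its parameter names are hypothetical, and the current time is passed in instead of calling System.currentTimeMillis().

```java
// Sketch only: the expiry rule of isExpired() with every input made explicit.
// LeaseExpirySketch is an illustrative name, not a Eureka class.
final class LeaseExpirySketch {

    static boolean isExpired(long nowMs,
                             long lastUpdateTimestampMs,
                             long durationMs,
                             long additionalLeaseMs,
                             long evictionTimestampMs) {
        // Already evicted, or past last update + lease duration + compensation time.
        return evictionTimestampMs > 0
                || nowMs > lastUpdateTimestampMs + durationMs + additionalLeaseMs;
    }

    public static void main(String[] args) {
        long lastUpdate = 1_000_000L;
        long duration = 90_000L; // 90s, the duration quoted in the text

        // 100s after the last update, no compensation: expired.
        System.out.println(isExpired(lastUpdate + 100_000L, lastUpdate, duration, 0L, 0L));      // prints true
        // Same moment, but the eviction task itself ran 15s late: not expired yet.
        System.out.println(isExpired(lastUpdate + 100_000L, lastUpdate, duration, 15_000L, 0L)); // prints false
    }
}
```

The second call shows why the compensation time matters: a delay of the eviction task extends every lease by the same amount.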
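The evict() method also computes a maximum number of instances that may be evicted in one pass. With 20 registered instances the threshold is (int) (20 * 0.85) = 17, so at most 20 - 17 = 3 instances can be evicted. If, for example, 6 leases have expired, the smaller of 3 and 6 is taken, so 3 instances are picked at random and the cancel method is called for each of them.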
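The limit calculation and the random pick can be sketched on their own as follows. This is an illustration under stated assumptions: EvictionLimitSketch, evictionLimit and pickRandom are hypothetical names, plain strings stand in for leases, and the Random instance is passed in so a seed can be fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch only: eviction limit plus a partial Knuth (Fisher-Yates) shuffle.
// EvictionLimitSketch is an illustrative name, not a Eureka class.
final class EvictionLimitSketch {

    // Maximum number of leases that may be evicted in one pass.
    static int evictionLimit(int registrySize, double renewalPercentThreshold) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        return registrySize - registrySizeThreshold;
    }

    // Picks min(expired.size(), limit) distinct items uniformly at random.
    static <T> List<T> pickRandom(List<T> expired, int limit, Random random) {
        List<T> pool = new ArrayList<>(expired); // leave the caller's list untouched
        int toEvict = Math.min(pool.size(), limit);
        for (int i = 0; i < toEvict; i++) {
            // Swap a random element of the not-yet-picked tail into position i.
            int next = i + random.nextInt(pool.size() - i);
            Collections.swap(pool, i, next);
        }
        return new ArrayList<>(pool.subList(0, toEvict));
    }

    public static void main(String[] args) {
        int limit = evictionLimit(20, 0.85);
        System.out.println(limit); // prints 3
        List<String> expired = Arrays.asList("a", "b", "c", "d", "e", "f");
        // Prints 3 of the 6 expired items; which ones depends on the random source.
        System.out.println(pickRandom(expired, limit, new Random()));
    }
}
```

Only the first toEvict positions are shuffled, which is enough to pick a uniform random subset without shuffling the whole list.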