Metric概述
HystrixCommands和HystrixObservableCommands执行过程中,会产生执行的数据,这些数据对于观察调用的性能表现非常有用。
命令产生数据后,Metrics会根据不同纬度进行统计,主要有一下三个纬度:一段时间内(窗口期)的累计统计数据、持续的累计统计数据、一段时间内(窗口期)的数据分布。
Metric实现
Metrics实现主要的流程如下:
1.命令在开始执行前会向开始消息流(HystrixCommandStartStream)发送开始消息(HystrixCommandExecutionStarted)。
2.如果是线程池执行,执行前会向线程池开始消息流(HystrixThreadPoolStartStream)发送开始消息(HystrixCommandExecutionStarted)。
3.如果是线程池执行,执行后会向线程池结束消息流(HystrixThreadPoolCompletionStream)发送完成消息(HystrixCommandCompletion)。
4.命令在结束执行前会向完成消息流(HystrixCommandCompletionStream)发送完成消息(HystrixCommandCompletion)。
5.不同类型的统计流,会监听开始消息流或完成消息流,根据接受到的消息内容,进行统计。
Hystrix消息类型
HystrixCommandCompletion有一下消息类型
一段时间内统计
统计流首先监听一个消息流(开始消息流或者完成消息流),统计一段时间内各个类型消息的累计数据(时间为:metrics.rollingStats.timeInMilliseconds/metrics.rollingStats.numBuckets)。然后再对累计的数据进行累加(个数为:metrics.rollingStats.numBuckets),即为最终累计数据。
RollingCommandEventCounterStream消息流监听了HystrixCommandCompletionStream消息流,并统计各种消息类型次数。
RollingCollapserEventCounterStream消息流监听了HystrixCollapserEventStream消息流,并统计各种消息类型次数。
RollingThreadPoolEventCounterStream消息流监听了HystrixThreadPoolCompletionStream消息流,并统计各种消息类型次数。
HealthCountsStream消息流监听了HystrixThreadPoolCompletionStream消息流,并统计成功次数,失败次数,失败率。
持续统计
统计流首先监听一个消息流(开始消息流或者完成消息流),统计一段时间内各个类型消息的累计数据(时间为:metrics.rollingStats.timeInMilliseconds/metrics.rollingStats.numBuckets)。然后不断的累加累计数据。
CumulativeCommandEventCounterStream监听了HystrixCommandCompletionStream消息流,并统计各种消息类型次数。
CumulativeCollapserEventCounterStream监听了HystrixCollapserEventStream消息流,并统计各种消息类型次数。
CumulativeThreadPoolEventCounterStream监听了HystrixThreadPoolCompletionStream消息流,并统计各种消息类型次数。
一段时间内分布统计
RollingDistributionStream监听一个消息流,例如HystrixCommandStartStream,然后通过RX java对一段时间内的数值进行运算操作,生成统计值放在Histogram对象中,然后重新发射,对窗口期内的Histogram对象进行运算操作,并生成统计值重新发射。
子类RollingCommandLatencyDistributionStream监听了HystrixCommandCompletionStream消息流,并且通过RX java监听窗口期内的executelatency,通过Histogram计算窗口期内延时的分布。
子类RollingCommandUserLatencyDistributionStream监听了HystrixCommandCompletionStream消息流,并且通过RX java监听窗口期内的totalLatency,通过Histogram计算窗口期内延时的分布。
子类RollingCollapserBatchSizeDistributionStream监听了HystrixCollapserEventStream消息流,并且通过RX java监听窗口期内的ADDED_TO_BATCH消息类型次数,通过Histogram计算窗口期内延时的分布。
RollingConcurrencyStream监听一个消息流,例如HystrixCommandStartStream,然后通过RX java对一段时间内的执行并发量取最大值,重新发射,对窗口期内的执行并发量取最大值,重新发射。
子类RollingCommandMaxConcurrencyStream监听了HystrixCommandStartStream,然后通过RX java对窗口期内的执行并发量取最大值。
子类RollingThreadPoolMaxConcurrencyStream监听了HystrixThreadPoolStartStream,然后通过RX java对窗口期内的执行并发量取最大值。
其他数据流
还有一些独立于消息流的数据流,对于理解系统信息也非常有帮助。
配置流HystrixConfigurationStream,通过该数据流可以定时获取hystrix最新的properties配置信息,com.netflix.hystrix.contrib.sample.stream.HystrixConfigSseServlet就是用该流来获取配置信息。
数据格式:
data: {"type":"HystrixConfig","commands":{"CreditCardCommand":{"threadPoolKey":"CreditCard","groupKey":"CreditCard","execution":{"isolationStrategy":"THREAD","threadPoolKeyOverride":null,"requestCacheEnabled":true,"requestLogEnabled":true,"timeoutEnabled":true,"fallbackEnabled":true,"timeoutInMilliseconds":3000,"semaphoreSize":10,"fallbackSemaphoreSize":10,"threadInterruptOnTimeout":true},"metrics":{"healthBucketSizeInMs":500,"percentileBucketSizeInMilliseconds":60000,"percentileBucketCount":10,"percentileEnabled":true,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"circuitBreaker":{"enabled":true,"isForcedOpen":false,"isForcedClosed":false,"requestVolumeThreshold":20,"errorPercentageThreshold":50,"sleepInMilliseconds":5000}},"GetUserAccountCommand":{"threadPoolKey":"User","groupKey":"User","execution":{"isolationStrategy":"THREAD","threadPoolKeyOverride":null,"requestCacheEnabled":true,"requestLogEnabled":true,"timeoutEnabled":true,"fallbackEnabled":true,"timeoutInMilliseconds":50,"semaphoreSize":10,"fallbackSemaphoreSize":10,"threadInterruptOnTimeout":true},"metrics":{"healthBucketSizeInMs":500,"percentileBucketSizeInMilliseconds":60000,"percentileBucketCount":10,"percentileEnabled":true,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"circuitBreaker":{"enabled":true,"isForcedOpen":false,"isForcedClosed":false,"requestVolumeThreshold":20,"errorPercentageThreshold":50,"sleepInMilliseconds":5000}},"GetOrderCommand":{"threadPoolKey":"Order","groupKey":"Order","execution":{"isolationStrategy":"THREAD","threadPoolKeyOverride":null,"requestCacheEnabled":true,"requestLogEnabled":true,"timeoutEnabled":true,"fallbackEnabled":true,"timeoutInMilliseconds":1000,"semaphoreSize":10,"fallbackSemaphoreSize":10,"threadInterruptOnTimeout":true},"metrics":{"healthBucketSizeInMs":500,"percentileBucketSizeInMilliseconds":60000,"percentileBucketCount":10,"percentileEnabled":true,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"circuitBreaker":{"enabled":true,"isForcedOpen":false,"isForcedClosed":false,"requestVolumeThreshold":20,"errorPercentageThreshold":50,"sleepInMilliseconds":5000}},"GetPaymentInformationCommand":{"threadPoolKey":"PaymentInformation","groupKey":"PaymentInformation","execution":{"isolationStrategy":"THREAD","threadPoolKeyOverride":null,"requestCacheEnabled":true,"requestLogEnabled":true,"timeoutEnabled":true,"fallbackEnabled":true,"timeoutInMilliseconds":1000,"semaphoreSize":10,"fallbackSemaphoreSize":10,"threadInterruptOnTimeout":true},"metrics":{"healthBucketSizeInMs":500,"percentileBucketSizeInMilliseconds":60000,"percentileBucketCount":10,"percentileEnabled":true,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"circuitBreaker":{"enabled":true,"isForcedOpen":false,"isForcedClosed":false,"requestVolumeThreshold":20,"errorPercentageThreshold":50,"sleepInMilliseconds":5000}}},"threadpools":{"User":{"coreSize":8,"maxQueueSize":-1,"queueRejectionThreshold":5,"keepAliveTimeInMinutes":1,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"CreditCard":{"coreSize":8,"maxQueueSize":-1,"queueRejectionThreshold":5,"keepAliveTimeInMinutes":1,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"Order":{"coreSize":8,"maxQueueSize":-1,"queueRejectionThreshold":5,"keepAliveTimeInMinutes":1,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"PaymentInformation":{"coreSize":8,"maxQueueSize":-1,"queueRejectionThreshold":5,"keepAliveTimeInMinutes":1,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10}},"collapsers":{}}
功能流HystrixUtilizationStream,通过该数据流可以获得并发量,线程池状况等信息。com.netflix.hystrix.contrib.sample.stream.HystrixUtilizationSseServlet就是用该流来获取配置信息。
数据格式:
data: {"type":"HystrixUtilization","commands":{"CreditCardCommand":{"activeCount":0},"GetUserAccountCommand":{"activeCount":0},"GetOrderCommand":{"activeCount":1},"GetPaymentInformationCommand":{"activeCount":0}},"threadpools":{"User":{"activeCount":0,"queueSize":0,"corePoolSize":8,"poolSize":2},"CreditCard":{"activeCount":0,"queueSize":0,"corePoolSize":8,"poolSize":1},"Order":{"activeCount":1,"queueSize":0,"corePoolSize":8,"poolSize":2},"PaymentInformation":{"activeCount":0,"queueSize":0,"corePoolSize":8,"poolSize":2}}}
请求数据流HystrixRequestEventsStream,通过该数据流可以获得http请求相关的信息,com.netflix.hystrix.contrib.requests.stream.HystrixRequestEventsSseServlet就是用该流来获取配置信息。
数据格式:
data: {"name":"GetOrderCommand","events":["SUCCESS"],"latencies":[111]},{"name":"GetPaymentInformationCommand","events":["SUCCESS"],"latencies":[15]},{"name":"GetUserAccountCommand","events":["TIMEOUT","FALLBACK_SUCCESS"],"latencies":[59],"cached":2},{"name":"CreditCardCommand","events":["SUCCESS"],"latencies":[957]}],[{"name":"GetUserAccountCommand","events":["SUCCESS"],"latencies":[3],"cached":2},{"name":"GetOrderCommand","events":["SUCCESS"],"latencies":[77]},{"name":"GetPaymentInformationCommand","events":["SUCCESS"],"latencies":[21]},{"name":"CreditCardCommand","events":["SUCCESS"],"latencies":[1199]}
MetricPublisher
有时,我们需要发布Hystrix中的metrics到其他地方,Hystrix提供了相应的接口(HystrixMetricsPublisherCollapser,HystrixMetricsPublisherCommand,HystrixMetricsPublisherThreadPool),实现这些接口,并在initial方法中实现发送hystrix的metrics。并且实现HystrixMetricsPublisher,来创建这些实现类。
Hystrix原理:
Hystrix运行时,HystrixMetricsPublisherFactory通过HystrixPlugins获取HystrixMetricsPublisher的实现类。并且通过该实现类来创建(HystrixMetricsPublisherCollapser,HystrixMetricsPublisherCommand,HystrixMetricsPublisherThreadPool)的实现类,并在初次创建时调用其initial方法。
coda hale 实现了将hystrix 的metrics信息输出到指定metrics监控系统中。
引入jar包:
<dependency>
<groupId>com.netflix.hystrix</groupId>
<artifactId>hystrix-codahale-metrics-publisher</artifactId>
<version>1.5.9</version>
</dependency>
创建HystrixMetricsPublisher对象并注册到HystrixPlugins:
@Bean
HystrixMetricsPublisher hystrixMetricsPublisher() {
HystrixCodaHaleMetricsPublisher publisher = new HystrixCodaHaleMetricsPublisher(metricRegistry);
HystrixPlugins.getInstance().registerMetricsPublisher(publisher);
return publisher;
}
coda hale实现源码如下:
public class HystrixCodaHaleMetricsPublisher extends HystrixMetricsPublisher {
private final String metricsRootNode;
private final MetricRegistry metricRegistry;
public HystrixCodaHaleMetricsPublisher(MetricRegistry metricRegistry) {
this(null, metricRegistry);
}
public HystrixCodaHaleMetricsPublisher(String metricsRootNode, MetricRegistry metricRegistry) {
this.metricsRootNode = metricsRootNode;
this.metricRegistry = metricRegistry;
}
@Override
public HystrixMetricsPublisherCommand getMetricsPublisherForCommand(HystrixCommandKey commandKey, HystrixCommandGroupKey commandGroupKey, HystrixCommandMetrics metrics, HystrixCircuitBreaker circuitBreaker, HystrixCommandProperties properties) {
return new HystrixCodaHaleMetricsPublisherCommand(metricsRootNode, commandKey, commandGroupKey, metrics, circuitBreaker, properties, metricRegistry);
}
@Override
public HystrixMetricsPublisherThreadPool getMetricsPublisherForThreadPool(HystrixThreadPoolKey threadPoolKey, HystrixThreadPoolMetrics metrics, HystrixThreadPoolProperties properties) {
return new HystrixCodaHaleMetricsPublisherThreadPool(metricsRootNode, threadPoolKey, metrics, properties, metricRegistry);
}
@Override
public HystrixMetricsPublisherCollapser getMetricsPublisherForCollapser(HystrixCollapserKey collapserKey, HystrixCollapserMetrics metrics, HystrixCollapserProperties properties) {
return new HystrixCodaHaleMetricsPublisherCollapser(collapserKey, metrics, properties, metricRegistry);
}
}
HystrixCodaHaleMetricsPublisher负责创建HystrixCodaHaleMetricsPublisherCommand,HystrixCodaHaleMetricsPublisherThreadPool,HystrixCodaHaleMetricsPublisherCollapser。这三个对象实现基本逻辑是在initialize方法中向metricRegistry中设置相应信息。
public void initialize() {
metricRegistry.register(createMetricName("isCircuitBreakerOpen"), new Gauge<Boolean>() {
@Override
public Boolean getValue() {
return circuitBreaker.isOpen();
}
.....
}
HystrixCodaHaleMetricsPublisherCommand向metricRegistry设置了一下metrics信息:
isCircuitBreakerOpen 是否熔断,通过熔断器获得。
currentTime 当前系统时间
countBadRequests 通过HystrixCommandMetrics获得。
统计流类实现类
Hystrix的Metrics功能模块中存储了与Hystrix运行相关的度量信息,主要有三类类型:
1)HystrixCommandMetrics:
保存hystrix命令执行的度量信息。
markCommandStart 当命令开始执行,调用该方法。
markCommandDone 命令执行完成,调用该方法。
getRollingCount 获取某一事件类型窗口期内的统计数值
getCumulativeCount 获取某一事件类型持续的统计数值
getExecutionTimePercentile 获取某一百分比的请求执行时间
getExecutionTimeMean 获取平均请求执行时间
getTotalTimePercentile,获取某一百分比的请求执行总时间
getTotalTimeMean,获取平均请求执行总时间
getRollingMaxConcurrentExecutions 获取上一个窗口期内最大的并发数
getHealthCountsStream 获取窗口期内的失败次数,总次数,失败比率
记录以下事件类型:
HystrixRollingNumberEvent.SUCCESS 命令执行成功的
FAILURE
TIMEOUT(1), SHORT_CIRCUITED(1), THREAD_POOL_REJECTED(1), SEMAPHORE_REJECTED(1), BAD_REQUEST(1), FALLBACK_SUCCESS(1), FALLBACK_FAILURE(1), FALLBACK_REJECT
2)HystrixThreadPoolMetrics 保存hystrix线程池执行的度量信息。
markThreadCompletion 当线程吃执行一个任务时调用。
markThreadExecution 当线程池完成一个任务时调用。
getRollingCount 获取某一事件类型窗口期内的统计数值。
getCumulativeCount 获取某一事件类型持续的统计数值。
HystrixCommandMetrics实现:
当调用markCommandStart方法时,实际向消息流对象HystrixCommandStartStream 写入HystrixCommandExecutionStarted消息。消息流是消息传输的中间件,其内部是一个RX java Subject。消息监听者通过订阅这些消息流来监听这些消息。如果是线程模式执行,还需要向消息流对象HystrixThreadPoolStartStream 写入HystrixCommandExecutionStarted消息。
当调用markCommandDone方法时, 实际向消息流对象HystrixCommandCompletionStream 写入HystrixCommandCompletion消息。如果是线程模式执行,还需要向消息流对象HystrixThreadPoolCompletionStream 写入HystrixCommandCompletion消息。
当调用collapserResponseFromCache方法时,实际向消息流对象HystrixCollapserEventStream写入HystrixCollapserEvent消息。消息流是消息传输的中间件,其内部是一个RX java Subject。消息监听者通过订阅这些消息流来监听这些消息。
当调用collapserBatchExecuted方法时,实际向消息流对象HystrixCollapserEventStream写入HystrixCollapserEvent消息。消息流是消息传输的中间件,其内部是一个RX java Subject。消息监听者通过订阅这些消息流来监听这些消息。
当调用getRollingCount方法时,实际从消息流对象RollingCommandEventCounterStream获取相应的信息。RollingCommandEventCounterStream消息流监听了HystrixCommandCompletionStream消息流,并且通过RX java 对各个消息类型进行一段时间内数据的统计。
当调用getCumulativeCount方法时,实际从消息流对象CumulativeCommandEventCounterStream获取相应的信息。CumulativeCommandEventCounterStream消息流监听了HystrixCommandCompletionStream消息流,并且通过RX java 对各个消息类型进行持续的数据的统计。
当调用getExecutionTimePercentile,getExecutionTimeMean方法时,实际从消息流对象RollingCommandLatencyDistributionStream获取相应的信息。RollingCommandLatencyDistributionStream消息流监听了HystrixCommandCompletionStream消息流。并且通过RX java对窗口期内的请求的executionLatency的分布进行计算。
当调用getTotalTimePercentile,getTotalTimeMean方法时,实际从消息流对象RollingCommandUserLatencyDistributionStream获取相应的信息。RollingCommandUserLatencyDistributionStream消息流监听了HystrixCommandCompletionStream消息流。并且通过RX java对窗口期内的请求的totalLatency的分布进行计算。
当调用getHealthCountsStream,实际从消息流对象HealthCountsStream获取想要信息,HealthCountsStream消息流监听了HystrixCommandCompletionStream消息流,并且通过RX java对窗口期内的请求成功,失败,超时,进行统计得出失败次数,总次数,失败比率。
当调用getRollingMaxConcurrentExecutions,实际从消息流对象RollingCommandMaxConcurrencyStream获取相应的信息。RollingCommandMaxConcurrencyStream消息流监听了HystrixCommandStartStream消息流,并且通过RX java 获取窗口期内最大的并发数。
HystrixThreadPoolMetrics实现:
当调用getRollingCount方法时,实际从消息流对象RollingThreadPoolEventCounterStream获取相应的信息。RollingThreadPoolEventCounterStream消息流监听了HystrixCommandExecutionStarted消息流,并且进行一段时间内数据的统计。
当前服务的健康状况, 包括服务调用总次数和服务调用失败次数等. 根据Metrics的计数, 熔断器从而能计算出当前服务的调用失败率, 用来和设定的阈值比较从而决定熔断器的状态切换逻辑. 因此Metrics的实现非常重要。
HystrixRollingNumber统计一定时间内的统计数值,基本思想就是分段统计,比如说要统计qps,即1秒内的请求总数。如下图所示,我们可以将1s的时间分成10段,每段100ms。在第一个100ms内,写入第一个段中进行计数,在第二个100ms内,写入第二个段中进行计数,这样如果要统计当前时间的qps,我们总是可以通过统计当前时间前1s(共10段)的计数总和值。