3. Prometheus remote write parameter tuning

- I. Overview
- II. Remote write characteristics
  
  2.1 Overall structure
  
  2.2 Retry mechanism
  
  2.3 Memory usage
- III. Parameters
  
  3.1 capacity
  
  3.2 max_shards
  
  3.3 min_shards
  
  3.4 max_samples_per_send
  
  3.5 batch_send_deadline
  
  3.6 min_backoff
  
  3.7 max_backoff
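I. Overview

Prometheus can work around the limits of its local storage by using remote storage, so it provides a remote storage interface that is configured through the configuration file (prometheus.yml). The default parameters are usually sufficient, but some workloads need tuning. This chapter describes the tuning parameters available in the remote write configuration, listed here: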
    # The URL of the endpoint to send samples to.
    url: <string>

    # Timeout for requests to the remote write endpoint.
    [ remote_timeout: <duration> | default = 30s ]

    # List of remote write relabel configurations.
    write_relabel_configs:
      [ - <relabel_config> ... ]

    # Sets the `Authorization` header on every remote write request with the
    # configured username and password.
    # password and password_file are mutually exclusive.
    basic_auth:
      [ username: <string> ]
      [ password: <string> ]
      [ password_file: <string> ]

    # Sets the `Authorization` header on every remote write request with
    # the configured bearer token. It is mutually exclusive with `bearer_token_file`.
    [ bearer_token: <string> ]

    # Sets the `Authorization` header on every remote write request with the bearer token
    # read from the configured file. It is mutually exclusive with `bearer_token`.
    [ bearer_token_file: /path/to/bearer/token/file ]

    # Configures the remote write request's TLS settings.
    tls_config:
      [ <tls_config> ]

    # Optional proxy URL.
    [ proxy_url: <string> ]

    # Configures the queue used to write to remote storage.
    queue_config:
      # Number of samples to buffer per shard before we block reading of more
      # samples from the WAL. It is recommended to have enough capacity in each
      # shard to buffer several requests to keep throughput up while processing
      # occasional slow remote requests.
      [ capacity: <int> | default = 500 ]
      # Maximum number of shards, i.e. amount of concurrency.
      [ max_shards: <int> | default = 1000 ]
      # Minimum number of shards, i.e. amount of concurrency.
      [ min_shards: <int> | default = 1 ]
      # Maximum number of samples per send.
      [ max_samples_per_send: <int> | default = 100 ]
      # Maximum time a sample will wait in buffer.
      [ batch_send_deadline: <duration> | default = 5s ]
      # Initial retry delay. Gets doubled for every retry.
      [ min_backoff: <duration> | default = 30ms ]
      # Maximum retry delay.
      [ max_backoff: <duration> | default = 100ms ]

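II. Remote write characteristics

This section focuses on the queue_config parameters. The other parameters are self-explanatory and offer nothing to tune.

2.1 Overall structure

Each remote write target starts an in-memory write queue that is split into shards. The queue buffers data read from the WAL (for background on the WAL, see the storage documentation: https://github.com/prometheus/prometheus/blob/master/docs/storage.md; the principle is similar to the WAL in HBase), and the shards send the metric data to the remote storage service. The data flow looks like this: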
          |--> queue (shard_1)   --> remote endpoint
    WAL --|--> queue (shard_...) --> remote endpoint
          |--> queue (shard_n)   --> remote endpoint
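2.2 Retry mechanism

Note that when a single shard backs up and fills its queue, Prometheus blocks reading from the WAL into any shard. (This is directly relevant to tuning; see the capacity parameter in section 3.1.)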
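The blocking behavior described above can be pictured with a toy model. This is only a sketch under assumptions, not how Prometheus is implemented: the function name `read_wal_into_shards` is hypothetical, nothing drains the queues, and routing a series to a shard by `series_id % num_shards` is an illustrative assumption.

```python
from collections import deque


def read_wal_into_shards(samples, num_shards, capacity):
    """Toy model: route samples to shards until one shard's queue is full.

    Returns (shards, consumed), where consumed is the number of samples read
    before the reader blocked. Nothing drains the queues in this model.
    """
    shards = [deque() for _ in range(num_shards)]
    for consumed, (series_id, value) in enumerate(samples):
        # Assumption for illustration: a series always maps to the same shard.
        target = shards[series_id % num_shards]
        if len(target) >= capacity:
            # One full shard stops reading for every shard.
            return shards, consumed
        target.append((series_id, value))
    return shards, len(samples)


# Series 0 floods shard 0, so the sample for series 1 is never reached.
samples = [(0, 1.0), (0, 2.0), (0, 3.0), (1, 4.0)]
shards, consumed = read_wal_into_shards(samples, num_shards=2, capacity=2)
print(consumed, [len(s) for s in shards])
```

In this model shard 1 stays empty even though it has free space, which is why a single slow shard can stall all throughput.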

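Failed writes to the remote endpoint are retried, and no data is lost unless the remote endpoint stays down for more than 2 hours. After 2 hours the WAL is compacted and any data that has not yet been sent is lost. The retry intervals are controlled by the min_backoff and max_backoff parameters described in section III.

2.3 Memory usage

Using remote write increases the memory footprint of Prometheus. Most users report an increase of about 25%, but the figure depends on the shape of the data. For each series in the WAL, the remote write code caches a mapping from series ID to label values, which can increase memory usage significantly, especially under heavy series churn.

In addition to the series cache, each shard and its queue add to memory usage. Shard memory is proportional to number of shards * (capacity + max_samples_per_send). When tuning, consider lowering max_shards whenever you raise capacity and max_samples_per_send, to avoid inadvertently running out of memory. With the default values of capacity and max_samples_per_send, each shard uses less than 100 kB of memory.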
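To compare configurations, it can help to compute the quantity that shard memory is proportional to. The sketch below assumes the default values from the configuration listing in section I; the function name `buffered_sample_slots` is hypothetical, and the result is a count of sample slots, not bytes.

```python
def buffered_sample_slots(num_shards, capacity, max_samples_per_send):
    """Return the sample-slot count that shard memory is proportional to.

    This is a relative figure for comparing configurations, not a byte count.
    """
    return num_shards * (capacity + max_samples_per_send)


# Defaults from the listing: max_shards=1000, capacity=500, max_samples_per_send=100.
defaults = buffered_sample_slots(1000, 500, 100)

# Raising capacity and max_samples_per_send fivefold without touching max_shards...
raised_only = buffered_sample_slots(1000, 2500, 500)

# ...versus lowering max_shards at the same time to stay at the original level.
raised_with_fewer_shards = buffered_sample_slots(200, 2500, 500)

print(defaults, raised_only, raised_with_fewer_shards)
```

In this example the fivefold increase in per-shard buffers is offset by cutting max_shards to a fifth, so the worst-case footprint stays where it was.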

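III. Parameters

3.1 capacity

Definition: the capacity of each in-memory queue (shard).

Once reading from the WAL is blocked (see section 2.2 for the cause), samples cannot be appended to any shard and all throughput stops. In most cases the capacity of a single queue should therefore be large enough to avoid blocking the other shards. Too much capacity, however, can cause excessive memory consumption and makes it take longer to clear the queues during resharding.

Recommendation: set capacity to 3-10 times max_samples_per_send.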
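The recommendation is easy to check mechanically. A minimal sketch, assuming the 3-10x rule above; the helper names `recommended_capacity_range` and `capacity_is_recommended` are hypothetical.

```python
def recommended_capacity_range(max_samples_per_send):
    """Return (low, high) bounds for capacity: 3x to 10x max_samples_per_send."""
    return 3 * max_samples_per_send, 10 * max_samples_per_send


def capacity_is_recommended(capacity, max_samples_per_send):
    """Check whether capacity falls inside the recommended 3x-10x range."""
    low, high = recommended_capacity_range(max_samples_per_send)
    return low <= capacity <= high


# The defaults in the listing (capacity=500, max_samples_per_send=100) give a 5x ratio.
print(recommended_capacity_range(100))
print(capacity_is_recommended(500, 100))
```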

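3.2 max_shards

As the name suggests, this is the maximum number of shards (that is, queues), and it can also be read as the maximum parallelism of remote write. Prometheus tries not to use more shards than it needs. Only when the write queue falls behind does it add shards, up to max_shards, in order to increase remote write throughput.

PS: while running, Prometheus continuously calculates the optimal number of shards to use, based on the incoming sample rate, the number of pending samples that have not been sent, and the time taken to send each sample. (The actual number of shards is adjusted dynamically.)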
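The sketch below shows how those three inputs could combine into a shard count. It is a simplified illustrative model, not Prometheus's actual formula: the function name `desired_shards` and the `catch_up_seconds` parameter (how quickly the backlog should be drained) are assumptions made for this example.

```python
import math


def desired_shards(incoming_rate, pending_samples, send_seconds, samples_sent,
                   min_shards=1, max_shards=1000, catch_up_seconds=20.0):
    """Simplified model of a shard calculation (not the real implementation).

    incoming_rate:    samples per second arriving from the WAL
    pending_samples:  samples read but not yet sent
    send_seconds:     time one shard spent sending samples_sent samples
    catch_up_seconds: hypothetical target time for draining the backlog
    """
    # Samples per second that must be sent to keep up and drain the backlog.
    demand = incoming_rate + pending_samples / catch_up_seconds
    # demand * (time per sample) = number of shards that must work in parallel.
    needed = math.ceil(demand * send_seconds / samples_sent)
    # The result is always clamped to the configured bounds.
    return max(min_shards, min(max_shards, needed))


# One shard sends 2000 samples per second in this example.
print(desired_shards(10_000, 0, 1.0, 2000))        # keeping up, no backlog
print(desired_shards(10_000, 60_000, 1.0, 2000))   # extra shards to drain the backlog
print(desired_shards(10_000_000, 0, 1.0, 2000))    # capped at max_shards
```

Whatever the real calculation yields, the clamp in the last step is what min_shards and max_shards control.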

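3.3 min_shards

min_shards sets the minimum number of shards Prometheus uses, which is also the number of shards used when remote write starts. If remote write falls behind, Prometheus automatically scales up the number of shards, so most users do not need to adjust this parameter. Increasing min_shards, however, lets Prometheus avoid falling behind at the very start, while it is still calculating the required number of shards.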

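3.4 max_samples_per_send

Definition: the maximum number of samples sent in each remote write request, that is, the batch size.

The right value depends on the remote storage system. Some systems handle larger batches well, without a significant increase in latency. Others may run into trouble when every request carries a large number of samples. The default value suits the vast majority of systems.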

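3.5 batch_send_deadline

Definition: the maximum time a single shard waits before sending a batch.

Once this deadline passes, the shard sends a request even if its queue has not yet accumulated max_samples_per_send samples. For low-volume systems that are not latency sensitive, the deadline can be increased to improve request efficiency.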
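The interplay between max_samples_per_send and batch_send_deadline comes down to a two-part flush condition. The sketch below is a simplified model with a hypothetical function name `should_flush`; it assumes the deadline is measured from the moment the oldest buffered sample arrived, and uses the default values from the listing in section I.

```python
def should_flush(batch_size, seconds_waiting,
                 max_samples_per_send=100, batch_send_deadline=5.0):
    """Decide whether a shard should send its current batch.

    batch_size:      samples currently buffered in the shard
    seconds_waiting: how long the oldest buffered sample has waited (assumption)
    """
    if batch_size == 0:
        return False  # nothing to send
    batch_is_full = batch_size >= max_samples_per_send
    deadline_passed = seconds_waiting >= batch_send_deadline
    return batch_is_full or deadline_passed


print(should_flush(100, 0.1))  # full batch: send immediately
print(should_flush(10, 1.0))   # small batch, deadline not reached: keep waiting
print(should_flush(10, 5.0))   # small batch, deadline reached: send anyway
```

On a low-volume system most requests are triggered by the deadline rather than by a full batch, so a longer deadline means fewer, larger requests.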

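3.6 min_backoff

Definition: the minimum wait before retrying a failed remote write.

min_backoff is the wait before the first retry. The wait doubles with each subsequent retry until it reaches max_backoff.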
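The doubling rule can be written out as a short schedule. This is a minimal sketch, assuming the default values from the listing in section I (min_backoff of 30ms, max_backoff of 100ms); the function name `backoff_schedule` is hypothetical.

```python
def backoff_schedule(retries, min_backoff_ms=30, max_backoff_ms=100):
    """Return the wait in milliseconds before each of the first `retries` retries."""
    delays = []
    delay = min_backoff_ms
    for _ in range(retries):
        delays.append(delay)
        # Double the wait for the next retry, but never exceed max_backoff.
        delay = min(delay * 2, max_backoff_ms)
    return delays


print(backoff_schedule(5))  # [30, 60, 100, 100, 100]
```

With these defaults the wait reaches its ceiling by the third retry, so raising max_backoff is what spaces out retries against an endpoint that is down for a long time.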

3.7 max_backoff

Definition: the maximum wait before retrying a failed remote write.

Reference: https://prometheus.io/docs/practices/remote_write/
Original article: https://www.cnblogs.com/chenmingming0225/p/12618270.html
