  • Analysis of Impala Queue Memory Parameters

    Also published on CSDN

    Question

    I analyzed several memory parameters of Impala queues (resource pools). Corrections are welcome.

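    Memory settings of a queue (resource pool)

    1. Maximum Query Memory Limit

      For a given resource pool, the upper bound on the memory a single query may use on one Impala node.

    2. Minimum Query Memory Limit

      For a given resource pool, the lower bound on the memory a single query is given on one Impala node.

    3. Max Memory

      The maximum memory available to all queries running in this pool.

    How does Impala decide whether to accept a query submitted to a queue?

    Test SQL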

     set request_pool = hqueue;
     select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5;
    

    Query plan analysis

    [ip:21000] testdb> explain select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5;
    Query: explain select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5
    +------------------------------------------------------------------------------------+
    | Explain String                                                                     |
    +------------------------------------------------------------------------------------+
    | Max Per-Host Resource Reservation: Memory=8.00MB Threads=3                         |
    | Per-Host Resource Estimates: Memory=256MB                                          |
    | WARNING: The following tables are missing relevant table and/or column statistics. |
    | testdb.testtable                                              |
    |                                                                                    |
    | PLAN-ROOT SINK                                                                     |
    | |                                                                                  |
    | 02:MERGING-EXCHANGE [UNPARTITIONED]                                                |
    | |  offset: 5                                                                       |
    | |  order by: acctset_code ASC                                                      |
    | |  limit: 5                                                                        |
    | |                                                                                  |
    | 01:TOP-N [LIMIT=10]                                                                |
    | |  order by: acctset_code ASC                                                      |
    | |  row-size=22B cardinality=10                                                     |
    | |                                                                                  |
    | 00:SCAN HDFS [testdb.testtable]                               |
    |    partitions=138/138 files=140 size=808.62MB                                      |
    |    predicates: acctset_code = '00001'                                              |
    |    row-size=22B cardinality=16                                                     |
    +------------------------------------------------------------------------------------+
    

    Note: the plan reports a per-node memory estimate of 256MB (Per-Host Resource Estimates), but Impala's estimate is not necessarily accurate.

    Experiment 1

    Queue name    Max Memory   Minimum Query Memory Limit   Maximum Query Memory Limit
    root.hqueue   500M         260M                         270M

    Result:

    [ip:21000] testdb> select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5;
    Query: select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5
    Query submitted at: 2020-06-23 10:54:55 (Coordinator: http://ip:25000)
    Query progress can be monitored at: http://ip:25000/query_plan?query_id=f54d764cf100d474:a89eec5c00000000
    ERROR: Rejected query from pool root.hqueue: request memory needed 780.00 MB is greater than pool max mem resources 500.00 MB.
    

    My guess: 260M (Minimum Query Memory Limit) * 3 nodes = 780M > 500M

    Experiment 2

    Queue name    Max Memory   Minimum Query Memory Limit   Maximum Query Memory Limit
    root.hqueue   500M         250M                         270M

    Result:

    [ip:21000] testdb> select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5;
    Query: select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5
    Query submitted at: 2020-06-23 10:58:28 (Coordinator: http://ip:25000)
    Query progress can be monitored at: http://ip:25000/query_plan?query_id=39423b17b20dc603:66c4de7400000000
    ERROR: Rejected query from pool root.hqueue: request memory needed 768.23 MB is greater than pool max mem resources 500.00 MB.
    

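    My guess: 256M (the per-node estimate in the query plan) * 3 = 768M > 500M. Taking experiments 1 and 2 together, Impala appears to apply Max(estimate, Minimum Query Memory Limit) when checking whether a query would exceed the pool's memory: Max(256M, 260M) in experiment 1 and Max(256M, 250M) in experiment 2. The error reports 768.23 MB rather than 768 MB, presumably because the actual estimate is slightly above the rounded 256MB that explain displays.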
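The per-node figure behind each rejection can be back-computed by dividing the "request memory needed" value by the node count. The sketch below does that for experiments 1 and 2. The node count of 3 is an assumption inferred from the multiplications in this post, and the function name is hypothetical.

```python
# Back-compute the per-node memory implied by each rejection message.
# NUM_NODES = 3 is an assumption inferred from the "* 3" arithmetic in this post.
NUM_NODES = 3

# (experiment, Minimum Query Memory Limit in MB, "request memory needed" in MB)
observations = [
    ("experiment 1", 260, 780.00),
    ("experiment 2", 250, 768.23),
]


def per_node_mb(requested_mb, num_nodes=NUM_NODES):
    """Per-node memory implied by the cluster-wide figure in the error message."""
    return requested_mb / num_nodes


for name, min_limit_mb, requested_mb in observations:
    implied = per_node_mb(requested_mb)
    print(f"{name}: minimum limit {min_limit_mb} MB, implied per-node {implied:.2f} MB")
```

In experiment 1 the implied value equals the configured minimum of 260 MB. In experiment 2 it is about 256.08 MB, which tracks the planner's estimate rather than the 250 MB minimum, as the Max(...) guess predicts.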

    Experiment 3

    Queue name    Max Memory   Minimum Query Memory Limit   Maximum Query Memory Limit
    root.hqueue   500M         250M                         252M

    Result:

    [ip:21000] testdb> select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5;
    Query: select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5
    Query submitted at: 2020-06-23 11:09:42 (Coordinator: http://ip:25000)
    Query progress can be monitored at: http://ip:25000/query_plan?query_id=e24e74d387c201b5:9e72143600000000
    ERROR: Rejected query from pool root.hqueue: request memory needed 756.00 MB is greater than pool max mem resources 500.00 MB
    

    My guess: 252M * 3 = 756M > 500M. Combined with experiment 2, Impala appears to also apply a Min with Maximum Query Memory Limit, i.e. Min(Max(estimate, Minimum Query Memory Limit), Maximum Query Memory Limit); in this case, Min(Max(256M, 250M), 252M) = 252M.
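The guessed rule can be written as a small clamp function and replayed against experiments 1 to 3. This is only a sketch of the hypothesis in this post, not Impala's implementation. The function name, the 3-node count, and the rounded 256 MB estimate are assumptions.

```python
def per_node_request_mb(estimate_mb, min_limit_mb, max_limit_mb):
    """Hypothesized clamp: Min(Max(estimate, minimum limit), maximum limit)."""
    return min(max(estimate_mb, min_limit_mb), max_limit_mb)


NUM_NODES = 3      # assumption: inferred from the "* 3" arithmetic in this post
ESTIMATE_MB = 256  # rounded "Per-Host Resource Estimates" from explain
POOL_MAX_MB = 500  # Max Memory of root.hqueue

# experiment number -> (Minimum Query Memory Limit, Maximum Query Memory Limit) in MB
experiments = {1: (260, 270), 2: (250, 270), 3: (250, 252)}

for number, (min_mb, max_mb) in experiments.items():
    total = per_node_request_mb(ESTIMATE_MB, min_mb, max_mb) * NUM_NODES
    verdict = "rejected" if total > POOL_MAX_MB else "admitted"
    print(f"experiment {number}: {total} MB requested, {verdict}")
```

The totals come out as 780, 768 and 756 MB, which line up with the error messages (experiment 2 reports 768.23 MB because the real estimate is not exactly 256 MB).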

    Experiment 4

    mem_limit: sets the memory limit of a query on each node

    Queue name    Max Memory   Minimum Query Memory Limit   Maximum Query Memory Limit
    root.hqueue   500M         100M                         200M

    [ip:21000] testdb> set mem_limit=170M;
    MEM_LIMIT set to 170M
    [ip:21000] testdb> select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5;
    Query: select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5
    Query submitted at: 2020-06-23 13:53:31 (Coordinator: http://ip:25000)
    Query progress can be monitored at: http://ip:25000/query_plan?query_id=ba4fa4a44d2dac9d:b24a60d600000000
    ERROR: Rejected query from pool root.hqueue: request memory needed 510.00 MB is greater than pool max mem resources 500.00 MB.
    
    [ip:21000] testdb> set mem_limit=210M;
    MEM_LIMIT set to 210M
    [ip:21000] testdb> select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5;
    Query: select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5
    Query submitted at: 2020-06-23 13:54:07 (Coordinator: http://ip:25000)
    Query progress can be monitored at: http://ip:25000/query_plan?query_id=ca49acba3c002727:2d69557a00000000
    ERROR: Rejected query from pool root.hqueue: request memory needed 600.00 MB is greater than pool max mem resources 500.00 MB
    

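    Analysis: with mem_limit=170M, Min(Max(170M, 100M), 200M) * 3 = 510M > 500M; with mem_limit=210M, Min(Max(210M, 100M), 200M) * 3 = 600M > 500M. My guess is that when mem_limit is set, Impala uses it in place of its own memory estimate and combines it with Minimum Query Memory Limit and Maximum Query Memory Limit to check whether Max Memory would be exceeded, and so decides whether to reject the query.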
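A hedged sketch of this guess: mem_limit, when given, replaces the planner estimate before the clamp is applied. The function and parameter names are hypothetical, and the node count of 3 is inferred from the arithmetic in this post.

```python
from typing import Optional

NUM_NODES = 3      # assumption: inferred from the "* 3" arithmetic in this post
POOL_MAX_MB = 500  # Max Memory of root.hqueue


def requested_total_mb(
    estimate_mb: float,
    min_limit_mb: float,
    max_limit_mb: float,
    mem_limit_mb: Optional[float] = None,
    num_nodes: int = NUM_NODES,
) -> float:
    """Cluster-wide memory the query is assumed to request under the guessed rule."""
    # Guess from experiment 4: mem_limit, if set, is used instead of the estimate.
    base = estimate_mb if mem_limit_mb is None else mem_limit_mb
    return min(max(base, min_limit_mb), max_limit_mb) * num_nodes


for mem_limit in (170, 210):
    total = requested_total_mb(256, 100, 200, mem_limit_mb=mem_limit)
    print(f"mem_limit={mem_limit}M -> {total} MB, rejected={total > POOL_MAX_MB}")
```

With the experiment 4 pool settings, this yields 510 MB for mem_limit=170M and 600 MB for mem_limit=210M, the same figures as in the two rejection messages.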

    Experiment 5

    Queue name    Max Memory   Minimum Query Memory Limit   Maximum Query Memory Limit
    root.hqueue   500M         39M                          39M

    [ip:21000] testdb> select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5;
    Query: select tally_id, acctset_code  from testtable where acctset_code='00001'order by acctset_code limit 5 offset 5
    Query submitted at: 2020-06-23 15:26:42 (Coordinator: http://ip:25000)
    Query progress can be monitored at: http://ip:25000/query_plan?query_id=234ca270d3731d06:9980e6fd00000000
    ERROR: Rejected query from pool root.hqueue: minimum memory reservation is greater than memory available to the query for buffer reservations. Memory reservation needed given the current plan: 8.00 MB. Adjust either the mem_limit or the pool config (max-query-mem-limit, min-query-mem-limit) for the query to allow the query memory limit to be at least 40.00 MB. Note that changing the mem_limit may also change the plan. See the query profile for more information about the per-node memory requirements.
    

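    With the following configuration, the query was submitted and executed successfully:

    Queue name    Max Memory   Minimum Query Memory Limit   Maximum Query Memory Limit
    root.hqueue   500M         40M                          40M

    Analysis: max-query-mem-limit and min-query-mem-limit must not be set too low. In this test environment, for this plan with an 8.00 MB memory reservation, a single node needs at least 40M.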

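    Conclusions

    1. When the query sets mem_limit, the submission is rejected for insufficient memory if

      Min(Max(mem_limit, Minimum Query Memory Limit), Maximum Query Memory Limit) * number of nodes > Max Memory

    2. When mem_limit is not set, the submission is rejected for insufficient memory if the condition below holds. The estimate can be obtained with explain, although Impala's estimate is not accurate.

      Min(Max(estimate, Minimum Query Memory Limit), Maximum Query Memory Limit) * number of nodes > Max Memory

    3. max-query-mem-limit and min-query-mem-limit must not be set too low. In this test environment, a single node needs at least 40M.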

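    Recommendations

    1. Configure Maximum Query Memory Limit * number of nodes <= Max Memory; queries should then not be rejected for exceeding the pool's Max Memory.
    2. If neither Minimum Query Memory Limit nor Maximum Query Memory Limit is configured for the resource pool, then, as the conclusions above suggest, Impala decides whether to reject the query by checking whether estimate * number of nodes exceeds Max Memory. Because Impala's per-node memory estimate is very inaccurate, in this case you can set a reasonable value manually with set mem_limit = XXM; Impala will then use mem_limit * number of nodes to check whether Max Memory would be exceeded.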
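As a quick way to apply recommendation 1, the following sketch computes the largest whole-MB Maximum Query Memory Limit that still satisfies limit * number of nodes <= Max Memory. The helper names are hypothetical, and the check only covers the single-query rule observed in the experiments above.

```python
def max_safe_query_limit_mb(pool_max_mb: int, num_nodes: int) -> int:
    """Largest whole-MB per-node limit with limit * num_nodes <= pool_max_mb."""
    return pool_max_mb // num_nodes


def config_is_safe(pool_max_mb: int, max_query_limit_mb: int, num_nodes: int) -> bool:
    """True if a query clamped to the maximum limit still fits in the pool."""
    return max_query_limit_mb * num_nodes <= pool_max_mb


# Values from this post: Max Memory 500M and, by assumption, 3 nodes.
limit = max_safe_query_limit_mb(500, 3)
print(f"largest safe Maximum Query Memory Limit: {limit}M")
print(f"200M safe? {config_is_safe(500, 200, 3)}")
```

For the 500M pool used here, the largest safe value is 166M; the 200M setting from experiment 4 fails the check, which is consistent with both of its queries being rejected. The value also has to stay above the per-node floor seen in experiment 5 (40M in this test environment).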
  • Original article: https://www.cnblogs.com/darange/p/13854665.html