  • Apache Hive throws an error when executing an HQL statement ( 10G )


    # Problem description:

    hive > select substring(request_body["uuid"], -1, 1) as uuid, count(distinct(request_body["uuid"])) as count 
    from log_bftv_api 
    where year=2017 and month=11 and day=1 and request_body["method"] = "bv.lau.urecommend" and length(request_body["uuid"]) = 25 
    group by 1 
    order by uuid;
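
    # Note: "group by 1" is a positional reference; some Hive versions reject it unless
    # positional aliases are enabled. If the statement fails to parse, a per-session
    # switch usually helps ( the property name varies by Hive version — verify against yours ):

    hive > set hive.groupby.orderby.position.alias=true;   -- Hive 0.11 – 2.1
    hive > set hive.groupby.position.alias=true;           -- Hive 2.2+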
    
    # Hive throws the following error when executing this HQL statement ( there is no problem when the data volume is small ):

    # Error message:

    MapReduce Total cumulative CPU time: 1 minutes 46 seconds 70 msec
    Ended Job = job_1510050683827_0137 with errors
    Error during job, obtaining debugging information...
    Examining task ID: task_1510050683827_0137_m_000002 (and more) from job job_1510050683827_0137
    
    Task with the most failures(4): 
    -----
    Task ID:
      task_1510050683827_0137_m_000000
    
    URL:
      http://namenode:8088/taskdetails.jsp?jobid=job_1510050683827_0137&tipid=task_1510050683827_0137_m_000000
    -----
    Diagnostic Messages for this Task:
    Error: Java heap space
    
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 3  Reduce: 5   Cumulative CPU: 106.07 sec   HDFS Read: 223719539 HDFS Write: 0 FAIL
    Total MapReduce CPU Time Spent: 1 minutes 46 seconds 70 msec

    # Root-cause analysis:

    The error output shows Error: Java heap space and return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
    
    The references I found point to memory as the cause: an HQL statement is compiled into MapReduce Java tasks, so it is the heap of those task JVMs that runs out. I therefore made the changes below.
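
    Before editing cluster-wide configuration, the same limits can also be raised for a single session from the Hive CLI. A minimal sketch, assuming MRv2 (YARN) property names; the values are examples, not recommendations:

    hive > set mapreduce.map.memory.mb=1536;        -- YARN container size for map tasks
    hive > set mapreduce.map.java.opts=-Xmx1024M;   -- JVM heap inside that container
    hive > set mapreduce.reduce.memory.mb=3072;
    hive > set mapreduce.reduce.java.opts=-Xmx2560M;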

    # Fix:

    hadoop shell > vim etc/hadoop/hadoop-env.sh
    
    # default: 1000 (MB)
    export HADOOP_HEAPSIZE=4096
    
    hadoop shell > vim etc/hadoop/yarn-env.sh
    
    # default: 1000 (MB)
    YARN_HEAPSIZE=4096
    
    # Adjust these to your actual environment as needed!
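
    These env-file changes only take effect after the daemons are restarted. A sketch using the stock Hadoop 2.x start/stop scripts ( run from $HADOOP_HOME; assumes the scripted, non-HA deployment ):

    hadoop shell > sbin/stop-yarn.sh && sbin/start-yarn.sh   # restart ResourceManager / NodeManagers
    hadoop shell > sbin/stop-dfs.sh && sbin/start-dfs.sh     # restart NameNode / DataNodes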
    
    hadoop shell > vim etc/hadoop/mapred-site.xml
    
        <property>
            <name>mapreduce.map.memory.mb</name>
            <value>1536</value>
        </property>
    
        <property>
            <name>mapreduce.map.java.opts</name>
            <value>-Xmx1024M</value>
        </property>
    
        <property>
            <name>mapreduce.reduce.memory.mb</name>
            <value>3072</value>
        </property>
    
        <property>
            <name>mapreduce.reduce.java.opts</name>
            <value>-Xmx2560M</value>
        </property>
    
        <property>
            <name>mapreduce.task.io.sort.mb</name>
            <value>512</value>
        </property>
    
        <property>
            <name>mapreduce.task.io.sort.factor</name>
            <value>100</value>
        </property>
    
        <property>
            <name>mapreduce.reduce.shuffle.parallelcopies</name>
            <value>50</value>
        </property>
    
    # Add these parameters ( scale them up in proportion to your machines' actual resources ).
    # My test environment is four 8-core / 8 GB KVM virtual machines: one NameNode and three DataNodes!
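
    # A quick consistency check on the values above ( the rule of thumb is an
    # assumption, not from the original post ): each *.java.opts heap must fit
    # inside its *.memory.mb container, leaving headroom for JVM overhead.
    #
    #   map    container: 1536 MB, heap -Xmx1024M  ->  512 MB headroom
    #   reduce container: 3072 MB, heap -Xmx2560M  ->  512 MB headroom
    #
    # A common rule of thumb is -Xmx ≈ 0.8 × <container>.memory.mb;
    # the map setting here ( 1024 / 1536 ≈ 0.67 ) is more conservative.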

    # Since this tuning, there have been no failures on a dataset that has grown to 600 GB, and historical and new data keep being written to HDFS.
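
    To confirm that jobs actually pick up the new values, the effective setting can be echoed from the Hive CLI ( "set <name>;" prints the current value; the output below is illustrative ):

    hive > set mapreduce.map.java.opts;
    mapreduce.map.java.opts=-Xmx1024M
    hive > set mapreduce.reduce.memory.mb;
    mapreduce.reduce.memory.mb=3072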
  • Original article: https://www.cnblogs.com/wangxiaoqiangs/p/7850613.html