  • Running TPC-H on Impala

    1. Install git and download the tpc-h-impala scripts

    [root@ip-172-31-34-31 ~]# yum install git

    [root@ip-172-31-34-31 ~]# git clone https://github.com/kj-ki/tpc-h-impala

    [root@ip-172-31-34-31 ~]# cd tpc-h-impala/

    [root@ip-172-31-34-31 tpc-h-impala]# ls
    benchmark.conf confs data README.md tpch_benchmark.sh tpch_hive tpch_impala tpch_prepare
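    Before going further, it can help to confirm that the checkout contains what the later steps use. The following is a minimal sketch that assumes the entry names from the `ls` output above. check_checkout is a hypothetical helper and is not part of the repository.

```shell
#!/usr/bin/env bash
# Minimal sketch: confirm a tpc-h-impala checkout has the entries this guide relies on.
# The expected names are taken from the `ls` output above; adjust if your checkout differs.
# check_checkout is a hypothetical helper name.
check_checkout() {
    local dir=$1 missing=0 entry
    for entry in benchmark.conf data tpch_benchmark.sh tpch_hive tpch_impala tpch_prepare; do
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    # Non-zero exit status if anything was missing.
    return "$missing"
}
```

    Usage: `check_checkout /root/tpc-h-impala` prints nothing and succeeds when every entry is present.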

    2. Move the data generated by the TPC-H dbgen tool into the repository's data directory
    [root@ip-172-31-34-31 data]# mv /root/tpch_2_17_0/data10g/*.tbl /root/tpc-h-impala/data
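    dbgen writes one pipe-delimited .tbl file per TPC-H table, eight in total. The following minimal sketch checks that all of them reached the data directory and are non-empty. check_tbl_files is a hypothetical helper, and it assumes dbgen was run without splitting tables into chunks, since chunking produces numbered files instead.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that all eight TPC-H tables produced by dbgen are present
# and non-empty in the given directory. check_tbl_files is a hypothetical helper name.
# Assumes unchunked dbgen output (one <table>.tbl file per table).
check_tbl_files() {
    local dir=$1 missing=0 t
    for t in customer lineitem nation orders part partsupp region supplier; do
        # -s: file exists and has a size greater than zero
        if [ ! -s "$dir/$t.tbl" ]; then
            echo "missing or empty: $t.tbl"
            missing=1
        fi
    done
    return "$missing"
}
```

    Usage: `check_tbl_files /root/tpc-h-impala/data`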

    3. Adjust the tpc-h-impala scripts

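    Because of HDFS permissions, tpch_prepare_data.sh needs two changes. Change its first line to:
    sudo -u hdfs /usr/bin/hadoop fs -mkdir /tpch/
    and add this line after it:
    sudo -u hdfs /usr/bin/hadoop fs -chown root /tpch
    The script runs as root, which usually has no write access to the HDFS root directory. Creating /tpch as the hdfs superuser and then handing its ownership to root lets the rest of the script upload the data.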
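    The same edit can also be applied non-interactively. The following is a minimal sketch that assumes, as the step above states, that the mkdir command is on line 1 of tpch_prepare_data.sh. patch_prepare_script is a hypothetical helper. It keeps a .bak copy and writes back into the original file, so the script's executable bit is preserved.

```shell
#!/usr/bin/env bash
# Minimal sketch: replace line 1 of the prepare script with the hdfs-user mkdir and
# insert the chown right after it. Assumes the original mkdir is on line 1.
# patch_prepare_script is a hypothetical helper name.
patch_prepare_script() {
    local script=$1
    # Keep the untouched original next to the script.
    cp "$script" "$script.bak" || return 1
    # Redirecting into the existing file keeps its permissions (e.g. the executable bit).
    awk 'NR == 1 {
             print "sudo -u hdfs /usr/bin/hadoop fs -mkdir /tpch/"
             print "sudo -u hdfs /usr/bin/hadoop fs -chown root /tpch"
             next
         }
         { print }' "$script.bak" > "$script"
}
```

    Usage, from the data directory: `patch_prepare_script ./tpch_prepare_data.sh`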

    4. Run tpch_prepare_data.sh to copy the data from the local file system into HDFS

    [root@ip-172-31-34-31 data]# ./tpch_prepare_data.sh

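    5. Adjust the tpch_benchmark.sh script
    While it runs, the benchmark creates tables through Hive, and Impala cannot see those tables until its metadata is reloaded with invalidate metadata. Add the following line before the statement that runs each Impala query: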

    $IMPALA_CMD -q 'invalidate metadata' 2>&1

    The complete modified tpch_benchmark.sh:

    #!/usr/bin/env bash
    
    # set up configurations
    source benchmark.conf;
    
    if [ -e "$LOG_FILE" ]; then
            timestamp=`date "+%F-%R" --reference=$LOG_FILE`
            backupFile="$LOG_FILE.$timestamp"
            mv $LOG_FILE $LOG_DIR/$backupFile
    fi
    
    echo ""
    echo "***********************************************"
    echo "*          TPC-H benchmark on Impala          *"
    echo "***********************************************"
    echo "                                               "
    echo "See $LOG_FILE for more details of query errors."
    echo ""
    
    trial=0
    while [ $trial -lt $NUM_OF_TRIALS ]; do
            trial=`expr $trial + 1`
            echo "Executing Trial #$trial of $NUM_OF_TRIALS trial(s)..."
    
            for query in ${TPCH_QUERIES_ALL[@]}; do
                    echo "Running query: $query" | tee -a $LOG_FILE
    
                    echo "Running Hive prepare query: $query" >> $LOG_FILE
                    $TIME_CMD $HIVE_CMD -f $BASE_DIR/tpch_prepare/${query}.hive 2>&1 | tee -a $LOG_FILE | grep '^Time:'
                    returncode=${PIPESTATUS[0]}
                    if [ $returncode -ne 0 ]; then
                            echo "ABOVE QUERY FAILED:$returncode"
                    fi
    
                    # If you want to use old beta, enable below.
                    #$TIME_CMD $IMPALA_CMD -q 'refresh' 2>&1 | tee -a $LOG_FILE | grep '^Time:'
                    #returncode=${PIPESTATUS[0]}
                    #if [ $returncode -ne 0 ]; then
                    #       echo "ABOVE QUERY FAILED:$returncode"
                    #fi
    
                    echo "Running Impala query: $query" >> $LOG_FILE
                    # Added: reload Impala's metadata so the tables just created through Hive are visible
                    $IMPALA_CMD -q 'invalidate metadata' 2>&1
                    $TIME_CMD $IMPALA_CMD --query_file=$BASE_DIR/tpch_impala/${query}.impala 2>&1 | tee -a $LOG_FILE | grep '^Time:'
                    returncode=${PIPESTATUS[0]}
                    if [ $returncode -ne 0 ]; then
                            echo "ABOVE QUERY FAILED:$returncode"
                    fi
    
                    #echo "Running Hive query: $query" >> $LOG_FILE
                    #$TIME_CMD $HIVE_CMD -f $BASE_DIR/tpch_hive/${query}.hive 2>&1 | tee -a $LOG_FILE | grep '^Time:'
                    #returncode=${PIPESTATUS[0]}
                    #if [ $returncode -ne 0 ]; then
                    #       echo "ABOVE QUERY FAILED:$returncode"
                    #fi
            done
    
    done # TRIAL
    echo "***********************************************"
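    In the script above, the output of the added invalidate metadata line goes to the console, and a failure is not reported. The following minimal sketch wraps that step in a helper that appends its output to $LOG_FILE and returns a non-zero status on failure. refresh_impala_metadata is a hypothetical name. IMPALA_CMD and LOG_FILE are assumed to come from benchmark.conf, and the fallback values are illustrative only.

```shell
#!/usr/bin/env bash
# Minimal sketch: the "invalidate metadata" step as a helper that logs and reports failure.
# refresh_impala_metadata is a hypothetical name; IMPALA_CMD and LOG_FILE normally
# come from benchmark.conf, and these fallbacks are illustrative only.
IMPALA_CMD=${IMPALA_CMD:-"/usr/bin/impala-shell --impalad=172.31.25.244:21000"}
LOG_FILE=${LOG_FILE:-/tmp/tpch_benchmark.log}

refresh_impala_metadata() {
    # IMPALA_CMD is deliberately unquoted so its options split into separate words,
    # as in tpch_benchmark.sh.
    if ! $IMPALA_CMD -q 'invalidate metadata' >> "$LOG_FILE" 2>&1; then
        echo "invalidate metadata failed; see $LOG_FILE" >&2
        return 1
    fi
}
```

    Inside the for loop, this helper could replace the added line as `refresh_impala_metadata || continue`, which skips the timed Impala query when the refresh fails. Because the call stays outside $TIME_CMD, the reported query times are unaffected, as in the original script.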

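    6. Edit the configuration file benchmark.conf so that impala-shell points at the correct Impala node (impalad):

    This change is needed because no Impala daemon (impalad) is configured on the machine where impala-shell runs, so impala-shell has to be told which remote impalad to connect to. Port 21000 is the port impala-shell connects to by default.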
    # impala
    IMPALA_CMD="/usr/bin/impala-shell --impalad=172.31.25.244:21000"
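    Before a long run, you can check that the configured impalad is reachable. The following is a minimal sketch with two hypothetical helpers. impalad_endpoint reads the host:port from the IMPALA_CMD line of benchmark.conf, and assumes that line has the form shown above. probe_endpoint tries a TCP connection using bash's /dev/tcp redirection and GNU coreutils timeout.

```shell
#!/usr/bin/env bash
# Minimal sketch: extract the --impalad host:port from benchmark.conf and probe it.
# impalad_endpoint and probe_endpoint are hypothetical helper names.

impalad_endpoint() {
    # Prints e.g. 172.31.25.244:21000 for a line like
    # IMPALA_CMD="/usr/bin/impala-shell --impalad=172.31.25.244:21000"
    sed -n 's/^IMPALA_CMD=.*--impalad=\([^ "]*\).*/\1/p' "$1"
}

probe_endpoint() {
    local host=${1%:*} port=${1##*:}
    # Opens a TCP connection via bash's /dev/tcp; gives up after 3 seconds.
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

    Usage: `ep=$(impalad_endpoint benchmark.conf) && probe_endpoint "$ep" && echo "impalad reachable at $ep"`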

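    7. Start MapReduce, Hive and Impala
    Note: MapReduce must be started before Hive and Impala are used, because the Hive prepare queries that the benchmark runs before each Impala query run on MapReduce.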

    8. Run the benchmark script
    [root@ip-172-31-34-31 tpc-h-impala]# pwd
    /root/tpc-h-impala
    [root@ip-172-31-34-31 tpc-h-impala]# ./tpch_benchmark.sh
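    tpch_benchmark.sh prints `Running query: <name>` to the console for each query, and prints `ABOVE QUERY FAILED:<code>` when a Hive or Impala step fails. Details go to $LOG_FILE. The following minimal sketch lists the failed queries after a run. It assumes the console output was captured with `./tpch_benchmark.sh | tee console.txt`. failed_queries and console.txt are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch: list queries whose Hive prepare or Impala step failed, using the
# "Running query:" and "ABOVE QUERY FAILED" lines tpch_benchmark.sh prints to stdout.
# failed_queries is a hypothetical helper name.
failed_queries() {
    # Remember the current query name; print it whenever a failure line follows.
    awk '/^Running query: / { q = $3 }
         /^ABOVE QUERY FAILED/ && q != "" { print q }' "$1" | sort -u
}
```

    Usage: `failed_queries console.txt` prints each failed query name once, even across multiple trials.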

  • Original article: https://www.cnblogs.com/littlesuccess/p/4019219.html