zoukankan      html  css  js  c++  java
  • 【原创】大数据基础之Benchmark(1)HiBench

    HiBench 7
    官方:https://github.com/intel-hadoop/HiBench

    一 简介

    HiBench is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilizations. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO, etc. It also contains several streaming workloads for Spark Streaming, Flink, Storm and Gearpump.

    There are totally 19 workloads in HiBench.

    Supported Hadoop/Spark/Flink/Storm/Gearpump releases:

    Hadoop: Apache Hadoop 2.x, CDH5, HDP
    Spark: Spark 1.6.x, Spark 2.0.x, Spark 2.1.x, Spark 2.2.x
    Flink: 1.0.3
    Storm: 1.0.1
    Gearpump: 0.8.1
    Kafka: 0.8.2.2

    二 spark sql测试

    1 download

    $ wget https://github.com/intel-hadoop/HiBench/archive/HiBench-7.0.tar.gz
    $ tar xvf HiBench-7.0.tar.gz
    $ cd HiBench-HiBench-7.0

    2 build

    1)build all

    $ mvn -Dspark=2.1 -Dscala=2.11 clean package

    2)build hadoopbench and sparkbench

    $ mvn -Phadoopbench -Psparkbench -Dspark=2.1 -Dscala=2.11 clean package

    3)only build spark sql

    $ mvn -Psparkbench -Dmodules -Psql -Dspark=2.1 -Dscala=2.11 clean package

    3 prepare

    $ cp conf/hadoop.conf.template conf/hadoop.conf
    $ vi conf/hadoop.conf

    $ cp conf/spark.conf.template conf/spark.conf
    $ vi conf/spark.conf

    $ vi conf/hibench.conf
    # Data scale profile. Available value is tiny, small, large, huge, gigantic and bigdata.
    # The definition of these profiles can be found in the workload's conf file i.e. conf/workloads/micro/wordcount.conf
    hibench.scale.profile bigdata

    4 run

    sql测试分为3种:scan/aggregation/join

    $ bin/workloads/sql/scan/prepare/prepare.sh
    $ bin/workloads/sql/scan/spark/run.sh

    具体配置位于conf/workloads/sql/scan.conf
    prepare之后会在hdfs的/HiBench/Scan/Input下生成测试数据,在report/scan/prepare/下生成报告
    run之后会在report/scan/spark/下生成报告,比如monitor.html,在hive的default库下可以看到测试数据表

    $ bin/workloads/sql/join/prepare/prepare.sh
    $ bin/workloads/sql/join/spark/run.sh

    $ bin/workloads/sql/aggregation/prepare/prepare.sh
    $ bin/workloads/sql/aggregation/spark/run.sh

    依此类推

    如果prepare时报错内存溢出

    尝试修改

    $ vi bin/functions/workload_functions.sh
    local CMD="${HADOOP_EXECUTABLE} --config ${HADOOP_CONF_DIR} jar $job_jar $job_name $tail_arguments"

    格式:hadoop jar <jarName> <youClassName> -D mapreduce.reduce.memory.mb=5120 -D mapreduce.reduce.java.opts=-Xmx4608m <otherArgs>

    发现不能生效,尝试增加map数量

    $ vi bin/functions/hibench_prop_env_mapping.py:
    NUM_MAPS="hibench.default.map.parallelism",

    $ vi conf/hibench.conf
    hibench.default.map.parallelism 5000

    参考:
    https://github.com/intel-hadoop/HiBench/blob/master/docs/build-hibench.md
    https://github.com/intel-hadoop/HiBench/blob/master/docs/run-sparkbench.md

  • 相关阅读:
    明明已经include_once() 但还是报错Class 'XXXXXControllerTOPData' not found
    dell U2515H 2k显示器黑屏问题,dp线问题。
    centos7.4 php5升级到php7
    thinkphp批量插入 更新sql
    查询速度慢了10倍,查询条件类型不对,字符串当做数字类型。
    margin-left:auto;margin-right:auto; 不起作用的原因
    jquery 查找元素,id,class
    php分割url,获取参数query
    阿里云服务器删除日志的方法,查看有哪些大文件
    sql优化 分字段统计查询
  • 原文地址:https://www.cnblogs.com/barneywill/p/10436299.html
Copyright © 2011-2022 走看看