  • Hadoop 2.2 Benchmark Testing

    The classes for the "Benchmarking a Hadoop Cluster" test cases in 《Hadoop: The Definitive Guide》 (third edition) are no longer shipped in hadoop-*-test.jar. In the new version, the benchmark tests should be run as follows:
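    The benchmark drivers are now bundled in the jobclient tests jar. As a quick, hedged check (assuming the 2.2.0 path used throughout this post), running the jar without a program name should print the list of available tests:

    # with no program name, the test driver should list the available benchmarks
    %yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar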


     

    1. TestDFSIO

    write

    TestDFSIO measures the I/O performance of HDFS. It uses a MapReduce job to read or write files in parallel: each file is read or written in its own map task, and the map output is used to collect statistics about how that file was processed.

    Test 1: write 2 files of 10 MB each

    %yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 2 -fileSize 10

    Console output when the job is submitted:

    13/11/13 01:59:06 INFO fs.TestDFSIO: TestDFSIO.1.7
    13/11/13 01:59:06 INFO fs.TestDFSIO: nrFiles = 2
    13/11/13 01:59:06 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
    13/11/13 01:59:06 INFO fs.TestDFSIO: bufferSize = 1000000
    13/11/13 01:59:06 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
    13/11/13 01:59:15 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 2 files
    13/11/13 01:59:26 INFO fs.TestDFSIO: created control files for: 2 files
    13/11/13 01:59:27 INFO client.RMProxy: Connecting to ResourceManager at cluster1/172.16.102.201:8032
    13/11/13 01:59:27 INFO client.RMProxy: Connecting to ResourceManager at cluster1/172.16.102.201:8032
    13/11/13 01:59:56 INFO mapred.FileInputFormat: Total input paths to process : 2
    13/11/13 02:00:21 INFO mapreduce.JobSubmitter: number of splits:2
    13/11/13 02:00:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1384321503481_0003
    13/11/13 02:00:34 INFO impl.YarnClientImpl: Submitted application application_1384321503481_0003 to ResourceManager at cluster1/172.16.102.201:8032
    13/11/13 02:00:36 INFO mapreduce.Job: The url to track the job: http://cluster1:8888/proxy/application_1384321503481_0003/
    13/11/13 02:00:36 INFO mapreduce.Job: Running job: job_1384321503481_0003

    From the console output we can see:

    (1) By default the data files are written to the io_data subdirectory under /benchmarks/TestDFSIO; the base directory can be changed via the test.build.data property. A quick check is sketched after these two points.

    (2) There are 2 map tasks (number of splits: 2), which confirms that each file is written or read in its own map task.
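    To see where the benchmark data actually landed, a minimal check (assuming the default base directory; adjust the path if test.build.data was overridden):

    # lists the data files written by the benchmark under the default base directory
    %hdfs dfs -ls /benchmarks/TestDFSIO/io_data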

     

    Console output after the job completes:

    13/11/13 02:08:15 INFO mapreduce.Job:  map 100% reduce 100%
    13/11/13 02:08:17 INFO mapreduce.Job: Job job_1384321503481_0003 completed successfully
    13/11/13 02:08:21 INFO mapreduce.Job: Counters: 43
        File System Counters
            FILE: Number of bytes read=174
            FILE: Number of bytes written=240262
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=468
            HDFS: Number of bytes written=20971595
            HDFS: Number of read operations=11
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=4
        Job Counters 
            Launched map tasks=2
            Launched reduce tasks=1
            Data-local map tasks=2
            Total time spent by all maps in occupied slots (ms)=63095
            Total time spent by all reduces in occupied slots (ms)=14813
        Map-Reduce Framework
            Map input records=2
            Map output records=10
            Map output bytes=148
            Map output materialized bytes=180
            Input split bytes=244
            Combine input records=0
            Combine output records=0
            Reduce input groups=5
            Reduce shuffle bytes=180
            Reduce input records=10
            Reduce output records=5
            Spilled Records=20
            Shuffled Maps =2
            Failed Shuffles=0
            Merged Map outputs=2
            GC time elapsed (ms)=495
            CPU time spent (ms)=3640
            Physical memory (bytes) snapshot=562757632
            Virtual memory (bytes) snapshot=2523807744
            Total committed heap usage (bytes)=421330944
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=224
        File Output Format Counters 
            Bytes Written=75
    13/11/13 02:08:23 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
    13/11/13 02:08:23 INFO fs.TestDFSIO:            Date & time: Wed Nov 13 02:08:22 PST 2013
    13/11/13 02:08:23 INFO fs.TestDFSIO:        Number of files: 2
    13/11/13 02:08:23 INFO fs.TestDFSIO: Total MBytes processed: 20.0
    13/11/13 02:08:23 INFO fs.TestDFSIO:      Throughput mb/sec: 0.5591277606933184
    13/11/13 02:08:23 INFO fs.TestDFSIO: Average IO rate mb/sec: 0.5635650753974915
    13/11/13 02:08:23 INFO fs.TestDFSIO:  IO rate std deviation: 0.05000733272172887
    13/11/13 02:08:23 INFO fs.TestDFSIO:     Test exec time sec: 534.566
    13/11/13 02:08:23 INFO fs.TestDFSIO:

     

    From the output above we can see 2 map tasks and 1 reduce task. The summary reports the average I/O rate, the overall throughput, the job execution time, and the number of files written.
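    TestDFSIO also appends this summary to a local results file (TestDFSIO_results.log in the current working directory by default; the -resFile option should let you choose another location), which makes it easy to compare runs over time. A hedged sketch:

    # view accumulated results from previous runs (default local results file)
    %cat TestDFSIO_results.log
    # write the summary of a run to a custom local file instead
    %yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 2 -fileSize 10 -resFile /tmp/dfsio-write.txt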

    read

    %yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -read -nrFiles 2 -fileSize 10

    I won't analyze the read run in detail here; try it yourself.
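    When you are done, the benchmark data under /benchmarks/TestDFSIO can be removed with TestDFSIO's -clean option (a small sketch, reusing the same jar as above):

    # deletes the /benchmarks/TestDFSIO directory created by the write/read runs
    %yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean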

    2. MapReduce Test with Sort

    Hadoop ships with a MapReduce program that exercises the whole MapReduce system. The benchmark has three steps:

    # generate random data

    # sort the data

    # validate the results

    The steps are as follows:

    1. Generate random data

    yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter random-data

    RandomWriter is used to generate the random data. Running RandomWriter on YARN launches a MapReduce job that, by default, starts 10 map tasks per node, and each map writes 1 GB of random data.

    The defaults can be changed via the properties test.randomwriter.maps_per_host and test.randomwrite.bytes_per_map; a sketch is shown below.
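    For a quick run on a small test cluster it helps to shrink those defaults. A minimal sketch, assuming the property names above are honored by this release (some Hadoop 2.x builds read mapreduce.randomwriter.* names instead):

    # hedged: 2 maps per host, 128 MB per map; property names may differ by release
    yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter -Dtest.randomwriter.maps_per_host=2 -Dtest.randomwrite.bytes_per_map=134217728 random-data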

    2. Sort the data

    yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar sort random-data sorted-data

     

    3. Validate the results

    yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar testmapredsort -sortInput random-data -sortOutput sorted-data

    This command runs the SortValidator program, which performs a series of checks, such as verifying that the sorted output exactly matches the unsorted input.
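    Before digging into the validator output, a simple sanity check is to compare the total sizes of the input and output directories (a sketch, assuming the directory names used above):

    # total size of the random input and the sorted output should match
    hdfs dfs -du -s -h random-data sorted-data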

    3. Other Tests

    MRBench (invoked with mrbench) runs a small job repeatedly.
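    For example, a hedged sketch that repeats the small job 50 times (the -numRuns option controls the repeat count):

    # run the small MRBench job 50 times and report average runtimes
    yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar mrbench -numRuns 50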

    NNBench (invoked with nnbench) is a load test for the namenode.
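    A typical namenode load test might look like the following sketch (option names follow nnbench's usage message; tune the counts to your cluster):

    # create-and-write workload against the namenode with many small files
    yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench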

    Gridmix -- not interested.

     

    Reposted from http://www.cnblogs.com/lucius/p/3421970.html
