
    Running wordcount on Hadoop 2.7.3 in pseudo-distributed mode

    Basic environment:
    Host OS: Windows 7
    VM software: VirtualBox
    Guest OS: CentOS 7
    Hadoop version: 2.7.3

    This post runs wordcount in pseudo-distributed mode.

    1 Hadoop environment

    Pseudo-distributed mode deploys all of the Hadoop components on a single machine, so it involves configuring each component as well as setting up a passwordless SSH trust relationship with the local host.

    ### Prepare a fresh environment
    # cd /home/jungle/hadoop
    # tar -zxvf hadoop-2.7.3.tar.gz
    # mv hadoop-2.7.3 hadoop-daemon
    # cd /home/jungle/hadoop/hadoop-daemon/
    

    1.1 Modify the Hadoop configuration

    • core-site.xml
    # vi etc/hadoop/core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
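
    A quick way to verify that this setting is picked up (hdfs getconf is part of the standard HDFS CLI); it reads hdfs://localhost:9000 back only once the hadoop on PATH points at this install, see section 2.3:

    # hdfs getconf -confKey fs.defaultFS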
    
    • hdfs-site.xml
    # vi etc/hadoop/hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
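
    Note that with no hadoop.tmp.dir set, HDFS metadata and blocks land under /tmp/hadoop-<username> (as the format log in section 1.3 confirms), and /tmp may be cleaned by the OS. An optional sketch to move it somewhere persistent, assuming /home/jungle/hadoop/tmp has been created; this property belongs in core-site.xml and must be set before formatting:

    <property>
        <name>hadoop.tmp.dir</name>
        <!-- assumed path; create the directory first -->
        <value>/home/jungle/hadoop/tmp</value>
    </property>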
    

    1.2 SSH trust relationship

    # ssh-keygen -t rsa
    # cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    
    # ps
    # ssh localhost
    ### log in to the local machine
    
    # ps
    ### if the two ps commands show different ttys, passwordless login works
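
    If ssh localhost still prompts for a password, permissions are the usual culprit: sshd ignores keys when ~/.ssh or authorized_keys is group- or world-writable. A common fix:

    # chmod 700 ~/.ssh
    # chmod 600 ~/.ssh/authorized_keys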
    

    1.3 Format HDFS

    # hadoop fs -ls /
    Found 20 items
    -rw-r--r--   1 root root          0 2016-12-30 12:26 /1
    dr-xr-xr-x   - root root      45056 2016-12-30 13:06 /bin
    dr-xr-xr-x   - root root       4096 2016-12-29 20:09 /boot
    drwxr-xr-x   - root root       3120 2017-01-06 18:31 /dev
    drwxr-xr-x   - root root       8192 2017-01-06 18:32 /etc
    # ... this is the local Linux filesystem, not HDFS: the hadoop on PATH still comes from the local-mode install (fixed in section 2.3)
    
    # hdfs namenode -format
    17/01/06 19:29:51 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = localhost/127.0.0.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 2.7.3
    #...
    
    STARTUP_MSG:   java = 1.8.0_111
    ************************************************************/
    17/01/06 19:29:51 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
    17/01/06 19:29:51 INFO namenode.NameNode: createNameNode [-format]
    Formatting using clusterid: CID-ee109ab5-d5f1-4919-a1c6-5ff4de21a03f
    17/01/06 19:29:52 INFO namenode.FSNamesystem: No KeyProvider found.
    17/01/06 19:29:52 INFO namenode.FSNamesystem: fsLock is fair:true
    17/01/06 19:29:52 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
    17/01/06 19:29:52 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: The block deletion will start around 2017 Jan 06 19:29:52
    17/01/06 19:29:52 INFO util.GSet: Computing capacity for map BlocksMap
    17/01/06 19:29:52 INFO util.GSet: VM type       = 64-bit
    17/01/06 19:29:52 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
    17/01/06 19:29:52 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: defaultReplication         = 3
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: maxReplication             = 512
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: minReplication             = 1
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
    17/01/06 19:29:52 INFO namenode.FSNamesystem: fsOwner             = jungle (auth:SIMPLE)
    17/01/06 19:29:52 INFO namenode.FSNamesystem: supergroup          = supergroup
    17/01/06 19:29:52 INFO namenode.FSNamesystem: isPermissionEnabled = true
    17/01/06 19:29:52 INFO namenode.FSNamesystem: HA Enabled: false
    17/01/06 19:29:52 INFO namenode.FSNamesystem: Append Enabled: true
    17/01/06 19:29:52 INFO util.GSet: Computing capacity for map INodeMap
    17/01/06 19:29:52 INFO util.GSet: VM type       = 64-bit
    17/01/06 19:29:52 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
    17/01/06 19:29:52 INFO util.GSet: capacity      = 2^20 = 1048576 entries
    17/01/06 19:29:52 INFO namenode.FSDirectory: ACLs enabled? false
    17/01/06 19:29:52 INFO namenode.FSDirectory: XAttrs enabled? true
    17/01/06 19:29:52 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
    17/01/06 19:29:52 INFO namenode.NameNode: Caching file names occuring more than 10 times
    17/01/06 19:29:52 INFO util.GSet: Computing capacity for map cachedBlocks
    17/01/06 19:29:52 INFO util.GSet: VM type       = 64-bit
    17/01/06 19:29:52 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
    17/01/06 19:29:52 INFO util.GSet: capacity      = 2^18 = 262144 entries
    17/01/06 19:29:52 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
    17/01/06 19:29:52 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
    17/01/06 19:29:52 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
    17/01/06 19:29:52 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
    17/01/06 19:29:52 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
    17/01/06 19:29:52 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
    17/01/06 19:29:52 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
    17/01/06 19:29:53 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
    17/01/06 19:29:53 INFO util.GSet: Computing capacity for map NameNodeRetryCache
    17/01/06 19:29:53 INFO util.GSet: VM type       = 64-bit
    17/01/06 19:29:53 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
    17/01/06 19:29:53 INFO util.GSet: capacity      = 2^15 = 32768 entries
    17/01/06 19:29:53 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1788036100-127.0.0.1-1483702193052
    17/01/06 19:29:53 INFO common.Storage: Storage directory /tmp/hadoop-jungle/dfs/name has been successfully formatted.
    17/01/06 19:29:53 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-jungle/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    17/01/06 19:29:53 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-jungle/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 353 bytes saved in 0 seconds.
    17/01/06 19:29:53 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    17/01/06 19:29:53 INFO util.ExitUtil: Exiting with status 0
    17/01/06 19:29:53 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
    ************************************************************/
    

    The log above shows the result of the format. The most important part is the HDFS metadata, stored on the Linux filesystem:

    # ls -l /tmp/hadoop-jungle/dfs/name/current/
    total 16
    -rw-rw-r--. 1 jungle jungle 353 Jan  6 19:29 fsimage_0000000000000000000
    -rw-rw-r--. 1 jungle jungle  62 Jan  6 19:29 fsimage_0000000000000000000.md5
    -rw-rw-r--. 1 jungle jungle   2 Jan  6 19:29 seen_txid
    -rw-rw-r--. 1 jungle jungle 201 Jan  6 19:29 VERSION
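
    To cross-check the metadata against the format log, the VERSION file records the cluster identity; its clusterID should match the CID-ee109ab5-... value printed during the format above:

    # cat /tmp/hadoop-jungle/dfs/name/current/VERSION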
    

    1.4 Install jps

    The previous post installed only the Java runtime; jps and the other JDK tools come from the -devel package:

    # yum install java-1.8.0-openjdk-devel
    
    # jps
    4497 Jps
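
    If the start scripts below complain that JAVA_HOME is not set, define it in hadoop-env.sh. The path here is an assumption for this OpenJDK package on CentOS 7; adjust it to the actual install:

    # vi etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk   # assumed OpenJDK path; verify locally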
    

    2 Start Hadoop

    2.1 Start HDFS

    # sbin/start-dfs.sh 
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /home/jungle/hadoop/hadoop-daemon/logs/hadoop-jungle-namenode-localhost.out
    localhost: starting datanode, logging to /home/jungle/hadoop/hadoop-daemon/logs/hadoop-jungle-datanode-localhost.out
    Starting secondary namenodes [0.0.0.0]
    The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
    ECDSA key fingerprint is 6a:67:9f:8b:84:64:db:19:1a:ba:86:4f:f1:9a:1c:82.
    Are you sure you want to continue connecting (yes/no)? yes
    0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
    0.0.0.0: starting secondarynamenode, logging to /home/jungle/hadoop/hadoop-daemon/logs/hadoop-jungle-secondarynamenode-localhost.out
    
    # echo $?
    0
    
    # ls -ltr logs/ 
    total 96
    -rw-rw-r--. 1 jungle jungle     0 Jan  6 20:17 SecurityAuth-jungle.audit
    -rw-rw-r--. 1 jungle jungle   716 Jan  6 20:17 hadoop-jungle-namenode-localhost.out
    -rw-rw-r--. 1 jungle jungle   716 Jan  6 20:17 hadoop-jungle-datanode-localhost.out
    -rw-rw-r--. 1 jungle jungle 29280 Jan  6 20:17 hadoop-jungle-namenode-localhost.log
    -rw-rw-r--. 1 jungle jungle 25370 Jan  6 20:17 hadoop-jungle-datanode-localhost.log
    -rw-rw-r--. 1 jungle jungle   716 Jan  6 20:17 hadoop-jungle-secondarynamenode-localhost.out
    -rw-rw-r--. 1 jungle jungle 22386 Jan  6 20:17 hadoop-jungle-secondarynamenode-localhost.log
    
    # jps
    4977 SecondaryNameNode
    4802 DataNode
    4660 NameNode
    5095 Jps
    

    As shown above, the NameNode, SecondaryNameNode, and DataNode are all running, and the logs directory contains a matching .out and .log file for each daemon.
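
    Beyond jps, hdfs dfsadmin gives a cluster-level confirmation that the DataNode actually registered with the NameNode; the report should list one live datanode:

    # hdfs dfsadmin -report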

    
    # ls -l /tmp/hadoop-jungle/dfs/name/current/
    total 3036
    -rw-rw-r--. 1 jungle jungle      42 Jan  6 20:18 edits_0000000000000000001-0000000000000000002
    -rw-rw-r--. 1 jungle jungle 1048576 Jan  6 20:18 edits_0000000000000000003-0000000000000000003
    -rw-rw-r--. 1 jungle jungle 1048576 Jan  8 14:56 edits_inprogress_0000000000000000004
    -rw-rw-r--. 1 jungle jungle     353 Jan  6 20:18 fsimage_0000000000000000002
    -rw-rw-r--. 1 jungle jungle      62 Jan  6 20:18 fsimage_0000000000000000002.md5
    -rw-rw-r--. 1 jungle jungle     353 Jan  8 14:56 fsimage_0000000000000000003
    -rw-rw-r--. 1 jungle jungle      62 Jan  8 14:56 fsimage_0000000000000000003.md5
    -rw-rw-r--. 1 jungle jungle       2 Jan  8 14:56 seen_txid
    -rw-rw-r--. 1 jungle jungle     201 Jan  8 14:56 VERSION
    
    ### pid 
    # ls -l /tmp/hadoop-jungle-*
    -rw-rw-r--. 1 jungle jungle 5 Jan  8 14:56 /tmp/hadoop-jungle-datanode.pid
    -rw-rw-r--. 1 jungle jungle 5 Jan  8 14:56 /tmp/hadoop-jungle-namenode.pid
    -rw-rw-r--. 1 jungle jungle 5 Jan  8 14:56 /tmp/hadoop-jungle-secondarynamenode.pid
    

    2.2 Check the web UI

    First, stop and disable the firewall.

     
    # systemctl status firewalld.service
    ● firewalld.service - firewalld - dynamic firewall daemon
       Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Sun 2017-01-08 15:12:58 CST; 8s ago
         Docs: man:firewalld(1)
      Process: 681 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
     Main PID: 681 (code=exited, status=0/SUCCESS)
     
    # systemctl disable firewalld.service
    Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
    Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
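
    Alternatively, rather than disabling the firewall outright, only the NameNode web UI port could be opened (a sketch, assuming firewalld is left running; 50070 is the default dfs.namenode.http-address port):

    # firewall-cmd --permanent --add-port=50070/tcp
    # firewall-cmd --reload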
    

    2.3 Access the NameNode

    URL:

    http://192.168.1.111:50070/
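
    If the page does not load from the host browser, checking from inside the guest separates a firewall problem from a service problem:

    # curl -s http://localhost:50070/ | head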

    Update the environment variable: the earlier standalone-mode test used a different directory, so change HADOOP_INSTALL from hadoop-local back to hadoop-daemon.

    # vi ~/.bashrc
    ### export HADOOP_INSTALL=/home/jungle/hadoop/hadoop-local
    export HADOOP_INSTALL=/home/jungle/hadoop/hadoop-daemon
    
    # source ~/.bashrc
    

    Working with HDFS:

    # hadoop fs -ls /
    ### empty output: the HDFS root has no content yet
    
    # hdfs dfs -mkdir /user
    # hadoop fs -ls /
    Found 1 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 15:57 /user
    
    # hdfs dfs -mkdir /user/test
    # hadoop fs -ls /user/
    Found 1 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 15:57 /user/test
    
    # hadoop fs -put ../hadoop-local/dataLocal/input/ /user/test
    
    # hadoop fs -ls /user/test
    Found 1 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:02 /user/test/input
    
    # hadoop fs -ls /user/test/input
    Found 2 items
    -rw-r--r--   1 jungle supergroup         37 2017-01-08 16:02 /user/test/input/file1.txt
    -rw-r--r--   1 jungle supergroup         70 2017-01-08 16:02 /user/test/input/file2.txt
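
    To confirm the upload round-tripped intact, the files can be read straight back from HDFS (file names taken from the listing above):

    # hadoop fs -cat /user/test/input/file1.txt
    # hadoop fs -cat /user/test/input/file2.txt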
    
    

    The uploaded files can also be browsed from the NameNode web UI.

    3 wordcount

    # bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/test/input/ /user/test/output
    
    # bin/hadoop fs -ls /user/test/output
    Found 2 items
    -rw-r--r--   1 jungle supergroup          0 2017-01-08 16:11 /user/test/output/_SUCCESS
    -rw-r--r--   1 jungle supergroup         82 2017-01-08 16:11 /user/test/output/part-r-00000
    
    # bin/hadoop fs -cat /user/test/output/part-r-00000
    I	1
    am	1
    bye	2
    great	1
    hadoop.	3
    hello	3
    is	1
    jungle.	2
    software	1
    the	1
    world.	2
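
    One caveat before re-running: MapReduce refuses to write into an existing output directory and fails with FileAlreadyExistsException, which is why the next section writes to output2. To reuse a path, delete it first:

    # hadoop fs -rm -r /user/test/output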
    
    

    4 Using YARN

    Start YARN:

    # jps
    4803 DataNode
    4979 SecondaryNameNode
    4661 NameNode
    6309 Jps
    
    # sbin/start-yarn.sh 
    starting yarn daemons
    starting resourcemanager, logging to /home/jungle/hadoop/hadoop-daemon/logs/yarn-jungle-resourcemanager-localhost.localdomain.out
    localhost: starting nodemanager, logging to /home/jungle/hadoop/hadoop-daemon/logs/yarn-jungle-nodemanager-localhost.localdomain.out
    
    # jps
    4803 DataNode
    4979 SecondaryNameNode
    6355 ResourceManager
    4661 NameNode
    6477 NodeManager
    6750 Jps
    
    # hadoop fs -ls /user/test/
    Found 2 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:02 /user/test/input
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:11 /user/test/output
    
    # bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/test/input/ /user/test/output2
    
    # hadoop fs -ls /user/test/
    Found 3 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:02 /user/test/input
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:11 /user/test/output
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:25 /user/test/output2
    
    # hadoop fs -ls /user/test/output2
    Found 2 items
    -rw-r--r--   1 jungle supergroup          0 2017-01-08 16:25 /user/test/output2/_SUCCESS
    -rw-r--r--   1 jungle supergroup         82 2017-01-08 16:25 /user/test/output2/part-r-00000
    
    # hadoop fs -cat /user/test/output2/part-r-00000
    I	1
    am	1
    bye	2
    great	1
    hadoop.	3
    hello	3
    is	1
    jungle.	2
    software	1
    the	1
    world.	2
    

    Job execution log (note the mapred.LocalJobRunner entries, discussed after the log):

    17/01/08 16:25:32 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    17/01/08 16:25:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    17/01/08 16:25:32 INFO input.FileInputFormat: Total input paths to process : 2
    17/01/08 16:25:33 INFO mapreduce.JobSubmitter: number of splits:2
    17/01/08 16:25:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local247232145_0001
    17/01/08 16:25:33 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    17/01/08 16:25:33 INFO mapreduce.Job: Running job: job_local247232145_0001
    17/01/08 16:25:33 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    17/01/08 16:25:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    17/01/08 16:25:33 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    17/01/08 16:25:33 INFO mapred.LocalJobRunner: Waiting for map tasks
    17/01/08 16:25:33 INFO mapred.LocalJobRunner: Starting task: attempt_local247232145_0001_m_000000_0
    17/01/08 16:25:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    17/01/08 16:25:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    17/01/08 16:25:33 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/test/input/file2.txt:0+70
    17/01/08 16:25:35 INFO mapreduce.Job: Job job_local247232145_0001 running in uber mode : false
    17/01/08 16:25:35 INFO mapreduce.Job:  map 0% reduce 0%
    17/01/08 16:25:35 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    17/01/08 16:25:35 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    17/01/08 16:25:35 INFO mapred.MapTask: soft limit at 83886080
    17/01/08 16:25:35 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    17/01/08 16:25:35 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    17/01/08 16:25:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    17/01/08 16:25:37 INFO mapred.LocalJobRunner: 
    17/01/08 16:25:37 INFO mapred.MapTask: Starting flush of map output
    17/01/08 16:25:37 INFO mapred.MapTask: Spilling map output
    17/01/08 16:25:37 INFO mapred.MapTask: bufstart = 0; bufend = 114; bufvoid = 104857600
    17/01/08 16:25:37 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214356(104857424); length = 41/6553600
    17/01/08 16:25:38 INFO mapred.MapTask: Finished spill 0
    17/01/08 16:25:38 INFO mapred.Task: Task:attempt_local247232145_0001_m_000000_0 is done. And is in the process of committing
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: map
    17/01/08 16:25:38 INFO mapred.Task: Task 'attempt_local247232145_0001_m_000000_0' done.
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local247232145_0001_m_000000_0
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: Starting task: attempt_local247232145_0001_m_000001_0
    17/01/08 16:25:38 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    17/01/08 16:25:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    17/01/08 16:25:38 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/test/input/file1.txt:0+37
    17/01/08 16:25:38 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    17/01/08 16:25:38 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    17/01/08 16:25:38 INFO mapred.MapTask: soft limit at 83886080
    17/01/08 16:25:38 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    17/01/08 16:25:38 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    17/01/08 16:25:38 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: 
    17/01/08 16:25:38 INFO mapred.MapTask: Starting flush of map output
    17/01/08 16:25:38 INFO mapred.MapTask: Spilling map output
    17/01/08 16:25:38 INFO mapred.MapTask: bufstart = 0; bufend = 65; bufvoid = 104857600
    17/01/08 16:25:38 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
    17/01/08 16:25:38 INFO mapred.MapTask: Finished spill 0
    17/01/08 16:25:38 INFO mapred.Task: Task:attempt_local247232145_0001_m_000001_0 is done. And is in the process of committing
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: map
    17/01/08 16:25:38 INFO mapred.Task: Task 'attempt_local247232145_0001_m_000001_0' done.
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local247232145_0001_m_000001_0
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: map task executor complete.
    17/01/08 16:25:38 INFO mapreduce.Job:  map 100% reduce 0%
    17/01/08 16:25:39 INFO mapred.LocalJobRunner: Waiting for reduce tasks
    17/01/08 16:25:39 INFO mapred.LocalJobRunner: Starting task: attempt_local247232145_0001_r_000000_0
    17/01/08 16:25:39 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    17/01/08 16:25:39 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    17/01/08 16:25:39 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6bef8a0a
    17/01/08 16:25:39 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
    17/01/08 16:25:39 INFO reduce.EventFetcher: attempt_local247232145_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
    17/01/08 16:25:39 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local247232145_0001_m_000000_0 decomp: 98 len: 102 to MEMORY
    17/01/08 16:25:39 INFO reduce.InMemoryMapOutput: Read 98 bytes from map-output for attempt_local247232145_0001_m_000000_0
    17/01/08 16:25:39 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 98, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->98
    17/01/08 16:25:39 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local247232145_0001_m_000001_0 decomp: 68 len: 72 to MEMORY
    17/01/08 16:25:39 INFO reduce.InMemoryMapOutput: Read 68 bytes from map-output for attempt_local247232145_0001_m_000001_0
    17/01/08 16:25:39 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 68, inMemoryMapOutputs.size() -> 2, commitMemory -> 98, usedMemory ->166
    17/01/08 16:25:39 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
    17/01/08 16:25:39 INFO mapred.LocalJobRunner: 2 / 2 copied.
    17/01/08 16:25:39 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
    17/01/08 16:25:40 WARN io.ReadaheadPool: Failed readahead on ifile
    EBADF: Bad file descriptor
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
    	at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    17/01/08 16:25:40 INFO mapred.Merger: Merging 2 sorted segments
    17/01/08 16:25:40 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 156 bytes
    17/01/08 16:25:40 INFO reduce.MergeManagerImpl: Merged 2 segments, 166 bytes to disk to satisfy reduce memory limit
    17/01/08 16:25:40 INFO reduce.MergeManagerImpl: Merging 1 files, 168 bytes from disk
    17/01/08 16:25:40 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
    17/01/08 16:25:40 INFO mapred.Merger: Merging 1 sorted segments
    17/01/08 16:25:40 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 160 bytes
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: 2 / 2 copied.
    17/01/08 16:25:40 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    17/01/08 16:25:40 INFO mapred.Task: Task:attempt_local247232145_0001_r_000000_0 is done. And is in the process of committing
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: 2 / 2 copied.
    17/01/08 16:25:40 INFO mapred.Task: Task attempt_local247232145_0001_r_000000_0 is allowed to commit now
    17/01/08 16:25:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local247232145_0001_r_000000_0' to hdfs://localhost:9000/user/test/output2/_temporary/0/task_local247232145_0001_r_000000
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: reduce > reduce
    17/01/08 16:25:40 INFO mapred.Task: Task 'attempt_local247232145_0001_r_000000_0' done.
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local247232145_0001_r_000000_0
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: reduce task executor complete.
    17/01/08 16:25:40 INFO mapreduce.Job:  map 100% reduce 100%
    17/01/08 16:25:40 INFO mapreduce.Job: Job job_local247232145_0001 completed successfully
    17/01/08 16:25:41 INFO mapreduce.Job: Counters: 35
    	File System Counters
    		FILE: Number of bytes read=889201
    		FILE: Number of bytes written=1745401
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=284
    		HDFS: Number of bytes written=82
    		HDFS: Number of read operations=22
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=5
    	Map-Reduce Framework
    		Map input records=3
    		Map output records=18
    		Map output bytes=179
    		Map output materialized bytes=174
    		Input split bytes=224
    		Combine input records=18
    		Combine output records=14
    		Reduce input groups=11
    		Reduce shuffle bytes=174
    		Reduce input records=14
    		Reduce output records=11
    		Spilled Records=28
    		Shuffled Maps =2
    		Failed Shuffles=0
    		Merged Map outputs=2
    		GC time elapsed (ms)=117
    		Total committed heap usage (bytes)=457912320
    	Shuffle Errors
    		BAD_ID=0
    		CONNECTION=0
    		IO_ERROR=0
    		WRONG_LENGTH=0
    		WRONG_MAP=0
    		WRONG_REDUCE=0
    	File Input Format Counters 
    		Bytes Read=107
    	File Output Format Counters 
    		Bytes Written=82
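
    Note that this log still shows mapred.LocalJobRunner and a job id of the form job_local..., meaning the job ran inside the client JVM rather than on YARN. Starting the YARN daemons is not enough: per the official Hadoop 2.7.3 single-node guide, the client must also be told to submit to YARN, and the NodeManager needs the shuffle service. A sketch (mapred-site.xml usually has to be copied from its template first):

    # cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    # vi etc/hadoop/mapred-site.xml
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    # vi etc/hadoop/yarn-site.xml
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    With those in place, re-running the same wordcount command should produce a job id of the form job_<timestamp>_0001, and the job appears in the ResourceManager UI at http://localhost:8088/.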
    
    