
    Running the official WordCount on Hadoop 2.7.3 in pseudo-distributed mode

    Environment:
    Host OS: Windows 7
    Hypervisor: VirtualBox
    Guest VM: CentOS 7
    Hadoop version: 2.7.3

    This post runs WordCount in pseudo-distributed mode.

    1 Hadoop environment

    Pseudo-distributed mode deploys all the Hadoop daemons on a single machine, so it involves configuring each component and setting up a passwordless SSH trust relationship with the machine itself.

    ### prepare a fresh environment
    # cd /home/jungle/hadoop
    # tar -zxvf hadoop-2.7.3.tar.gz
    # mv hadoop-2.7.3 hadoop-daemon
    # cd /home/jungle/hadoop/hadoop-daemon/
    

    1.1 Modify the Hadoop configuration

    • core-site.xml
    # vi etc/hadoop/core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    
    • hdfs-site.xml
    # vi etc/hadoop/hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
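
    If later commands fail with "Error: JAVA_HOME is not set and could not be found.", JAVA_HOME also needs to be set explicitly in etc/hadoop/hadoop-env.sh. A minimal sketch, assuming the typical CentOS 7 OpenJDK install path (verify it with: readlink -f $(which java)):

    # vi etc/hadoop/hadoop-env.sh
    ### the path below is an assumption, not taken from this post's environment
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk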
    

    1.2 SSH trust relationship

    # ssh-keygen -t rsa
    # cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    
    # ps
    # ssh localhost
    ### log in to the local machine; no password prompt should appear
    
    # ps
    ### if the two ps runs show different ttys, passwordless login works
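
    If ssh localhost still asks for a password, the usual culprit is overly open permissions on ~/.ssh; a commonly needed fix (not part of the original steps):

    # chmod 700 ~/.ssh
    # chmod 600 ~/.ssh/authorized_keys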
    

    1.3 Format HDFS

    # hadoop fs -ls /
    Found 20 items
    -rw-r--r--   1 root root          0 2016-12-30 12:26 /1
    dr-xr-xr-x   - root root      45056 2016-12-30 13:06 /bin
    dr-xr-xr-x   - root root       4096 2016-12-29 20:09 /boot
    drwxr-xr-x   - root root       3120 2017-01-06 18:31 /dev
    drwxr-xr-x   - root root       8192 2017-01-06 18:32 /etc
    # ... this is the local Linux file system (the shell environment still points at the standalone install here, so the default file system is file:///)
    
    # hdfs namenode -format
    17/01/06 19:29:51 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = localhost/127.0.0.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 2.7.3
    #...
    
    STARTUP_MSG:   java = 1.8.0_111
    ************************************************************/
    17/01/06 19:29:51 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
    17/01/06 19:29:51 INFO namenode.NameNode: createNameNode [-format]
    Formatting using clusterid: CID-ee109ab5-d5f1-4919-a1c6-5ff4de21a03f
    17/01/06 19:29:52 INFO namenode.FSNamesystem: No KeyProvider found.
    17/01/06 19:29:52 INFO namenode.FSNamesystem: fsLock is fair:true
    17/01/06 19:29:52 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
    17/01/06 19:29:52 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: The block deletion will start around 2017 Jan 06 19:29:52
    17/01/06 19:29:52 INFO util.GSet: Computing capacity for map BlocksMap
    17/01/06 19:29:52 INFO util.GSet: VM type       = 64-bit
    17/01/06 19:29:52 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
    17/01/06 19:29:52 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: defaultReplication         = 3
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: maxReplication             = 512
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: minReplication             = 1
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
    17/01/06 19:29:52 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
    17/01/06 19:29:52 INFO namenode.FSNamesystem: fsOwner             = jungle (auth:SIMPLE)
    17/01/06 19:29:52 INFO namenode.FSNamesystem: supergroup          = supergroup
    17/01/06 19:29:52 INFO namenode.FSNamesystem: isPermissionEnabled = true
    17/01/06 19:29:52 INFO namenode.FSNamesystem: HA Enabled: false
    17/01/06 19:29:52 INFO namenode.FSNamesystem: Append Enabled: true
    17/01/06 19:29:52 INFO util.GSet: Computing capacity for map INodeMap
    17/01/06 19:29:52 INFO util.GSet: VM type       = 64-bit
    17/01/06 19:29:52 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
    17/01/06 19:29:52 INFO util.GSet: capacity      = 2^20 = 1048576 entries
    17/01/06 19:29:52 INFO namenode.FSDirectory: ACLs enabled? false
    17/01/06 19:29:52 INFO namenode.FSDirectory: XAttrs enabled? true
    17/01/06 19:29:52 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
    17/01/06 19:29:52 INFO namenode.NameNode: Caching file names occuring more than 10 times
    17/01/06 19:29:52 INFO util.GSet: Computing capacity for map cachedBlocks
    17/01/06 19:29:52 INFO util.GSet: VM type       = 64-bit
    17/01/06 19:29:52 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
    17/01/06 19:29:52 INFO util.GSet: capacity      = 2^18 = 262144 entries
    17/01/06 19:29:52 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
    17/01/06 19:29:52 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
    17/01/06 19:29:52 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
    17/01/06 19:29:52 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
    17/01/06 19:29:52 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
    17/01/06 19:29:52 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
    17/01/06 19:29:52 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
    17/01/06 19:29:53 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
    17/01/06 19:29:53 INFO util.GSet: Computing capacity for map NameNodeRetryCache
    17/01/06 19:29:53 INFO util.GSet: VM type       = 64-bit
    17/01/06 19:29:53 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
    17/01/06 19:29:53 INFO util.GSet: capacity      = 2^15 = 32768 entries
    17/01/06 19:29:53 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1788036100-127.0.0.1-1483702193052
    17/01/06 19:29:53 INFO common.Storage: Storage directory /tmp/hadoop-jungle/dfs/name has been successfully formatted.
    17/01/06 19:29:53 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-jungle/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    17/01/06 19:29:53 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-jungle/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 353 bytes saved in 0 seconds.
    17/01/06 19:29:53 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    17/01/06 19:29:53 INFO util.ExitUtil: Exiting with status 0
    17/01/06 19:29:53 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
    ************************************************************/
    

    The log above shows the result of the operation. The most important part is that the HDFS metadata is stored on top of the Linux file system:

    # ls -l /tmp/hadoop-jungle/dfs/name/current/
    total 16
    -rw-rw-r--. 1 jungle jungle 353 Jan  6 19:29 fsimage_0000000000000000000
    -rw-rw-r--. 1 jungle jungle  62 Jan  6 19:29 fsimage_0000000000000000000.md5
    -rw-rw-r--. 1 jungle jungle   2 Jan  6 19:29 seen_txid
    -rw-rw-r--. 1 jungle jungle 201 Jan  6 19:29 VERSION
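
    The metadata lands under /tmp because hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, and /tmp may be wiped on reboot. A real setup would normally relocate it via etc/hadoop/core-site.xml; a sketch, where the target path is only an example:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/jungle/hadoop/hadoop-daemon/tmp</value>
    </property>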
    

    1.4 Install jps

    The previous post installed only the Java runtime. The JDK tools, including jps, still need to be installed:

    # yum install java-1.8.0-openjdk-devel
    
    # jps
    4497 Jps
    

    2 Start Hadoop

    2.1 Start HDFS

    # sbin/start-dfs.sh 
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /home/jungle/hadoop/hadoop-daemon/logs/hadoop-jungle-namenode-localhost.out
    localhost: starting datanode, logging to /home/jungle/hadoop/hadoop-daemon/logs/hadoop-jungle-datanode-localhost.out
    Starting secondary namenodes [0.0.0.0]
    The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
    ECDSA key fingerprint is 6a:67:9f:8b:84:64:db:19:1a:ba:86:4f:f1:9a:1c:82.
    Are you sure you want to continue connecting (yes/no)? yes
    0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
    0.0.0.0: starting secondarynamenode, logging to /home/jungle/hadoop/hadoop-daemon/logs/hadoop-jungle-secondarynamenode-localhost.out
    
    # echo $?
    0
    
    # ls -ltr logs/ 
    total 96
    -rw-rw-r--. 1 jungle jungle     0 Jan  6 20:17 SecurityAuth-jungle.audit
    -rw-rw-r--. 1 jungle jungle   716 Jan  6 20:17 hadoop-jungle-namenode-localhost.out
    -rw-rw-r--. 1 jungle jungle   716 Jan  6 20:17 hadoop-jungle-datanode-localhost.out
    -rw-rw-r--. 1 jungle jungle 29280 Jan  6 20:17 hadoop-jungle-namenode-localhost.log
    -rw-rw-r--. 1 jungle jungle 25370 Jan  6 20:17 hadoop-jungle-datanode-localhost.log
    -rw-rw-r--. 1 jungle jungle   716 Jan  6 20:17 hadoop-jungle-secondarynamenode-localhost.out
    -rw-rw-r--. 1 jungle jungle 22386 Jan  6 20:17 hadoop-jungle-secondarynamenode-localhost.log
    
    # jps
    4977 SecondaryNameNode
    4802 DataNode
    4660 NameNode
    5095 Jps
    

    As shown above, the NameNode, SecondaryNameNode, and DataNode have all started. Correspondingly, each daemon has a matching .out and .log file under the logs directory.
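
    As an extra sanity check (not in the original steps), hdfs dfsadmin -report should list one live DataNode:

    # hdfs dfsadmin -report
    ### expect a line like: Live datanodes (1)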

    
    # ls -l /tmp/hadoop-jungle/dfs/name/current/
    total 3036
    -rw-rw-r--. 1 jungle jungle      42 Jan  6 20:18 edits_0000000000000000001-0000000000000000002
    -rw-rw-r--. 1 jungle jungle 1048576 Jan  6 20:18 edits_0000000000000000003-0000000000000000003
    -rw-rw-r--. 1 jungle jungle 1048576 Jan  8 14:56 edits_inprogress_0000000000000000004
    -rw-rw-r--. 1 jungle jungle     353 Jan  6 20:18 fsimage_0000000000000000002
    -rw-rw-r--. 1 jungle jungle      62 Jan  6 20:18 fsimage_0000000000000000002.md5
    -rw-rw-r--. 1 jungle jungle     353 Jan  8 14:56 fsimage_0000000000000000003
    -rw-rw-r--. 1 jungle jungle      62 Jan  8 14:56 fsimage_0000000000000000003.md5
    -rw-rw-r--. 1 jungle jungle       2 Jan  8 14:56 seen_txid
    -rw-rw-r--. 1 jungle jungle     201 Jan  8 14:56 VERSION
    
    ### pid files
    # ls -l /tmp/hadoop-jungle-*
    -rw-rw-r--. 1 jungle jungle 5 Jan  8 14:56 /tmp/hadoop-jungle-datanode.pid
    -rw-rw-r--. 1 jungle jungle 5 Jan  8 14:56 /tmp/hadoop-jungle-namenode.pid
    -rw-rw-r--. 1 jungle jungle 5 Jan  8 14:56 /tmp/hadoop-jungle-secondarynamenode.pid
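
    Each pid file holds just the process id of its daemon, so it can be cross-checked against jps for the current session:

    # cat /tmp/hadoop-jungle-namenode.pid
    ### should match the NameNode pid reported by jps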
    

    2.2 Check the web UI

    First, stop and disable the firewall so the web UI can be reached from the host machine:

    # systemctl status firewalld.service
    ● firewalld.service - firewalld - dynamic firewall daemon
       Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Sun 2017-01-08 15:12:58 CST; 8s ago
         Docs: man:firewalld(1)
      Process: 681 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
     Main PID: 681 (code=exited, status=0/SUCCESS)
     
    # systemctl disable firewalld.service
    Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
    Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
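
    A less drastic alternative, not taken from the original steps, would be to leave firewalld running and open only the web UI port:

    # firewall-cmd --permanent --add-port=50070/tcp
    # firewall-cmd --reload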
    

    2.3 Access the NameNode

    The NameNode web UI (port 50070 by default in Hadoop 2.x) is at:

    http://192.168.1.111:50070/

    Update the environment variable: the earlier standalone-mode post used a separate test directory, so switch HADOOP_INSTALL from hadoop-local back to hadoop-daemon:

    # vi ~/.bashrc
    ### export HADOOP_INSTALL=/home/jungle/hadoop/hadoop-local
    export HADOOP_INSTALL=/home/jungle/hadoop/hadoop-daemon
    
    # source ~/.bashrc
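
    Assuming PATH is built from HADOOP_INSTALL in ~/.bashrc (as set up in the earlier post), the switch can be verified with:

    # which hadoop
    ### should now resolve under /home/jungle/hadoop/hadoop-daemon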
    

    Working with HDFS:

    # hadoop fs -ls /
    ### empty output: the HDFS root directory has no content yet
    
    # hdfs dfs -mkdir /user
    # hadoop fs -ls /
    Found 1 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 15:57 /user
    
    # hdfs dfs -mkdir /user/test
    # hadoop fs -ls /user/
    Found 1 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 15:57 /user/test
    
    # hadoop fs -put ../hadoop-local/dataLocal/input/ /user/test
    
    # hadoop fs -ls /user/test
    Found 1 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:02 /user/test/input
    
    # hadoop fs -ls /user/test/input
    Found 2 items
    -rw-r--r--   1 jungle supergroup         37 2017-01-08 16:02 /user/test/input/file1.txt
    -rw-r--r--   1 jungle supergroup         70 2017-01-08 16:02 /user/test/input/file2.txt
    
    


    3 WordCount

    # bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/test/input/ /user/test/output
    
    # bin/hadoop fs -ls /user/test/output
    Found 2 items
    -rw-r--r--   1 jungle supergroup          0 2017-01-08 16:11 /user/test/output/_SUCCESS
    -rw-r--r--   1 jungle supergroup         82 2017-01-08 16:11 /user/test/output/part-r-00000
    
    # bin/hadoop fs -cat /user/test/output/part-r-00000
    I	1
    am	1
    bye	2
    great	1
    hadoop.	3
    hello	3
    is	1
    jungle.	2
    software	1
    the	1
    world.	2
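
    Note that MapReduce refuses to write into an existing output directory: re-running the same command fails with a FileAlreadyExistsException, which is why the next section writes to output2 instead. To reuse a path, remove it first:

    # hadoop fs -rm -r /user/test/output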
    
    

    4 Using YARN

    Start YARN:

    # jps
    4803 DataNode
    4979 SecondaryNameNode
    4661 NameNode
    6309 Jps
    
    # sbin/start-yarn.sh 
    starting yarn daemons
    starting resourcemanager, logging to /home/jungle/hadoop/hadoop-daemon/logs/yarn-jungle-resourcemanager-localhost.localdomain.out
    localhost: starting nodemanager, logging to /home/jungle/hadoop/hadoop-daemon/logs/yarn-jungle-nodemanager-localhost.localdomain.out
    
    # jps
    4803 DataNode
    4979 SecondaryNameNode
    6355 ResourceManager
    4661 NameNode
    6477 NodeManager
    6750 Jps
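
    The ResourceManager also serves a web UI, on port 8088 by default:

    http://192.168.1.111:8088/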
    
    # hadoop fs -ls /user/test/
    Found 2 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:02 /user/test/input
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:11 /user/test/output
    
    # bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/test/input/ /user/test/output2
    
    # hadoop fs -ls /user/test/
    Found 3 items
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:02 /user/test/input
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:11 /user/test/output
    drwxr-xr-x   - jungle supergroup          0 2017-01-08 16:25 /user/test/output2
    
    # hadoop fs -ls /user/test/output2
    Found 2 items
    -rw-r--r--   1 jungle supergroup          0 2017-01-08 16:25 /user/test/output2/_SUCCESS
    -rw-r--r--   1 jungle supergroup         82 2017-01-08 16:25 /user/test/output2/part-r-00000
    
    # hadoop fs -cat /user/test/output2/part-r-00000
    I	1
    am	1
    bye	2
    great	1
    hadoop.	3
    hello	3
    is	1
    jungle.	2
    software	1
    the	1
    world.	2
    

    Execution log:

    17/01/08 16:25:32 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    17/01/08 16:25:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    17/01/08 16:25:32 INFO input.FileInputFormat: Total input paths to process : 2
    17/01/08 16:25:33 INFO mapreduce.JobSubmitter: number of splits:2
    17/01/08 16:25:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local247232145_0001
    17/01/08 16:25:33 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    17/01/08 16:25:33 INFO mapreduce.Job: Running job: job_local247232145_0001
    17/01/08 16:25:33 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    17/01/08 16:25:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    17/01/08 16:25:33 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    17/01/08 16:25:33 INFO mapred.LocalJobRunner: Waiting for map tasks
    17/01/08 16:25:33 INFO mapred.LocalJobRunner: Starting task: attempt_local247232145_0001_m_000000_0
    17/01/08 16:25:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    17/01/08 16:25:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    17/01/08 16:25:33 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/test/input/file2.txt:0+70
    17/01/08 16:25:35 INFO mapreduce.Job: Job job_local247232145_0001 running in uber mode : false
    17/01/08 16:25:35 INFO mapreduce.Job:  map 0% reduce 0%
    17/01/08 16:25:35 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    17/01/08 16:25:35 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    17/01/08 16:25:35 INFO mapred.MapTask: soft limit at 83886080
    17/01/08 16:25:35 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    17/01/08 16:25:35 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    17/01/08 16:25:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    17/01/08 16:25:37 INFO mapred.LocalJobRunner: 
    17/01/08 16:25:37 INFO mapred.MapTask: Starting flush of map output
    17/01/08 16:25:37 INFO mapred.MapTask: Spilling map output
    17/01/08 16:25:37 INFO mapred.MapTask: bufstart = 0; bufend = 114; bufvoid = 104857600
    17/01/08 16:25:37 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214356(104857424); length = 41/6553600
    17/01/08 16:25:38 INFO mapred.MapTask: Finished spill 0
    17/01/08 16:25:38 INFO mapred.Task: Task:attempt_local247232145_0001_m_000000_0 is done. And is in the process of committing
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: map
    17/01/08 16:25:38 INFO mapred.Task: Task 'attempt_local247232145_0001_m_000000_0' done.
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local247232145_0001_m_000000_0
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: Starting task: attempt_local247232145_0001_m_000001_0
    17/01/08 16:25:38 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    17/01/08 16:25:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    17/01/08 16:25:38 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/test/input/file1.txt:0+37
    17/01/08 16:25:38 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    17/01/08 16:25:38 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    17/01/08 16:25:38 INFO mapred.MapTask: soft limit at 83886080
    17/01/08 16:25:38 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    17/01/08 16:25:38 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    17/01/08 16:25:38 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: 
    17/01/08 16:25:38 INFO mapred.MapTask: Starting flush of map output
    17/01/08 16:25:38 INFO mapred.MapTask: Spilling map output
    17/01/08 16:25:38 INFO mapred.MapTask: bufstart = 0; bufend = 65; bufvoid = 104857600
    17/01/08 16:25:38 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
    17/01/08 16:25:38 INFO mapred.MapTask: Finished spill 0
    17/01/08 16:25:38 INFO mapred.Task: Task:attempt_local247232145_0001_m_000001_0 is done. And is in the process of committing
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: map
    17/01/08 16:25:38 INFO mapred.Task: Task 'attempt_local247232145_0001_m_000001_0' done.
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local247232145_0001_m_000001_0
    17/01/08 16:25:38 INFO mapred.LocalJobRunner: map task executor complete.
    17/01/08 16:25:38 INFO mapreduce.Job:  map 100% reduce 0%
    17/01/08 16:25:39 INFO mapred.LocalJobRunner: Waiting for reduce tasks
    17/01/08 16:25:39 INFO mapred.LocalJobRunner: Starting task: attempt_local247232145_0001_r_000000_0
    17/01/08 16:25:39 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    17/01/08 16:25:39 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    17/01/08 16:25:39 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6bef8a0a
    17/01/08 16:25:39 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
    17/01/08 16:25:39 INFO reduce.EventFetcher: attempt_local247232145_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
    17/01/08 16:25:39 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local247232145_0001_m_000000_0 decomp: 98 len: 102 to MEMORY
    17/01/08 16:25:39 INFO reduce.InMemoryMapOutput: Read 98 bytes from map-output for attempt_local247232145_0001_m_000000_0
    17/01/08 16:25:39 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 98, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->98
    17/01/08 16:25:39 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local247232145_0001_m_000001_0 decomp: 68 len: 72 to MEMORY
    17/01/08 16:25:39 INFO reduce.InMemoryMapOutput: Read 68 bytes from map-output for attempt_local247232145_0001_m_000001_0
    17/01/08 16:25:39 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 68, inMemoryMapOutputs.size() -> 2, commitMemory -> 98, usedMemory ->166
    17/01/08 16:25:39 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
    17/01/08 16:25:39 INFO mapred.LocalJobRunner: 2 / 2 copied.
    17/01/08 16:25:39 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
    17/01/08 16:25:40 WARN io.ReadaheadPool: Failed readahead on ifile
    EBADF: Bad file descriptor
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
    	at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    17/01/08 16:25:40 INFO mapred.Merger: Merging 2 sorted segments
    17/01/08 16:25:40 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 156 bytes
    17/01/08 16:25:40 INFO reduce.MergeManagerImpl: Merged 2 segments, 166 bytes to disk to satisfy reduce memory limit
    17/01/08 16:25:40 INFO reduce.MergeManagerImpl: Merging 1 files, 168 bytes from disk
    17/01/08 16:25:40 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
    17/01/08 16:25:40 INFO mapred.Merger: Merging 1 sorted segments
    17/01/08 16:25:40 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 160 bytes
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: 2 / 2 copied.
    17/01/08 16:25:40 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    17/01/08 16:25:40 INFO mapred.Task: Task:attempt_local247232145_0001_r_000000_0 is done. And is in the process of committing
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: 2 / 2 copied.
    17/01/08 16:25:40 INFO mapred.Task: Task attempt_local247232145_0001_r_000000_0 is allowed to commit now
    17/01/08 16:25:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local247232145_0001_r_000000_0' to hdfs://localhost:9000/user/test/output2/_temporary/0/task_local247232145_0001_r_000000
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: reduce > reduce
    17/01/08 16:25:40 INFO mapred.Task: Task 'attempt_local247232145_0001_r_000000_0' done.
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local247232145_0001_r_000000_0
    17/01/08 16:25:40 INFO mapred.LocalJobRunner: reduce task executor complete.
    17/01/08 16:25:40 INFO mapreduce.Job:  map 100% reduce 100%
    17/01/08 16:25:40 INFO mapreduce.Job: Job job_local247232145_0001 completed successfully
    17/01/08 16:25:41 INFO mapreduce.Job: Counters: 35
    	File System Counters
    		FILE: Number of bytes read=889201
    		FILE: Number of bytes written=1745401
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=284
    		HDFS: Number of bytes written=82
    		HDFS: Number of read operations=22
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=5
    	Map-Reduce Framework
    		Map input records=3
    		Map output records=18
    		Map output bytes=179
    		Map output materialized bytes=174
    		Input split bytes=224
    		Combine input records=18
    		Combine output records=14
    		Reduce input groups=11
    		Reduce shuffle bytes=174
    		Reduce input records=14
    		Reduce output records=11
    		Spilled Records=28
    		Shuffled Maps =2
    		Failed Shuffles=0
    		Merged Map outputs=2
    		GC time elapsed (ms)=117
    		Total committed heap usage (bytes)=457912320
    	Shuffle Errors
    		BAD_ID=0
    		CONNECTION=0
    		IO_ERROR=0
    		WRONG_LENGTH=0
    		WRONG_MAP=0
    		WRONG_REDUCE=0
    	File Input Format Counters 
    		Bytes Read=107
    	File Output Format Counters 
    		Bytes Written=82
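
    Note the job id job_local247232145_0001 and the mapred.LocalJobRunner entries above: the job still ran in the local runner even though the YARN daemons were up, because nothing here tells MapReduce to submit to YARN. Following the official 2.7.3 single-node guide, that would additionally require the two configuration changes sketched below (plus a restart of YARN):

    # cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    # vi etc/hadoop/mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

    # vi etc/hadoop/yarn-site.xml
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>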
    
    