  • Hadoop shell operations on CDH, with a wordcount test

    1. After installing Hadoop from the CDH web UI, running a Hadoop command fails:

    [root@node1 bin]# hadoop fs -ls

    /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/bin/../lib/hadoop/bin/hadoop: line 144: /usr/java/jdk1.7.0_67-clouderaexport/bin/java: No such file or directory

    /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/bin/../lib/hadoop/bin/hadoop: line 144: exec: /usr/java/jdk1.7.0_67-clouderaexport/bin/java: cannot execute: No such file or directory

    Cause: even after HDFS is installed through CDH, the corresponding configuration files and /etc/profile still need to be updated (adding HADOOP_HOME and so on). This machine had a previous Hadoop installation and HADOOP_HOME was never changed to the CDH path, hence the error.

    After editing /etc/profile the command works. The changes:

    #java

    export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

    #export JAVA_HOME=/usr/local/jdk1.8.0_191

    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

    #export HADOOP_HOME=/usr/local/hadoop-2.6.0-cdh5.7.0

    export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5
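
    A quick sanity check after saving the file (a minimal sketch; the exact version output depends on your parcel):

    source /etc/profile
    echo $JAVA_HOME    # should print /usr/java/jdk1.7.0_67-cloudera
    hadoop version     # should now resolve java and print the CDH build info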

    2. Common HDFS shell operations:

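    Everyday commands follow the hadoop fs pattern. A minimal, illustrative set (the file and directory names here are examples, not taken from the cluster above):

    hadoop fs -ls /                       # list the HDFS root
    hadoop fs -mkdir /tmp/demo            # create a directory
    hadoop fs -put local.txt /tmp/demo    # upload a local file
    hadoop fs -cat /tmp/demo/local.txt    # print a file's contents
    hadoop fs -get /tmp/demo/local.txt .  # download into the local working directory
    hadoop fs -rm -r /tmp/demo            # remove a directory recursively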

    3. Using the wordcount example

    Upload a file to HDFS. Any text file will do; here a small shell script (it checks whether a user's password expires within three days) serves as the input. Its contents:

    [root@node1 examples]# more /home/test/test1.sh

    #!/bin/bash

    #edate=$(chage -l $USER|grep "Password expires" |awk '{print $4,$5,$6,$7}')

    edate=$(chage -l test|grep "Password expires" |awk '{print $4,$5,$6,$7}')

    date3=$(date -d "+3 day"|awk '{print $2,$3,$6}')

    if [[ $edate = "never" ]]; then

      echo "never expired"

    elif [[ $date3 = $edate ]]; then

          echo "3 days"

    else

      echo "unexpired"

    fi

    Upload the file:

    [root@node1 test]# hadoop fs -put /home/test/test1.sh /tmp

    Verify the upload:

    [root@node1 test]# hadoop fs -ls /tmp

    Found 2 items

    drwxrwxrwx   - hdfs supergroup          0 2019-12-24 16:35 /tmp/.cloudera_health_monitoring_canary_files

    -rw-r--r--   3 root supergroup        346 2019-12-24 16:35 /tmp/test1.sh

    [root@node1 test]#

    Run the wordcount MapReduce job as the hdfs user; otherwise creating the output path at the HDFS root fails with a permission error:

    hadoop jar /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/share/doc/hadoop-0.20-mapreduce/examples/hadoop-examples-2.6.0-mr1-cdh5.10.2.jar wordcount /tmp/test1.sh /output1
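
    If you are logged in as root, one common alternative to switching shells (su - hdfs) is to prefix the same command with sudo -u hdfs:

    sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/share/doc/hadoop-0.20-mapreduce/examples/hadoop-examples-2.6.0-mr1-cdh5.10.2.jar wordcount /tmp/test1.sh /output1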

    The job log is as follows (note from the job_local704420616_0001 ID and the LocalJobRunner entries that this MR1 examples jar ran with the local job runner):

    19/12/24 16:45:15 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    
    19/12/24 16:45:15 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    
    19/12/24 16:45:15 INFO input.FileInputFormat: Total input paths to process : 1
    
    19/12/24 16:45:15 INFO mapreduce.JobSubmitter: number of splits:1
    
    19/12/24 16:45:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local704420616_0001
    
    19/12/24 16:45:16 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    
    19/12/24 16:45:16 INFO mapreduce.Job: Running job: job_local704420616_0001
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    
    19/12/24 16:45:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: Waiting for map tasks
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: Starting task: attempt_local704420616_0001_m_000000_0
    
    19/12/24 16:45:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    
    19/12/24 16:45:16 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    
    19/12/24 16:45:16 INFO mapred.MapTask: Processing split: hdfs://node1:8020/tmp/test1.sh:0+346
    
    19/12/24 16:45:16 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    
    19/12/24 16:45:16 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    
    19/12/24 16:45:16 INFO mapred.MapTask: soft limit at 83886080
    
    19/12/24 16:45:16 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    
    19/12/24 16:45:16 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    
    19/12/24 16:45:16 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner:
    
    19/12/24 16:45:16 INFO mapred.MapTask: Starting flush of map output
    
    19/12/24 16:45:16 INFO mapred.MapTask: Spilling map output
    
    19/12/24 16:45:16 INFO mapred.MapTask: bufstart = 0; bufend = 524; bufvoid = 104857600
    
    19/12/24 16:45:16 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214212(104856848); length = 185/6553600
    
    19/12/24 16:45:16 INFO mapred.MapTask: Finished spill 0
    
    19/12/24 16:45:16 INFO mapred.Task: Task:attempt_local704420616_0001_m_000000_0 is done. And is in the process of committing
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: map
    
    19/12/24 16:45:16 INFO mapred.Task: Task 'attempt_local704420616_0001_m_000000_0' done.
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: Finishing task: attempt_local704420616_0001_m_000000_0
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: map task executor complete.
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: Waiting for reduce tasks
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: Starting task: attempt_local704420616_0001_r_000000_0
    
    19/12/24 16:45:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    
    19/12/24 16:45:16 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    
    19/12/24 16:45:16 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@720cd60d
    
    19/12/24 16:45:16 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=175793760, maxSingleShuffleLimit=43948440, mergeThreshold=116023888, ioSortFactor=10, memToMemMergeOutputsThreshold=10
    
    19/12/24 16:45:16 INFO reduce.EventFetcher: attempt_local704420616_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
    
    19/12/24 16:45:16 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local704420616_0001_m_000000_0 decomp: 447 len: 451 to MEMORY
    
    19/12/24 16:45:16 INFO reduce.InMemoryMapOutput: Read 447 bytes from map-output for attempt_local704420616_0001_m_000000_0
    
    19/12/24 16:45:16 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 447, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->447
    
    19/12/24 16:45:16 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: 1 / 1 copied.
    
    19/12/24 16:45:16 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
    
    19/12/24 16:45:16 INFO mapred.Merger: Merging 1 sorted segments
    
    19/12/24 16:45:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 441 bytes
    
    19/12/24 16:45:16 INFO reduce.MergeManagerImpl: Merged 1 segments, 447 bytes to disk to satisfy reduce memory limit
    
    19/12/24 16:45:16 INFO reduce.MergeManagerImpl: Merging 1 files, 451 bytes from disk
    
    19/12/24 16:45:16 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
    
    19/12/24 16:45:16 INFO mapred.Merger: Merging 1 sorted segments
    
    19/12/24 16:45:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 441 bytes
    
    19/12/24 16:45:16 INFO mapred.LocalJobRunner: 1 / 1 copied.
    
    19/12/24 16:45:17 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    
    19/12/24 16:45:17 INFO mapred.Task: Task:attempt_local704420616_0001_r_000000_0 is done. And is in the process of committing
    
    19/12/24 16:45:17 INFO mapred.LocalJobRunner: 1 / 1 copied.
    
    19/12/24 16:45:17 INFO mapred.Task: Task attempt_local704420616_0001_r_000000_0 is allowed to commit now
    
    19/12/24 16:45:17 INFO output.FileOutputCommitter: Saved output of task 'attempt_local704420616_0001_r_000000_0' to hdfs://node1:8020/output1/_temporary/0/task_local704420616_0001_r_000000
    
    19/12/24 16:45:17 INFO mapred.LocalJobRunner: reduce > reduce
    
    19/12/24 16:45:17 INFO mapred.Task: Task 'attempt_local704420616_0001_r_000000_0' done.
    
    19/12/24 16:45:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local704420616_0001_r_000000_0
    
    19/12/24 16:45:17 INFO mapred.LocalJobRunner: reduce task executor complete.
    
    19/12/24 16:45:17 INFO mapreduce.Job: Job job_local704420616_0001 running in uber mode : false
    
    19/12/24 16:45:17 INFO mapreduce.Job:  map 100% reduce 100%
    
    19/12/24 16:45:17 INFO mapreduce.Job: Job job_local704420616_0001 completed successfully
    
    19/12/24 16:45:17 INFO mapreduce.Job: Counters: 35
    
             File System Counters
    
                       FILE: Number of bytes read=553634
    
                       FILE: Number of bytes written=1137145
    
                       FILE: Number of read operations=0
    
                       FILE: Number of large read operations=0
    
                       FILE: Number of write operations=0
    
                       HDFS: Number of bytes read=692
    
                       HDFS: Number of bytes written=313
    
                       HDFS: Number of read operations=13
    
                       HDFS: Number of large read operations=0
    
                       HDFS: Number of write operations=4
    
             Map-Reduce Framework
    
                       Map input records=11
    
                       Map output records=47
    
                       Map output bytes=524
    
                       Map output materialized bytes=451
    
                       Input split bytes=95
    
                       Combine input records=47
    
                       Combine output records=33
    
                       Reduce input groups=33
    
                       Reduce shuffle bytes=451
    
                       Reduce input records=33
    
                       Reduce output records=33
    
                       Spilled Records=66
    
                       Shuffled Maps =1
    
                       Failed Shuffles=0
    
                       Merged Map outputs=1
    
                       GC time elapsed (ms)=16
    
                       Total committed heap usage (bytes)=504365056
    
             Shuffle Errors
    
                       BAD_ID=0
    
                       CONNECTION=0
    
                       IO_ERROR=0
    
                       WRONG_LENGTH=0
    
                       WRONG_MAP=0
    
                       WRONG_REDUCE=0
    
             File Input Format Counters
    
                       Bytes Read=346
    
             File Output Format Counters
    
                       Bytes Written=313
    

    View the word counts in /output1:

    [hdfs@node1 examples]$ hadoop fs -cat /output1/*

    "+3   1

    "3     1

    "Password        2

    "never      1

    "never"    1

    "unexpired"     1

    #!/bin/bash      1

    #edate=$(chage       1

    $2,$3,$6}')        1

    $4,$5,$6,$7}')  2

    $USER|grep     1

    $date3     1

    $edate     2

    '{print       3

    -d      1

    -l       2

    =       2

    [[       2

    ]];     2

    date3=$(date  1

    day"|awk          1

    days"        1

    echo          3

    edate=$(chage         1

    elif    1

    else  1

    expired"   1

    expires"   2

    fi       1

    if       1

    test|grep          1

    then 2

    |awk         2
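
    Each whitespace-delimited token from the script is counted once per occurrence, which is why punctuation-heavy fragments such as '{print appear as "words". Note that MapReduce refuses to write into an existing output directory, so to rerun the job, remove /output1 first:

    sudo -u hdfs hadoop fs -rm -r /output1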

     

    Done.

  • Original post: https://www.cnblogs.com/zhxiaoxiao/p/12092535.html