  • Hadoop pseudo-distributed mode

    1 Start HDFS and run a MapReduce program

    (1) Configure the cluster

    (a) Configure hadoop-env.sh

    Get the JDK installation path on the Linux system:

    [root@hadoop001 hadoop-2.7.2]# echo $JAVA_HOME
    /opt/module/jdk1.8.0_144
    [root@hadoop001 hadoop]# vim hadoop-env.sh

    Modify the JAVA_HOME path:
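This is a single line in etc/hadoop/hadoop-env.sh; the stock file sets `export JAVA_HOME=${JAVA_HOME}`, which is replaced with the absolute path found above (the path below matches this post's install; adjust it to yours):

```shell
# In etc/hadoop/hadoop-env.sh: point JAVA_HOME at the absolute JDK path,
# because the daemon scripts may not inherit the login shell's environment.
export JAVA_HOME=/opt/module/jdk1.8.0_144
```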

    (b) Configure core-site.xml

    [root@hadoop001 hadoop]# cat core-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <!-- Address of the NameNode in HDFS -->
    <property>
    <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    
    <!-- Storage directory for files Hadoop generates at runtime -->
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
    
    
    </configuration>
    [root@hadoop001 hadoop]#

    (c) Configure hdfs-site.xml

    [root@hadoop001 hadoop]# cat hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <!-- Number of HDFS replicas: 1 for a single-node setup -->
    <property>
            <name>dfs.replication</name>
            <value>1</value>
    </property>
    </configuration>
    [root@hadoop001 hadoop]#

    (2) Start the cluster

              (a) Format the NameNode (only on the first startup; do not keep reformatting it afterwards)

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs namenode -format
    20/01/14 16:29:01 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = iZbp1efx14jd8471u20gpaZ/172.16.233.215
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 2.7.2
    STARTUP_MSG:   classpath = /opt/module/hadoop-2.7.2/etc/hadoop:... (long classpath listing truncated)

                  (b) Start the NameNode

    [root@hadoop001 hadoop-2.7.2]# sbin/hadoop-daemon.sh start namenode
    starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-hadoop001.out

                  (c) Start the DataNode

    [root@hadoop001 hadoop-2.7.2]# sbin/hadoop-daemon.sh start datanode
    starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop001.out

    (3) Check the cluster

                  (a) Check that the daemons started successfully

    [root@hadoop001 hadoop-2.7.2]# jps
    23608 NameNode
    23769 Jps
    23691 DataNode
    [root@hadoop001 hadoop-2.7.2]#

    Note: jps is a JDK command, not a Linux command. It is not available unless the JDK is installed.

                 (b) View the HDFS file system from the web UI (the NameNode web UI listens on port 50070 by default in Hadoop 2.x, e.g. http://hadoop001:50070)

    (c) View the generated logs

                    Note: in real-world work, bugs are usually analyzed and fixed based on the hints in the log files.

    Current log directory: /opt/module/hadoop-2.7.2/logs

    [root@hadoop001 hadoop-2.7.2]# cd /opt/module/hadoop-2.7.2/logs
    [root@hadoop001 logs]# ll
    total 60
    -rw-r--r-- 1 root root 23889 Jan 14 16:29 hadoop-root-datanode-hadoop001.log
    -rw-r--r-- 1 root root   717 Jan 14 16:29 hadoop-root-datanode-hadoop001.out
    -rw-r--r-- 1 root root 27523 Jan 14 16:29 hadoop-root-namenode-hadoop001.log
    -rw-r--r-- 1 root root   717 Jan 14 16:29 hadoop-root-namenode-hadoop001.out
    -rw-r--r-- 1 root root     0 Jan 14 16:29 SecurityAuth-root.audit

    (d) A question to think about: why must the NameNode not be formatted over and over, and what should you watch out for when formatting it?

    [root@hadoop001 hadoop-2.7.2]# cd data/tmp/dfs/name/current/
    [root@hadoop001 current]# ll
    total 1040
    -rw-r--r-- 1 root root 1048576 Jan 14 16:29 edits_inprogress_0000000000000000001
    -rw-r--r-- 1 root root     351 Jan 14 16:29 fsimage_0000000000000000000
    -rw-r--r-- 1 root root      62 Jan 14 16:29 fsimage_0000000000000000000.md5
    -rw-r--r-- 1 root root       2 Jan 14 16:29 seen_txid
    -rw-r--r-- 1 root root     206 Jan 14 16:29 VERSION
    [root@hadoop001 current]# cat VERSION
    #Tue Jan 14 16:29:02 CST 2020
    namespaceID=1486374433
    clusterID=CID-4fa9a123-10f6-4bdd-95a6-c6faaa926c85
    cTime=0
    storageType=NAME_NODE
    blockpoolID=BP-770615407-172.16.233.215-1578990542229
    layoutVersion=-63
    [root@hadoop001 current]# cd ../../data/current/
    [root@hadoop001 current]# ll
    total 8
    drwx------ 4 root root 4096 Jan 14 16:29 BP-770615407-172.16.233.215-1578990542229
    -rw-r--r-- 1 root root  229 Jan 14 16:29 VERSION
    [root@hadoop001 current]# cat VERSION
    #Tue Jan 14 16:29:47 CST 2020
    storageID=DS-b07b5055-82c9-4010-a583-dd75fe1e1097
    clusterID=CID-4fa9a123-10f6-4bdd-95a6-c6faaa926c85
    cTime=0
    datanodeUuid=9002013a-cc17-4ae7-886f-3f0ef4d44595
    storageType=DATA_NODE
    layoutVersion=-56

    Note: formatting the NameNode generates a new cluster ID. If the NameNode's and DataNode's cluster IDs then no longer match, the cluster cannot find its previous data. So when reformatting the NameNode, always delete the data directory and the logs first, and only then format.
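Following that note, a re-format can be sketched as the sequence below (paths assume the /opt/module/hadoop-2.7.2 install used in this post; the commands are printed for review rather than executed here, since the rm step is destructive):

```shell
# Safe NameNode re-format sequence (sketch). The rm -rf step erases all HDFS
# data and logs, so the steps are printed for review instead of being run.
HADOOP_HOME=/opt/module/hadoop-2.7.2
steps="sbin/hadoop-daemon.sh stop datanode
sbin/hadoop-daemon.sh stop namenode
rm -rf $HADOOP_HOME/data $HADOOP_HOME/logs
bin/hdfs namenode -format"
echo "$steps"
```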

    (4) Operate the cluster

                  (a) Create an input directory on the HDFS file system

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -mkdir -p /user/topcheer/input

                  (b) Upload the test file to the file system

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -put wcinput/wc.input    /user/topcheer/input/

     

                  (c) Verify that the file was uploaded correctly

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -lsr /
    lsr: DEPRECATED: Please use 'ls -R' instead.
    drwxr-xr-x   - root supergroup          0 2020-01-14 16:45 /user
    drwxr-xr-x   - root supergroup          0 2020-01-14 16:45 /user/topcheer
    drwxr-xr-x   - root supergroup          0 2020-01-14 16:53 /user/topcheer/input
    -rw-r--r--   1 root supergroup         48 2020-01-14 16:53 /user/topcheer/input/wc.input

                  (d) Run the MapReduce program

    [root@hadoop001 hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount  /user/topcheer/input/ /user/topcheer/output/
    20/01/14 17:08:35 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    20/01/14 17:08:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    20/01/14 17:08:35 INFO input.FileInputFormat: Total input paths to process : 1
    20/01/14 17:08:35 INFO mapreduce.JobSubmitter: number of splits:1
    20/01/14 17:08:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local330020084_0001
    20/01/14 17:08:36 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    20/01/14 17:08:36 INFO mapreduce.Job: Running job: job_local330020084_0001
    20/01/14 17:08:36 INFO mapred.LocalJobRunner: OutputCommitter set in config null

                  (e) View the output

    View from the command line:

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -cat /user/topcheer/output/*
    hadoop  2
    mapreduce       1
    topcheer        2
    yarn    1
    [root@hadoop001 hadoop-2.7.2]#

    View in the browser, as shown in the figure

    (f) Download the output file to the local file system

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -get /user/topcheer/output/part-r-00000 ./wcoutput/

     2 Start YARN and run a MapReduce program

    (1) Configure the cluster

     (a) Configure yarn-env.sh

             Set JAVA_HOME in it:
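As with hadoop-env.sh, this is a single line in etc/hadoop/yarn-env.sh (the stock file ships the JAVA_HOME line commented out; the path below matches this post's install):

```shell
# In etc/hadoop/yarn-env.sh: set JAVA_HOME to the absolute JDK path.
export JAVA_HOME=/opt/module/jdk1.8.0_144
```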

            

    (b)配置yarn-site.xml

    <configuration>
    
    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    
    <!-- Hostname of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop001</value>
    </property>
    
    <!-- Enable log aggregation -->
    <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    </property>
    
    <!-- Retain aggregated logs for 7 days (in seconds) -->
    <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
    </property>
    
    </configuration>

    (c) Configure mapred-env.sh

    Set JAVA_HOME in it:

    [root@hadoop001 hadoop]# cat mapred-env.sh
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    # export JAVA_HOME=/home/y/libexec/jdk1.6.0/
    export JAVA_HOME=/opt/module/jdk1.8.0_144
    
    export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
    
    export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
    
    #export HADOOP_JOB_HISTORYSERVER_OPTS=
    #export HADOOP_MAPRED_LOG_DIR="" # Where log files are stored.  $HADOOP_MAPRED_HOME/logs by default.
    #export HADOOP_JHS_LOGGER=INFO,RFA # Hadoop JobSummary logger.
    #export HADOOP_MAPRED_PID_DIR= # The pid files are stored. /tmp by default.
    #export HADOOP_MAPRED_IDENT_STRING= #A string representing this instance of hadoop. $USER by default
    #export HADOOP_MAPRED_NICENESS= #The scheduling priority for daemons. Defaults to 0.
    [root@hadoop001 hadoop]#

    (d) Configure mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml)

    [root@hadoop001 hadoop]# mv mapred-site.xml.template mapred-site.xml
    [root@hadoop001 hadoop]# vim mapred-site.xml 
    [root@hadoop001 hadoop]# cat mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <!-- Run MapReduce on YARN -->
          <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
           </property>
    
    <!-- History server address -->
    <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop001:10020</value>
    </property>
    <!-- History server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop001:19888</value>
    </property>
    
    </configuration>
    [root@hadoop001 hadoop]#

    (2) Start the cluster

    (a) Before starting YARN, make sure the NameNode and DataNode are already running

    (b) Start the ResourceManager

    [root@hadoop001 hadoop-2.7.2]# sbin/yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-resourcemanager-hadoop001.out

    (c) Start the NodeManager

    [root@hadoop001 hadoop-2.7.2]# sbin/yarn-daemon.sh start nodemanager
    starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-hadoop001.out

    (3) Operate the cluster

    (a) View the YARN web UI in a browser, as shown in Figure 2-35 (the ResourceManager web UI listens on port 8088 by default)

                  (b) Delete the output directory on the file system

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -rm -R /user/topcheer/output
    20/01/14 20:23:56 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /user/topcheer/output

                  (c) Run the MapReduce program

    [root@hadoop001 hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/topcheer/input  /user/topcheer/output
    20/01/14 20:25:53 INFO client.RMProxy: Connecting to ResourceManager at hadoop001/172.16.233.215:8032
    20/01/14 20:25:54 INFO input.FileInputFormat: Total input paths to process : 1
    20/01/14 20:25:54 INFO mapreduce.JobSubmitter: number of splits:1
    20/01/14 20:25:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579004405060_0001
    20/01/14 20:25:54 INFO impl.YarnClientImpl: Submitted application application_15

                  (d) View the run results, as shown in the figure

    3 Configure the history server

    To review how programs ran in the past, configure the history server. The steps are as follows:

    1. Configure mapred-site.xml

    [root@hadoop001 hadoop]# vim mapred-site.xml
    [root@hadoop001 hadoop]#

    Add the history server configuration to this file.
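The entries to add are the two job-history properties, the same ones that appear in the full mapred-site.xml listing in step (d) of section 2 above:

```xml
<!-- History server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop001:10020</value>
</property>
<!-- History server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop001:19888</value>
</property>
```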

    2. Start the history server

    [root@hadoop001 hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-root-historyserver-hadoop001.out
    [root@hadoop001 hadoop-2.7.2]# jps
    24836 ResourceManager
    25749 JobHistoryServer
    25093 NodeManager
    23608 NameNode
    25787 Jps
    23691 DataNode

    3. View JobHistory

    http://hadoop001:19888/jobhistory

    4 Configure log aggregation

    Log aggregation: after an application finishes running, its logs are uploaded to the HDFS file system.

    Benefit of log aggregation: job run details can be inspected conveniently, which helps development and debugging.

    Note: enabling log aggregation requires restarting the NodeManager, ResourceManager, and JobHistoryServer.

    The steps to enable log aggregation are as follows:

    1. Configure yarn-site.xml
    [root@hadoop001 hadoop]# vi yarn-site.xml
    [root@hadoop001 hadoop]# cd ../../

    Add the log aggregation configuration to this file.
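The entries to add are the two log-aggregation properties, already visible in the yarn-site.xml listing in section 2 above:

```xml
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Retain aggregated logs for 7 days (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
```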

    2. Stop the NodeManager, ResourceManager, and JobHistoryServer
    [root@hadoop001 hadoop-2.7.2]# sbin/yarn-daemon.sh stop resourcemanager
    stopping resourcemanager
    [root@hadoop001 hadoop-2.7.2]# jps
    25905 Jps
    25749 JobHistoryServer
    25093 NodeManager
    23608 NameNode
    23691 DataNode
    [root@hadoop001 hadoop-2.7.2]# sbin/yarn-daemon.sh stop nodemanager
    stopping nodemanager
    nodemanager did not stop gracefully after 5 seconds: killing with kill -9
    [root@hadoop001 hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh stop historyserver
    stopping historyserver

    3. Start the NodeManager, ResourceManager, and JobHistoryServer

    [root@hadoop001 hadoop-2.7.2]# sbin/yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-resourcemanager-hadoop001.out
    [root@hadoop001 hadoop-2.7.2]# sbin/yarn-daemon.sh start nodemanager
    starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-hadoop001.out
    [root@hadoop001 hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-root-historyserver-hadoop001.out
    [root@hadoop001 hadoop-2.7.2]# jps
    26390 JobHistoryServer
    25991 ResourceManager
    23608 NameNode
    23691 DataNode
    26428 Jps
    26237 NodeManager

    4. Delete the output directory that already exists on HDFS

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -rm -R /user/topcheer/output
    20/01/14 20:54:21 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /user/topcheer/output

    5. Run the WordCount program

    [root@hadoop001 hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/topcheer/input  /user/topcheer/output
    20/01/14 20:54:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop001/172.16.233.215:8032
    20/01/14 20:54:43 INFO input.FileInputFormat: Total input paths to process : 1
    20/01/14 20:54:43 INFO mapreduce.JobSubmitter: number of splits:1
    20/01/14 20:54:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1579006407905_0001
    20/01/14 20:54:44 INFO impl.YarnClientImpl: Submitted application application_1579006407905_0001
    6. View the logs, as shown in the figure
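Besides the web UI, aggregated logs can also be fetched from the command line with `yarn logs` (a sketch; it uses the application ID printed when the job was submitted above, and needs the cluster to be running, so the command is printed rather than executed here):

```shell
# Sketch: fetch aggregated logs for a finished application by its ID.
APP_ID=application_1579006407905_0001   # from the job submission output above
echo "bin/yarn logs -applicationId $APP_ID"
```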

     

     

    5 Configuration files explained

    Hadoop has two kinds of configuration files: default configuration files and user-defined (site) configuration files. Only when you want to override a default value do you need to edit the corresponding site file and change that property.

    (1) Default configuration files:

    Table 2-1

    Default file           Location inside the Hadoop jars
    core-default.xml       hadoop-common-2.7.2.jar/core-default.xml
    hdfs-default.xml       hadoop-hdfs-2.7.2.jar/hdfs-default.xml
    yarn-default.xml       hadoop-yarn-common-2.7.2.jar/yarn-default.xml
    mapred-default.xml     hadoop-mapreduce-client-core-2.7.2.jar/mapred-default.xml

           (2) Site (user-defined) configuration files:

           The four files core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; modify their configuration as your project requires.
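One way to check which value is in effect after the site files have overridden the defaults is `hdfs getconf`, which reads the merged configuration (a sketch; it needs the Hadoop install above, so the commands are printed rather than executed here):

```shell
# Sketch: print the effective value of a key after site overrides are applied.
GETCONF="bin/hdfs getconf -confKey"
echo "$GETCONF fs.defaultFS      # -> hdfs://localhost:9000 (from core-site.xml)"
echo "$GETCONF dfs.replication   # -> 1 (from hdfs-site.xml)"
```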

  • Original article: https://www.cnblogs.com/dalianpai/p/12193455.html