  • Hadoop 2 Study Notes (1) | Building a Fully Distributed HA Cluster

     

Preparation

OS: CentOS 6 or Red Hat 6 (64-bit here)

Software: JDK 1.7, hadoop-2.3.0, and a 64-bit native library package (downloadable from CSDN; not provided here)

Deployment plan

192.168.1.11 C6H1 NameNode, DataNode, ResourceManager, NodeManager, JournalNode

192.168.1.12 C6H2 NameNode, DataNode, JournalNode, NodeManager

192.168.1.13 C6H3 DataNode, JournalNode, NodeManager

Setup
1. Stop interfering services, configure the hosts file, and unpack the archives

    chkconfig iptables off

service iptables stop # stop the firewall

    vi /etc/selinux/config

SELINUX=disabled # comment out the old value and add this line, or edit it in place

    :wq

setenforce 0 # switch SELinux to permissive immediately (the config change takes effect on reboot)

# set the hosts entries on every machine

    vi /etc/hosts

    192.168.1.11 C6H1

    192.168.1.12 C6H2

    192.168.1.13 C6H3
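Since these entries have to be added on every node, a small guard keeps the edit idempotent if a setup script is re-run. A minimal sketch, using a temp file as a stand-in for /etc/hosts (the loop itself is an illustration, not part of the original walkthrough):

```shell
# Stand-in for /etc/hosts so the sketch can run anywhere; in the real setup
# you would target /etc/hosts itself.
HOSTS_FILE=$(mktemp)

add_hosts() {
    for entry in "192.168.1.11 C6H1" "192.168.1.12 C6H2" "192.168.1.13 C6H3"; do
        # -qxF: quiet, whole-line, fixed-string match -- append only if absent
        grep -qxF "$entry" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
    done
}

add_hosts
add_hosts   # a second run adds nothing, so the file stays clean
cat "$HOSTS_FILE"
```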

tar -zxvf hadoop-2.3.0.tar.gz -C /usr/local

mv /usr/local/hadoop-2.3.0 /usr/local/hadoop2 # rename the extracted directory to hadoop2

2. Install the JDK

tar -zxvf jdk-1.7.xx.tar.gz -C /usr/src

    cd /usr/src

    mv /usr/src/jdk-1.7.xx /usr/local/jdk
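The extract-then-rename pattern above (also used for Hadoop itself) can be rehearsed on a throwaway tarball to verify the tar flags without the real JDK archive. In this sketch, directories under a temp dir stand in for /usr/src and /usr/local:

```shell
# Build a dummy "JDK" tarball, then replay the extract-and-move steps.
WORK=$(mktemp -d)
mkdir -p "$WORK/downloads/jdk-1.7.xx"
echo "JAVA_VERSION=1.7" > "$WORK/downloads/jdk-1.7.xx/release"
tar -zcf "$WORK/jdk.tar.gz" -C "$WORK/downloads" jdk-1.7.xx

mkdir -p "$WORK/usr/src" "$WORK/usr/local"
tar -zxf "$WORK/jdk.tar.gz" -C "$WORK/usr/src"       # -C: unpack into this dir
mv "$WORK/usr/src/jdk-1.7.xx" "$WORK/usr/local/jdk"  # rename to a stable path
ls "$WORK/usr/local/jdk"
```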

    vi /etc/profile

Add the environment variables; all of them are set in one pass here.

    export JAVA_HOME=/usr/local/jdk

    export ZOOKEEPER_HOME=/usr/local/zk

    export HADOOP_HOME=/usr/local/hadoop2

    export PATH=.:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

:wq # save and quit

source /etc/profile # apply immediately
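The effect of the profile edit can be rehearsed on a throwaway fragment: after sourcing it, the jdk, zk, and hadoop2 bin/sbin directories should appear on PATH (the directories need not exist for the variables to be set; the temp file name is illustrative):

```shell
# Write the same exports to a temp file and source it, mimicking
# 'source /etc/profile'. The quoted heredoc keeps $JAVA_HOME etc. literal
# in the file; they expand when the file is sourced.
PROFILE=$(mktemp)
cat > "$PROFILE" <<'EOF'
export JAVA_HOME=/usr/local/jdk
export ZOOKEEPER_HOME=/usr/local/zk
export HADOOP_HOME=/usr/local/hadoop2
export PATH=.:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
EOF
. "$PROFILE"
echo "$PATH" | tr ':' '\n' | sed -n '1,5p'
```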

Verify:

java -version

    java version "1.7.0_51"

    Java(TM) SE Runtime Environment (build 1.7.0_51-b13)

    Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

3. Set up passwordless SSH login

ssh-keygen -t rsa # generate a key pair, pressing Enter through every prompt; run this on each machine

scp /root/.ssh/id_rsa.pub root@C6H1:/root/C6H2_key # copy the public keys from C6H2 and C6H3 to C6H1 (run this on C6H2; on C6H3, use /root/C6H3_key as the target name)

Run on C6H1:

    cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys

cat /root/C6H2_key >> /root/.ssh/authorized_keys # '>>' appends; a single '>' would overwrite the file

    cat /root/C6H3_key >> /root/.ssh/authorized_keys

Copy the authorized_keys file from C6H1 into /root/.ssh/ on C6H2 and C6H3; after that, the machines can log in to one another without passwords.
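The `>` vs `>>` distinction is the whole point of the sequence above: the first `cat` may truncate, the later ones must append. A quick rehearsal with obviously fake key lines shows what a stray `>` would do to the merged file:

```shell
# Build an authorized_keys file the same way as above, with fake key material.
KEYS_DIR=$(mktemp -d)
AK="$KEYS_DIR/authorized_keys"
echo "ssh-rsa FAKEKEY1 root@C6H1" >  "$AK"   # '>' creates or truncates
echo "ssh-rsa FAKEKEY2 root@C6H2" >> "$AK"   # '>>' appends
echo "ssh-rsa FAKEKEY3 root@C6H3" >> "$AK"
AFTER_APPEND=$(wc -l < "$AK")
echo "ssh-rsa FAKEKEY1 root@C6H1" >  "$AK"   # a stray '>' wipes the other keys
AFTER_OVERWRITE=$(wc -l < "$AK")
echo "after appends: $AFTER_APPEND, after overwrite: $AFTER_OVERWRITE"
```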

4. Configure core-site.xml

    <configuration>

<!-- Default filesystem: the logical name of the HA cluster -->

    <property>

    <name>fs.defaultFS</name>

    <value>hdfs://cluster1</value>

    </property>

<!-- Base directory for Hadoop storage; NameNode and DataNode data live under it by default -->

    <property>

    <name>hadoop.tmp.dir</name>

    <value>/data/dfs/hadoop</value>

    </property>


    </configuration>

5. Configure hdfs-site.xml

    <configuration>

<!-- Replication factor (default is 3) -->

    <property>

    <name>dfs.replication</name>

    <value>2</value>

    </property>

<!-- Logical name of the nameservice -->

    <property>

    <name>dfs.nameservices</name>

    <value>cluster1</value>

    </property>

<!-- The NameNodes that make up the nameservice -->

    <property>

    <name>dfs.ha.namenodes.cluster1</name>

    <value>C6H1,C6H2</value>

    </property>

<!-- RPC address and port of NameNode C6H1 -->

    <property>

    <name>dfs.namenode.rpc-address.cluster1.C6H1</name>

    <value>C6H1:9000</value>

    </property>

<!-- RPC address and port of NameNode C6H2 -->

    <property>

    <name>dfs.namenode.rpc-address.cluster1.C6H2</name>

    <value>C6H2:9000</value>

    </property>

<!-- HTTP address and port of NameNode C6H1 -->

    <property>

    <name>dfs.namenode.http-address.cluster1.C6H1</name>

    <value>C6H1:50070</value>

    </property>

<!-- HTTP address and port of NameNode C6H2 -->

    <property>

    <name>dfs.namenode.http-address.cluster1.C6H2</name>

    <value>C6H2:50070</value>

    </property>

<!-- Shared edits directory: the NameNodes write their edit log to the JournalNode quorum -->

    <property>

    <name>dfs.namenode.shared.edits.dir</name>

    <value>qjournal://C6H1:8485;C6H2:8485;C6H3:8485/cluster1</value>

    </property>

<!-- Client-side class that determines which NameNode is currently active for cluster1 -->

    <property>

    <name>dfs.client.failover.proxy.provider.cluster1</name>

    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

    </property>

<!-- Fencing method used during NameNode failover; uses ssh to kill the old active -->

    <property>

    <name>dfs.ha.fencing.methods</name>

    <value>sshfence</value>

    </property>

<!-- Private key used by sshfence -->

    <property>

    <name>dfs.ha.fencing.ssh.private-key-file</name>

    <value>/root/.ssh/id_rsa</value>

    </property>

<!-- Local path where each JournalNode stores the shared edits -->

    <property>

    <name>dfs.journalnode.edits.dir</name>

    <value>/data/dfs/journal</value>

    </property>

<!-- Local path where the NameNode stores its metadata -->

    <property>

    <name>dfs.namenode.name.dir</name>

    <value>/data/dfs/name</value>

    </property>

<!-- Local path where DataNodes store block data -->

    <property>

    <name>dfs.datanode.data.dir</name>

    <value>/data/dfs/data</value>

    </property>

<!-- Enable WebHDFS (REST access to the filesystem) -->

    <property>

    <name>dfs.webhdfs.enabled</name>

    <value>true</value>

    </property>

    </configuration>

6. Configure mapred-site.xml

    <configuration>

<!-- Unlike Hadoop 1, MapReduce jobs run on YARN here -->

    <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

    </property>

    </configuration>

7. Configure yarn-site.xml

    <configuration>

    <!-- Site specific YARN configuration properties -->

<!-- ResourceManager host; only one can be set here, which is a single point of failure! -->

    <property>

    <name>yarn.resourcemanager.hostname</name>

    <value>C6H1</value>

    </property>

<!-- Auxiliary service required by MapReduce: mapreduce_shuffle -->

    <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

    </property>

    </configuration>

8. Configure yarn-env.sh

export JAVA_HOME=/usr/local/jdk # JAVA_HOME used by Hadoop

9. Configure mapred-env.sh

export JAVA_HOME=/usr/local/jdk # JAVA_HOME used by Hadoop

10. Configure hadoop-env.sh

export JAVA_HOME=/usr/local/jdk # JAVA_HOME used by Hadoop

11. Configure slaves

    vi /usr/local/hadoop2/etc/hadoop/slaves

    C6H1

    C6H2

    C6H3

One hostname per line.
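The start scripts consume this file line by line, starting a worker on each listed host. A local sketch of that consumption, run on a temp copy with an echo standing in for the real ssh-and-start step (the blank-line handling is an assumption for robustness, not something the original describes):

```shell
# Rehearse reading a slaves-style file: one hostname per line, blanks skipped.
SLAVES=$(mktemp)
printf 'C6H1\nC6H2\n\nC6H3\n' > "$SLAVES"   # includes a stray blank line

STARTED=""
while IFS= read -r host; do
    [ -n "$host" ] || continue              # ignore empty lines
    echo "would start DataNode/NodeManager on $host"
    STARTED="$STARTED $host"
done < "$SLAVES"
echo "hosts:$STARTED"
```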

12. First-time initialization and startup

Initialization differs from Hadoop 1; follow the steps in order. If you ever need to format again, first delete everything under /data/dfs/, i.e., the path set by hadoop.tmp.dir.

1. Start a JournalNode on each of the three machines:

    hadoop-daemon.sh start journalnode

2. Format the NameNode on C6H1:

hdfs namenode -format

3. Start the NameNode on C6H1:

    hadoop-daemon.sh start namenode

4. On C6H2, initialize the other NameNode by syncing the metadata from C6H1:

hdfs namenode -bootstrapStandby

5. Start the other NameNode (on C6H2):

    hadoop-daemon.sh start namenode

6. Stop everything, then start all the Hadoop services:

    stop-all.sh

start-all.sh # from now on this one command is enough; only the first initialization must follow the steps above

Trigger an HDFS HA failover manually:

hdfs haadmin -failover --forceactive C6H1 C6H2

    Failover from C6H1 to C6H2 successful

13. Test HDFS

(NameNode web UI screenshots omitted.)

Create a directory from the shell as a test:

hadoop fs -mkdir /data

hadoop fs -ls /

14. Test MapReduce

    vi /root/word.text

    hello you

    hello me

Upload the text file:

hadoop fs -put /root/word.text /

Run wordcount from the bundled examples jar.

Usage: hadoop jar <jar path> wordcount <HDFS input path> <output path> (the output path must not already exist; it is created automatically)

    [root@C6H1 hadoop]# hadoop jar /usr/local/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /word.text /word_out1

    14/03/16 09:36:21 INFO client.RMProxy: Connecting to ResourceManager at C6H1/192.168.1.11:8032

    14/03/16 09:36:22 INFO input.FileInputFormat: Total input paths to process : 1

    14/03/16 09:36:22 INFO mapreduce.JobSubmitter: number of splits:1

    14/03/16 09:36:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1394933446304_0001

    14/03/16 09:36:23 INFO impl.YarnClientImpl: Submitted application application_1394933446304_0001

    14/03/16 09:36:23 INFO mapreduce.Job: The url to track the job: http://C6H1:8088/proxy/application_1394933446304_0001/

    14/03/16 09:36:23 INFO mapreduce.Job: Running job: job_1394933446304_0001

    14/03/16 09:36:31 INFO mapreduce.Job: Job job_1394933446304_0001 running in uber mode : false

    14/03/16 09:36:31 INFO mapreduce.Job: map 0% reduce 0%

    14/03/16 09:36:38 INFO mapreduce.Job: map 100% reduce 0%

    14/03/16 09:36:44 INFO mapreduce.Job: map 100% reduce 100%

    14/03/16 09:36:45 INFO mapreduce.Job: Job job_1394933446304_0001 completed successfully

    14/03/16 09:36:45 INFO mapreduce.Job: Counters: 49

    File System Counters

    FILE: Number of bytes read=48

    FILE: Number of bytes written=173817

    FILE: Number of read operations=0

    FILE: Number of large read operations=0

    FILE: Number of write operations=0

    HDFS: Number of bytes read=108

    HDFS: Number of bytes written=26

    HDFS: Number of read operations=6

    HDFS: Number of large read operations=0

    HDFS: Number of write operations=2

    Job Counters

    Launched map tasks=1

    Launched reduce tasks=1

    Data-local map tasks=1

    Total time spent by all maps in occupied slots (ms)=4262

    Total time spent by all reduces in occupied slots (ms)=3556

    Total time spent by all map tasks (ms)=4262

    Total time spent by all reduce tasks (ms)=3556

    Total vcore-seconds taken by all map tasks=4262

    Total vcore-seconds taken by all reduce tasks=3556

    Total megabyte-seconds taken by all map tasks=4364288

    Total megabyte-seconds taken by all reduce tasks=3641344

    Map-Reduce Framework

    Map input records=2

    Map output records=4

    Map output bytes=34

    Map output materialized bytes=48

    Input split bytes=90

    Combine input records=4

    Combine output records=4

    Reduce input groups=4

    Reduce shuffle bytes=48

    Reduce input records=4

    Reduce output records=4

    Spilled Records=8

    Shuffled Maps =1

    Failed Shuffles=0

    Merged Map outputs=1

    GC time elapsed (ms)=152

    CPU time spent (ms)=1330

    Physical memory (bytes) snapshot=308592640

    Virtual memory (bytes) snapshot=1708167168

    Total committed heap usage (bytes)=136450048

    Shuffle Errors

    BAD_ID=0

    CONNECTION=0

    IO_ERROR=0

    WRONG_LENGTH=0

    WRONG_MAP=0

    WRONG_REDUCE=0

    File Input Format Counters

    Bytes Read=18

    File Output Format Counters

    Bytes Written=26
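As a sanity check, the computation wordcount performs can be reproduced locally with standard text tools. This sketch assumes word.text holds exactly the two lines shown earlier; it illustrates what the job computes, it is not a substitute for running it:

```shell
# Local equivalent of wordcount: split on whitespace, sort, count duplicates.
IN=$(mktemp)
printf 'hello you\nhello me\n' > "$IN"
COUNTS=$(tr -s ' ' '\n' < "$IN" | sort | uniq -c | awk '{print $2, $1}')
echo "$COUNTS"
```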

     

The cluster setup follows the 吴超-沉思录 (Wu Chao) blog. Please credit the source when reposting. Thanks!

  • Original post: https://www.cnblogs.com/luguoyuanf/p/3602909.html