  Linux Ops and Architecture: Building a Fully Distributed Hadoop Cluster

    I. Introduction

    Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).
    HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which makes it a good fit for applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream.
    The two core pieces of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides the computation over that data.

    II. Deployment Environment Planning

    1. Server address plan

    No.   IP address   Hostname        Role       User
    1     10.0.0.67    Master.Hadoop   Namenode   Hadoop/root
    2     10.0.0.68    Slave1.Hadoop   Datanode   Hadoop/root
    3     10.0.0.69    Slave2.Hadoop   Datanode   Hadoop/root

    2. Deployment environment

    [root@Master ~]# cat /etc/redhat-release 
    CentOS release 6.9 (Final)
    [root@Master ~]# uname -r
    2.6.32-696.el6.x86_64
    [root@Master ~]# /etc/init.d/iptables status
    iptables: Firewall is not running.
    [root@Master ~]# getenforce 
    Disabled
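
    If the firewall or SELinux is still enabled on any node, the following commands (run as root on every node) reproduce the state shown above on CentOS 6; the sed edit is a sketch of the usual way to make the SELinux change survive a reboot:

    /etc/init.d/iptables stop && chkconfig iptables off
    setenforce 0   # takes effect immediately; errors harmlessly if SELinux is already disabled
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # persist across reboots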

    3. Consistent /etc/hosts entries on every node

    10.0.0.67  Master.Hadoop 
    10.0.0.68  Slave1.Hadoop 
    10.0.0.69  Slave2.Hadoop
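
    These three entries must be identical on all nodes. One simple way to push the Master's copy out (assuming root SSH logins are possible at this stage, even with a password):

    scp /etc/hosts root@10.0.0.68:/etc/hosts
    scp /etc/hosts root@10.0.0.69:/etc/hosts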

    III. Passwordless SSH Configuration

    1. On the Master

    [root@Master ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    Generating public/private dsa key pair.
    Created directory '/root/.ssh'.
    Your identification has been saved in /root/.ssh/id_dsa.
    Your public key has been saved in /root/.ssh/id_dsa.pub.
    The key fingerprint is:
    d9:50:b7:b1:f9:aa:83:6e:34:b9:0a:10:61:b9:83:e8 root@Master.Hadoop
    The key's randomart image is:
    +--[ DSA 1024]----+
    |  o.      . o    |
    | ...     . . =   |
    |....    .   +    |
    |o o.     +   .   |
    |. ..    S..   .  |
    | E .    +    .   |
    |    .  . +  .    |
    |     .  + ..     |
    |      .+. ..     |
    +-----------------+
    [root@Master ~]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
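
    If key-based login later still asks for a password, sshd is usually rejecting the key because of loose permissions; tightening them on each machine is harmless either way:

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys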

    2. Distribute the public key to both Slaves

    ①Slave1

    [root@Slave1 ~]# scp root@Master.Hadoop:~/.ssh/id_dsa.pub ~/.ssh/master_dsa.pub
    The authenticity of host 'master.hadoop (10.0.0.67)' can't be established.
    RSA key fingerprint is b4:24:ea:5f:aa:06:3b:7c:76:93:b9:11:4c:65:70:95.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'master.hadoop,10.0.0.67' (RSA) to the list of known hosts.
    root@master.hadoop's password: 
    id_dsa.pub                                                           100%  608     0.6KB/s   00:00    
    [root@Slave1 ~]# cat ~/.ssh/master_dsa.pub >> ~/.ssh/authorized_keys

    ②Slave2

    [root@Slave2 ~]# scp root@Master.Hadoop:~/.ssh/id_dsa.pub ~/.ssh/master_dsa.pub
    The authenticity of host 'master.hadoop (10.0.0.67)' can't be established.
    RSA key fingerprint is b4:24:ea:5f:aa:06:3b:7c:76:93:b9:11:4c:65:70:95.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'master.hadoop,10.0.0.67' (RSA) to the list of known hosts.
    root@master.hadoop's password: 
    id_dsa.pub                                                           100%  608     0.6KB/s   00:00    
    [root@Slave2 ~]# cat ~/.ssh/master_dsa.pub >> ~/.ssh/authorized_keys
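
    The scp-and-cat steps above can be collapsed into one command per Slave with ssh-copy-id, which ships with openssh-clients on CentOS 6 and appends the key to authorized_keys for you; run from the Master:

    ssh-copy-id -i ~/.ssh/id_dsa.pub root@Slave1.Hadoop
    ssh-copy-id -i ~/.ssh/id_dsa.pub root@Slave2.Hadoop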

    ③ Test SSH from the Master to the Slaves

    [root@Master ~]# ssh Slave1.Hadoop
    Last login: Tue Aug  7 10:30:53 2018 from 10.0.0.67
    [root@Slave1 ~]# exit
    logout
    Connection to Slave1.Hadoop closed.
    [root@Master ~]# ssh Slave2.Hadoop
    Last login: Tue Aug  7 10:31:04 2018 from 10.0.0.67

    IV. Hadoop Installation and Environment Configuration

    1. On the Master

    ① Install the Java environment

    tar xf jdk-8u181-linux-x64.tar.gz -C /usr/local/
    ln -s /usr/local/jdk1.8.0_181/ /usr/local/jdk

    Configure the environment variables:

    [root@Master ~]# tail -4  /etc/profile
    export JAVA_HOME=/usr/local/jdk1.8.0_181  
    export JRE_HOME=/usr/local/jdk1.8.0_181/jre  
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH  
    export PATH=$JAVA_HOME/bin:$PATH
    [root@Master ~]# source /etc/profile
    [root@Master ~]# java -version
    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

    2. Install and configure Hadoop

    ① Install

    tar -xf hadoop-2.8.0.tar.gz -C /usr/
    mv /usr/hadoop-2.8.0/ /usr/hadoop
    ### Configure the Hadoop environment variables ###
    export HADOOP_HOME=/usr/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
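
    The two export lines above only affect the current shell. To make them permanent, append them to /etc/profile just as was done for Java; a minimal sketch:

    cat >> /etc/profile <<'EOF'
    export HADOOP_HOME=/usr/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    EOF
    source /etc/profile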

    ② Set JAVA_HOME in hadoop-env.sh and apply it

    vim /usr/hadoop/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/local/jdk1.8.0_181
    source /usr/hadoop/etc/hadoop/hadoop-env.sh
    [root@Master usr]# hadoop version # check the Hadoop version
    Hadoop 2.8.0
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 91f2b7a13d1e97be65db92ddabc627cc29ac0009
    Compiled by jdu on 2017-03-17T04:12Z
    Compiled with protoc 2.5.0
    From source with checksum 60125541c2b3e266cbf3becc5bda666
    This command was run using /usr/hadoop/share/hadoop/common/hadoop-common-2.8.0.jar

    ③ Create the directories Hadoop needs

    mkdir /usr/hadoop/{tmp,hdfs}
    mkdir /usr/hadoop/hdfs/{name,tmp,data} -p

    ④ Edit the Hadoop core configuration file core-site.xml to set the address and port of the HDFS master (i.e. the namenode)

    vim /usr/hadoop/etc/hadoop/core-site.xml
    
    <configuration>
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/hadoop/tmp</value>
            <final>true</final>
    <!-- Note: create the tmp directory under /usr/hadoop first -->
            <description>A base for other temporary directories.</description>
    </property>
    <property>
            <name>fs.default.name</name>
            <value>hdfs://10.0.0.67:9000</value>
    <!-- equivalently: hdfs://Master.Hadoop:9000 -->
            <final>true</final>
    </property>
    <property>
             <name>io.file.buffer.size</name>
             <value>131072</value>
    </property>
    
    </configuration>
    

    ⑤ Configure hdfs-site.xml

    vim /usr/hadoop/etc/hadoop/hdfs-site.xml
    
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.name.dir</name>
            <value>/usr/hadoop/hdfs/name</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>/usr/hadoop/hdfs/data</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>master.hadoop:9001</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
    </configuration>
    

    ⑥ Configure mapred-site.xml (Hadoop 2.8 ships only mapred-site.xml.template, so copy it first)

    cp /usr/hadoop/etc/hadoop/mapred-site.xml.template /usr/hadoop/etc/hadoop/mapred-site.xml
    vim /usr/hadoop/etc/hadoop/mapred-site.xml
    
    <configuration>
            <property>
                      <name>mapreduce.framework.name</name>
                      <value>yarn</value>
            </property>
    </configuration>
    

    ⑦ Configure yarn-site.xml

    vim /usr/hadoop/etc/hadoop/yarn-site.xml
    
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    <property>  
      <name>yarn.resourcemanager.address</name>  
      <value>Master.Hadoop:18040</value>  
    </property>  
    <property>  
      <name>yarn.resourcemanager.scheduler.address</name>  
      <value>Master.Hadoop:18030</value>  
    </property>  
    <property>  
      <name>yarn.resourcemanager.webapp.address</name>  
      <value>Master.Hadoop:18088</value>  
    </property>  
    <property>  
      <name>yarn.resourcemanager.resource-tracker.address</name>  
      <value>Master.Hadoop:18025</value>  
    </property>  
    <property>  
      <name>yarn.resourcemanager.admin.address</name>  
      <value>Master.Hadoop:18141</value>  
    </property>  
    <property>  
      <name>yarn.nodemanager.aux-services</name>  
      <value>mapreduce_shuffle</value>  
    </property>  
    <property>  
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>  
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>  
    </property>  
    </configuration>
    

    ⑧ Configure the masters and slaves files

    echo "10.0.0.67" >/usr/hadoop/etc/hadoop/masters
    echo -e "10.0.0.68
    10.0.0.69" >/usr/hadoop/etc/hadoop/slaves

    Verify:

    [root@Master hadoop]# cat /usr/hadoop/etc/hadoop/masters
    10.0.0.67
    [root@Master hadoop]# cat /usr/hadoop/etc/hadoop/slaves 
    10.0.0.68
    10.0.0.69
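
    Before the configuration is copied to the Slaves, it is worth checking that the four edited XML files are at least well-formed. One way, assuming xmllint from the libxml2 package is available (it usually is on CentOS 6):

    cd /usr/hadoop/etc/hadoop
    for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
        xmllint --noout $f && echo "$f: OK"
    done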

    3. Slave server installation and configuration

    ① Copy the JDK to the Slaves

    scp -rp /usr/local/jdk1.8.0_181 root@Slave1.Hadoop:/usr/local/
    scp -rp /usr/local/jdk1.8.0_181 root@Slave2.Hadoop:/usr/local/

    ② Copy the environment variables in /etc/profile

    scp -rp /etc/profile root@Slave1.Hadoop:/etc/
    scp -rp /etc/profile root@Slave2.Hadoop:/etc/

    ③ Copy /usr/hadoop

    scp -rp /usr/hadoop root@Slave1.Hadoop:/usr/
    scp -rp /usr/hadoop root@Slave2.Hadoop:/usr/
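
    Steps ① through ③ can also be written as a single loop; this is just a compact restatement of the scp commands above:

    for node in Slave1.Hadoop Slave2.Hadoop; do
        scp -rp /usr/local/jdk1.8.0_181 root@$node:/usr/local/
        scp -rp /etc/profile root@$node:/etc/
        scp -rp /usr/hadoop root@$node:/usr/
    done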

    At this point the environment setup is complete.

    V. Starting and Verifying the Hadoop Cluster

    1. Start the cluster

    ① Format the HDFS file system

    /usr/hadoop/bin/hadoop namenode -format

    ② Start all cluster nodes

    sh /usr/hadoop/sbin/start-all.sh
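
    start-all.sh is marked deprecated in Hadoop 2.x; the equivalent, which the script itself recommends, is to start HDFS and YARN separately:

    /usr/hadoop/sbin/start-dfs.sh
    /usr/hadoop/sbin/start-yarn.sh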

    Check the Hadoop processes on the Master:

    [root@Master sbin]# ps -ef|grep hadoop
    root       1523      1  3 16:37 ?        00:00:07 /usr/local/jdk1.8.0_181/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.library.path=/usr/hadoop/lib -Dhadoop.log.dir=/usr/hadoop/logs -Dhadoop.log.file=hadoop-root-secondarynamenode-Master.Hadoop.log -Dhadoop.home.dir=/usr/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
    root       1670      1 11 16:37 pts/0    00:00:19 /usr/local/jdk1.8.0_181/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.home.dir= -Dyarn.id.str=root -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.home.dir=/usr/hadoop -Dhadoop.home.dir=/usr/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -classpath /usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:/usr/hadoop/share/hadoop/mapreduce/*:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
    root       1941   1235  0 16:40 pts/0    00:00:00 grep --color=auto hadoop

    ③ To stop all nodes in the cluster (when needed):

    sh /usr/hadoop/sbin/stop-all.sh

    Check the Hadoop processes on Slave1.Hadoop and Slave2.Hadoop:

    [root@Slave1 ~]# ps -ef|grep hadoop
    root       1271      1  2 16:37 ?        00:00:12 /usr/local/jdk1.8.0_181/bin/java -Dproc_datanode -Xmx1000m -Djava.library.path=/usr/hadoop/lib -Dhadoop.log.dir=/usr/hadoop/logs -Dhadoop.log.file=hadoop-root-datanode-Slave1.Hadoop.log -Dhadoop.home.dir=/usr/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
    root       1363      1  4 16:37 ?        00:00:19 /usr/local/jdk1.8.0_181/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.home.dir= -Dyarn.id.str=root -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.home.dir=/usr/hadoop -Dhadoop.home.dir=/usr/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -classpath /usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:/usr/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
    root       1499   1238  0 16:45 pts/0    00:00:00 grep --color=auto hadoop

    2. Verify with the jps command

    ①Master

    [root@Master ~]# jps
    11329 NameNode
    11521 SecondaryNameNode
    12269 Jps
    11677 ResourceManager

    ②Slave

    [root@Slave1 ~]# jps
    4320 Jps
    4122 NodeManager
    4012 DataNode
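
    With passwordless SSH in place, the daemons on all three nodes can be checked from the Master in one pass. The full JDK path is used because non-interactive SSH sessions do not read /etc/profile:

    for node in Master.Hadoop Slave1.Hadoop Slave2.Hadoop; do
        echo "== $node =="
        ssh root@$node /usr/local/jdk1.8.0_181/bin/jps
    done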

    3. Check the cluster status from the Master

    [root@Master ~]# hadoop dfsadmin -report
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
    
    18/08/08 11:16:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Configured Capacity: 38020816896 (35.41 GB)
    Present Capacity: 27476373504 (25.59 GB)
    DFS Remaining: 27476316160 (25.59 GB)
    DFS Used: 57344 (56 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Pending deletion blocks: 0
    
    -------------------------------------------------
    Live datanodes (2):
    
    Name: 10.0.0.68:50010 (Slave1.Hadoop)
    Hostname: Slave1.Hadoop
    Decommission Status : Normal
    Configured Capacity: 19010408448 (17.70 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 4270084096 (3.98 GB)
    DFS Remaining: 13767794688 (12.82 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 72.42%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Wed Aug 08 11:16:09 CST 2018
    
    
    Name: 10.0.0.69:50010 (Slave2.Hadoop)
    Hostname: Slave2.Hadoop
    Decommission Status : Normal
    Configured Capacity: 19010408448 (17.70 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 4329357312 (4.03 GB)
    DFS Remaining: 13708521472 (12.77 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 72.11%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Wed Aug 08 11:16:08 CST 2018

    4. Check the cluster status in the web UI

    http://10.0.0.67:50070
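
    The same check works from the shell if no browser can reach the host; an HTTP 200 from the namenode web port means the UI is up:

    curl -sI http://10.0.0.67:50070 | head -1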
    
