  • Deploying a High-Performance Hadoop 2.0 Cluster

    No more preamble; let's get straight to the hands-on work of deploying a high-performance Hadoop cluster:

    Topology diagram:

    I. Preparing the environment:

    1. Configure the hosts file on all three hosts (copy it to the other two hosts):

    [root@tiandong63 ~]# more /etc/hosts  
    192.168.199.3 tiandong63
    192.168.199.4 tiandong64
    192.168.199.5 tiandong65
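
    As a quick sanity check (not part of the original steps), you could confirm that each hostname resolves and is reachable from tiandong63, for example:

    [root@tiandong63 ~]# ping -c 1 tiandong64
    [root@tiandong63 ~]# ping -c 1 tiandong65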

    2. Create the hadoop account (must also be created on the other two hosts)

    [root@tiandong63 ~]#useradd -u 8000 hadoop

    [root@tiandong63 ~]#echo '123456' | passwd --stdin hadoop

    3. Grant the hadoop user sudo privileges by adding the line below (must also be configured on the other two hosts)

    [root@tiandong63 ~]# vim /etc/sudoers
    hadoop  ALL=(ALL)       ALL
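    Editing /etc/sudoers directly is risky, since a syntax error can break sudo entirely. An optional check after saving (a small extra step, not in the original) is:

    [root@tiandong63 ~]# visudo -c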
    4. Set up SSH trust between the hosts (on tiandong63)
    This enables passwordless SSH login to tiandong63, tiandong64 and tiandong65, which makes it easier to copy files and start services later: when HDFS is started from the namenode, it connects to the datanodes to start the corresponding services.

    [root@tiandong63 ~]# su - hadoop
    [hadoop@tiandong63 ~]$ ssh-keygen
    [hadoop@tiandong63 ~]$ ssh-copy-id hadoop@192.168.199.4
    [hadoop@tiandong63 ~]$ ssh-copy-id hadoop@192.168.199.5
    (The keys are installed for the hadoop account, since files are copied and the daemons are started as the hadoop user.)
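
    Optionally (not in the original), you can verify that passwordless login now works, and also copy the key to the local hadoop account: start-dfs.sh SSHes back into tiandong63 itself to start the namenode, so this avoids a password prompt later:

    [hadoop@tiandong63 ~]$ ssh-copy-id hadoop@192.168.199.3
    [hadoop@tiandong63 ~]$ ssh hadoop@tiandong64 hostname
    [hadoop@tiandong63 ~]$ ssh hadoop@tiandong65 hostname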

    II. Configuring the Hadoop environment

    1. Configure the JDK environment (on all three hosts):

    [root@tiandong63 ~]# ll jdk-8u191-linux-x64.tar.gz
    -rw-r--r-- 1 root root 191753373 Dec 30 00:58 jdk-8u191-linux-x64.tar.gz

    [root@tiandong63 ~]# tar zxvf jdk-8u191-linux-x64.tar.gz -C /usr/local/src/

    [root@tiandong63 ~]# vim /etc/profile
    export JAVA_HOME=/usr/local/src/jdk1.8.0_191
    export JAVA_BIN=/usr/local/src/jdk1.8.0_191/bin
    export PATH=${JAVA_HOME}/bin:$PATH
    export CLASSPATH=.:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
    export HADOOP_ROOT_LOGGER=DEBUG,console    # optional: verbose console logging, useful while debugging
    [root@tiandong63 ~]# source /etc/profile
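    A quick way to confirm the JDK and the environment variables took effect (an extra check, not in the original):

    [root@tiandong63 ~]# java -version
    [root@tiandong63 ~]# echo $JAVA_HOME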
    2. Turn off the firewall (on all three hosts)
    [root@tiandong63 ~]# /etc/init.d/iptables stop
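    This only stops the firewall for the current session. On a CentOS 6-style system (assumed here, since iptables is managed via /etc/init.d) you would likely also want to keep it disabled after a reboot:

    [root@tiandong63 ~]# chkconfig iptables off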
    3. Install Hadoop on tiandong63 and configure it as the namenode (master node)
    [root@tiandong63 ~]# cd /home/hadoop/
    [root@tiandong63 hadoop]# ll hadoop-2.7.7.tar.gz
    -rw-r--r-- 1 hadoop hadoop 218720521 Dec 29 21:11 hadoop-2.7.7.tar.gz
    [root@tiandong63 ~]# su - hadoop
    [hadoop@tiandong63 ~]$ tar -zxvf hadoop-2.7.7.tar.gz
    Create the Hadoop working directories:
    [hadoop@tiandong63 ~]$ mkdir -p /home/hadoop/dfs/name/ /home/hadoop/dfs/data/ /home/hadoop/tmp/
    4. Configure Hadoop (edit 7 configuration files)

    The files are: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml

    [hadoop@tiandong63 ~]$ cd /home/hadoop/hadoop-2.7.7/etc/hadoop/
    [hadoop@tiandong63 hadoop]$ vim hadoop-env.sh     (set the Java runtime used by Hadoop)
    export JAVA_HOME=/usr/local/src/jdk1.8.0_191
    [hadoop@tiandong63 hadoop]$ vim yarn-env.sh    (set the Java runtime used by the YARN framework)
    JAVA_HOME=/usr/local/src/jdk1.8.0_191
    [hadoop@tiandong63 hadoop]$ vim slaves        (list the datanode data-storage servers)
    tiandong64
    tiandong65

    [hadoop@tiandong63 hadoop]$ vim core-site.xml    (set the default filesystem URI, i.e. the NameNode address, and other core parameters)
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://tiandong63:9000</value>
      </property>

      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>

      <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
    </configuration>
    [hadoop@tiandong63 hadoop]$ mkdir -p /home/hadoop/tmp

    [hadoop@tiandong63 hadoop]$ vim hdfs-site.xml

    This is the HDFS configuration file: dfs.namenode.secondary.http-address sets the HTTP address of the secondary namenode, and dfs.replication sets the number of replicas per file block, which should generally not exceed the number of slave nodes.
    <configuration>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>tiandong63:9001</value>
      </property>

      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/dfs/name</value>
      </property>

      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/dfs/data</value>
      </property>

      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>

      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
    </configuration>
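
    One detail worth noting before the next file: a stock Hadoop 2.7.7 distribution ships only mapred-site.xml.template, so mapred-site.xml likely has to be created from the template first:

    [hadoop@tiandong63 hadoop]$ cp mapred-site.xml.template mapred-site.xml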
    [hadoop@tiandong63 hadoop]$ vim mapred-site.xml
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>

      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>tiandong63:10020</value>
      </property>

      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>tiandong63:19888</value>
      </property>
    </configuration>
    [hadoop@tiandong63 hadoop]$ vim yarn-site.xml    (configuration for the YARN framework, mainly the addresses its services listen on)
    <configuration>
    <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>

      <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>

      <property>
        <name>yarn.resourcemanager.address</name>
        <value>tiandong63:8032</value>
      </property>

      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>tiandong63:8030</value>
      </property>

      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>tiandong63:8031</value>
      </property>

      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>tiandong63:8033</value>
      </property>

      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>tiandong63:8088</value>
      </property>
    </configuration>
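
    Since a single malformed tag will keep the daemons from starting, it may be worth validating the edited XML files before continuing (optional; assumes xmllint from the libxml2 package is installed):

    [hadoop@tiandong63 hadoop]$ xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml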

    Copy the Hadoop directory to the other datanode hosts (192.168.199.4/5):
    [hadoop@tiandong63 ~]$ scp -r /home/hadoop/hadoop-2.7.7 hadoop@tiandong64:/home/hadoop/
    [hadoop@tiandong63 ~]$ scp -r /home/hadoop/hadoop-2.7.7 hadoop@tiandong65:/home/hadoop/

    III. Starting Hadoop:
    Start it on tiandong63 (as the hadoop user).
    1. Format the namenode:
    [hadoop@tiandong63 ~]$ cd /home/hadoop/hadoop-2.7.7/bin/
    [hadoop@tiandong63 bin]$ ./hdfs namenode -format
    2. Start HDFS:
    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh
    Check the processes (namenode on tiandong63):
    [hadoop@tiandong63 ~]$ ps -axu | grep namenode --color
    Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
    hadoop    42851  0.4 17.6 2762196 177412 ?      Sl   18:13   1:26 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop-hadoop-namenode-tiandong63.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
    hadoop    43046  0.2 13.3 2733336 134344 ?      Sl   18:13   0:35 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop-hadoop-secondarynamenode-tiandong63.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
    Check the processes on tiandong64 and tiandong65 (datanode):
    [root@tiandong64 ~]# ps -aux|grep datanode
    Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
    hadoop     3938  0.3 12.2 2757576 122592 ?      Sl   18:13   1:04 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop-hadoop-datanode-tiandong64.log -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
    3. Start YARN (on tiandong63; this starts the distributed computing layer):
    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh
    Check the resourcemanager process on tiandong63:
    [hadoop@tiandong63 ~]$ ps -ef|grep resourcemanager --color
    hadoop    43196      1  0 18:14 pts/1    00:02:55 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-tiandong63.log -Dyarn.log.file=yarn-hadoop-resourcemanager-tiandong63.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=yarn-hadoop-resourcemanager-tiandong63.log -Dyarn.log.file=yarn-hadoop-resourcemanager-tiandong63.log -Dyarn.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -classpath /home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/common/*:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.7.7/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
    Check the nodemanager process on tiandong64 and tiandong65:
    [root@tiandong64 ~]# ps -aux|grep nodemanager --color
    Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
    hadoop     4048  0.5 15.7 2802552 158380 ?      Sl   18:14   1:42 /usr/local/src/jdk1.8.0_191/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=yarn-hadoop-nodemanager-tiandong64.log -Dyarn.log.file=yarn-hadoop-nodemanager-tiandong64.log -Dyarn.home.dir= -Dyarn.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dyarn.log.dir=/home/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=yarn-hadoop-nodemanager-tiandong64.log -Dyarn.log.file=yarn-hadoop-nodemanager-tiandong64.log -Dyarn.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.home.dir=/home/hadoop/hadoop-2.7.7 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/hadoop-2.7.7/lib/native -classpath /home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/etc/hadoop:/home/hadoop/hadoop-2.7.7/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/common/*:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/*:/home/hadoop/hadoop-2.7.7/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.7.7/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
    Note: the start-dfs.sh and start-yarn.sh scripts can be replaced by a single start-all.sh.
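
    Another quick way to verify the daemons (not shown in the original) is jps, which ships with the JDK. As the hadoop user, the master should list NameNode, SecondaryNameNode and ResourceManager, and each slave should list DataNode and NodeManager:

    [hadoop@tiandong63 ~]$ jps
    [hadoop@tiandong64 ~]$ jps    # run on each datanode as well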

    4. Check the status of the HDFS distributed file system:
    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hdfs dfsadmin -report
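
    The web interfaces show the same information in a browser: with the default Hadoop 2.x ports and the yarn-site.xml above, the NameNode UI should be at http://tiandong63:50070 and the ResourceManager UI at http://tiandong63:8088. A rough reachability check from the shell (optional) could look like:

    [hadoop@tiandong63 ~]$ curl -s -o /dev/null -w '%{http_code}\n' http://tiandong63:50070
    [hadoop@tiandong63 ~]$ curl -s -o /dev/null -w '%{http_code}\n' http://tiandong63:8088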

    IV. Basic Hadoop usage

    Run a Hadoop compute job: the word count example.

    Create a directory on HDFS:

    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -mkdir -p /test/input
    List the directory:

    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -ls /test/
    drwxr-xr-x   - hadoop supergroup          0 2018-12-30 21:40 /test/input
    Upload a file:

    Create an input file for the computation:

    [hadoop@tiandong63 ~]$ more file1.txt
    welcome to beijing
    my name is thunder
    what is your name
    Upload it to the /test/input directory on HDFS:

    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -put /home/hadoop/file1.txt /test/input
    Check that the upload succeeded:

    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -ls /test/input
    Found 1 items
    -rw-r--r--   2 hadoop supergroup         56 2018-12-30 21:40 /test/input/file1.txt

    Run the computation:
    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop jar /home/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /test/input /test/output
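    When the job finishes, the output directory should contain an empty _SUCCESS marker plus the part file(s); listing it is an easy way to confirm completion (an extra check, not in the original):

    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -ls /test/output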
    View the results:
    [hadoop@tiandong63 ~]$  /home/hadoop/hadoop-2.7.7/bin/hadoop fs -cat /test/output/part-r-00000
    beijing    1
    is    2
    my    1
    name    2
    thunder    1
    to    1
    welcome    1
    what    1
    your    1
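
    One caveat if you rerun the example: MapReduce refuses to write to an existing output directory, so /test/output has to be removed (or a different output path chosen) before running the job again:

    [hadoop@tiandong63 ~]$ /home/hadoop/hadoop-2.7.7/bin/hadoop fs -rm -r /test/output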

