Hadoop 2.2.0 distributed setup on Red Hat Enterprise Linux 6.1 (VMware VMs: 1 namenode, 2 datanodes)

    After working through a batch of articles over the past couple of days, I finally got Hadoop configured!

    References:

    http://blog.csdn.net/licongcong_0224/article/details/12972889
    http://www.ituring.com.cn/article/63927
    http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html

    1. Virtual machines

    Three VMs, each with 1 GB of RAM and a 20 GB disk, all on NAT networking and all with VMware Tools installed.

    IP addresses:

    192.168.220.131 hadoop0
    192.168.220.133 hadoop1
    192.168.220.134 hadoop2

    Account/password: mlx/123456

    (Using the root account also works, but it is less secure; a regular user account is recommended.)

    Note: disable the firewall (requires root):

    service iptables stop
    chkconfig iptables off
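
    A quick way to confirm the firewall is really disabled (a hypothetical check, not part of the original steps; run as root):

    service iptables status
    chkconfig --list iptables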

    2. Edit /etc/hosts

    First use ifconfig to check the IP of each machine:

    [mlx@hadoop0 sbin]$ ifconfig
    eth0      Link encap:Ethernet  HWaddr 00:0C:29:FC:94:01  
              inet addr:192.168.220.131  Bcast:192.168.220.255  Mask:255.255.255.0
              inet6 addr: fe80::20c:29ff:fefc:9401/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:12741 errors:0 dropped:0 overruns:0 frame:0
              TX packets:8027 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:2209242 (2.1 MiB)  TX bytes:862255 (842.0 KiB)
              Interrupt:19 Base address:0x2024 
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:2666 errors:0 dropped:0 overruns:0 frame:0
              TX packets:2666 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:4520281 (4.3 MiB)  TX bytes:4520281 (4.3 MiB)

    You can see that hadoop0's IP is 192.168.220.131.

    Check the IPs of hadoop1 and hadoop2 in the same way.

    Then append the following lines

    192.168.220.131 hadoop0
    192.168.220.133 hadoop1
    192.168.220.134 hadoop2

    to the end of /etc/hosts on every machine (note: this requires the root account).
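
    As a minimal sketch (assuming the hostnames and IPs listed above), the entries can be appended on each machine with a heredoc run as root:

    cat >> /etc/hosts << 'EOF'
    192.168.220.131 hadoop0
    192.168.220.133 hadoop1
    192.168.220.134 hadoop2
    EOF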

    3. Install the JDK on every machine

    See my other post for this:

    http://www.cnblogs.com/xysmlx/p/3551619.html

    4. Passwordless SSH between master and slaves

    4.1. SSH from the master to the slaves

    First generate an RSA key pair with the following commands:

    ssh-keygen -t rsa -P ''
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    Then change the permissions on authorized_keys (very important; SSH will refuse key-based login otherwise):

    chmod 600 ~/.ssh/authorized_keys

    Try logging in with ssh localhost; if no password is asked for, this step has succeeded.

    [mlx@hadoop0 sbin]$ ssh localhost
    The authenticity of host 'localhost (::1)' can't be established.
    RSA key fingerprint is 43:3d:d0:2c:13:de:b1:c4:da:72:34:ba:c9:a3:a2:64.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
    Last login: Mon Feb 17 15:01:01 2014 from hadoop1
    [mlx@hadoop0 ~]$ 

    Then copy the public key file to the other two machines:

    scp ~/.ssh/id_rsa.pub mlx@192.168.220.133:~/
    scp ~/.ssh/id_rsa.pub mlx@192.168.220.134:~/

    Then run the following commands on each of the other two machines (note: the chmod permission changes are essential):

    chmod 700 ~/.ssh
    cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys

    Then test SSH with ssh user@host:

    [mlx@hadoop0 ~]$ ssh mlx@hadoop1
    Last login: Mon Feb 17 13:52:52 2014 from hadoop0
    [mlx@hadoop1 ~]$ 

    If no password is required, it worked.

    4.2. SSH from the slaves to the master

    Same as 4.1, just in the opposite direction: generate keys on the slaves and append their public keys to authorized_keys on the master.

    Note: the chmod permission changes are mandatory, otherwise it will fail.
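
    A minimal sketch of the reverse direction, assuming the same mlx user on every machine (the temporary file name id_rsa_slave.pub is arbitrary; repeat for each slave):

    # On the slave (e.g. hadoop1): generate a key and send the public key to the master
    ssh-keygen -t rsa -P ''
    scp ~/.ssh/id_rsa.pub mlx@hadoop0:~/id_rsa_slave.pub

    # On the master (hadoop0): append it and fix the permissions
    cat ~/id_rsa_slave.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys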

    Test:

    [mlx@hadoop1 ~]$ ssh mlx@hadoop0
    Last login: Mon Feb 17 16:13:56 2014 from localhost
    [mlx@hadoop0 ~]$ 

    5. Installing and configuring Hadoop

    The approach here is to configure Hadoop on the master (hadoop0) first and then copy it to hadoop1 and hadoop2.

    5.1. Installing Hadoop

    First download hadoop-2.2.0.tar.gz from the Hadoop website.

    Then copy hadoop-2.2.0.tar.gz to /usr.

    Then extract it:

    cd /usr
    tar -zxf hadoop-2.2.0.tar.gz

    Rename the extracted hadoop-2.2.0 directory to hadoop.
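
    For example (assuming the archive was extracted under /usr as above):

    mv /usr/hadoop-2.2.0 /usr/hadoop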

    Then append the following to the end of /etc/profile to set the environment variables (as root):

    # set hadoop path
    export HADOOP_HOME=/usr/hadoop
    export PATH=$HADOOP_HOME/bin:$PATH

    Then reload the profile:

    source /etc/profile
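
    To confirm the variables took effect (a quick check, not in the original steps; assumes JAVA_HOME was already exported when the JDK was installed in section 3):

    echo $HADOOP_HOME    # should print /usr/hadoop
    hadoop version       # should report Hadoop 2.2.0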

    5.2. Configuring Hadoop

    Note: in sections 5.2.4-5.2.7, the properties shown go between the <configuration></configuration> tags.

    The files edited in 5.2.2-5.2.7 are all located under /usr/hadoop/etc/hadoop.

    5.2.1. Create the working directories

    mkdir /usr/hadoop/tmp
    mkdir /usr/hadoop/dfs
    mkdir /usr/hadoop/dfs/name
    mkdir /usr/hadoop/dfs/data

    5.2.2. hadoop-env.sh

    Edit it so that JAVA_HOME points to the installed JDK:

    export JAVA_HOME=/usr/java/jdk1.7.0_51

    5.2.3. yarn-env.sh

    Edit the JAVA_HOME block so that it reads:

    if [ "$JAVA_HOME" != "" ]; then
      #echo "run java in $JAVA_HOME"
      JAVA_HOME=/usr/java/jdk1.7.0_51
    fi

    5.2.4. core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop0:9000</value>
        </property>
        <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/usr/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <name>hadoop.proxyuser.mlx.hosts</name>
            <value>*</value>
        </property>
        <property>
            <name>hadoop.proxyuser.mlx.groups</name>
            <value>*</value>
        </property>
    </configuration>

    5.2.5. hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop0:9001</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/usr/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/usr/hadoop/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
    </configuration>

    5.2.6. mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop0:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop0:19888</value>
        </property>
    </configuration>

    5.2.7. yarn-site.xml

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>hadoop0:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>hadoop0:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>hadoop0:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>hadoop0:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>hadoop0:8088</value>
        </property>
    </configuration>

    5.3. Copy the configured hadoop directory from hadoop0 to hadoop1 and hadoop2

    scp -r /usr/hadoop root@<server IP>:/usr/

    That is:

    scp -r /usr/hadoop root@hadoop1:/usr/
    scp -r /usr/hadoop root@hadoop2:/usr/

    5.4. Edit the slaves file on hadoop0

    Change the contents from

    localhost

    to

    192.168.220.133
    192.168.220.134
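
    A sketch of doing this non-interactively (the slaves file lives under /usr/hadoop/etc/hadoop in this layout):

    cat > /usr/hadoop/etc/hadoop/slaves << 'EOF'
    192.168.220.133
    192.168.220.134
    EOF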

    5.5. Fix ownership and permissions

    Do this on every machine (as root, from /usr):

    chown -R mlx:hadoop hadoop

    and

    chmod g-w /usr/hadoop
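
    The chown above assumes a group named hadoop already exists. If it does not, a minimal sketch to create it and add the mlx user (run as root on each machine before the chown):

    groupadd hadoop
    usermod -a -G hadoop mlx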

    6. Starting Hadoop

    6.1. Format HDFS

    hadoop namenode -format

    The output should end with lines like these:

    14/02/17 15:54:10 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
    Formatting using clusterid: CID-630c8102-043a-46ca-b9dd-c2c12a96965d
    14/02/17 15:54:11 INFO namenode.HostFileManager: read includes:
    HostSet(
    )
    14/02/17 15:54:11 INFO namenode.HostFileManager: read excludes:
    HostSet(
    )
    14/02/17 15:54:11 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
    14/02/17 15:54:11 INFO util.GSet: Computing capacity for map BlocksMap
    14/02/17 15:54:11 INFO util.GSet: VM type       = 32-bit
    14/02/17 15:54:11 INFO util.GSet: 2.0% max memory = 966.7 MB
    14/02/17 15:54:11 INFO util.GSet: capacity      = 2^22 = 4194304 entries
    14/02/17 15:54:11 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
    14/02/17 15:54:11 INFO blockmanagement.BlockManager: defaultReplication         = 1
    14/02/17 15:54:11 INFO blockmanagement.BlockManager: maxReplication             = 512
    14/02/17 15:54:11 INFO blockmanagement.BlockManager: minReplication             = 1
    14/02/17 15:54:11 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
    14/02/17 15:54:11 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
    14/02/17 15:54:11 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
    14/02/17 15:54:11 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
    14/02/17 15:54:11 INFO namenode.FSNamesystem: fsOwner             = mlx (auth:SIMPLE)
    14/02/17 15:54:11 INFO namenode.FSNamesystem: supergroup          = supergroup
    14/02/17 15:54:11 INFO namenode.FSNamesystem: isPermissionEnabled = true
    14/02/17 15:54:11 INFO namenode.FSNamesystem: HA Enabled: false
    14/02/17 15:54:11 INFO namenode.FSNamesystem: Append Enabled: true
    14/02/17 15:54:11 INFO util.GSet: Computing capacity for map INodeMap
    14/02/17 15:54:11 INFO util.GSet: VM type       = 32-bit
    14/02/17 15:54:11 INFO util.GSet: 1.0% max memory = 966.7 MB
    14/02/17 15:54:11 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    14/02/17 15:54:11 INFO namenode.NameNode: Caching file names occuring more than 10 times
    14/02/17 15:54:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
    14/02/17 15:54:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
    14/02/17 15:54:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
    14/02/17 15:54:11 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
    14/02/17 15:54:11 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
    14/02/17 15:54:11 INFO util.GSet: Computing capacity for map Namenode Retry Cache
    14/02/17 15:54:11 INFO util.GSet: VM type       = 32-bit
    14/02/17 15:54:11 INFO util.GSet: 0.029999999329447746% max memory = 966.7 MB
    14/02/17 15:54:11 INFO util.GSet: capacity      = 2^16 = 65536 entries
    14/02/17 15:54:11 INFO common.Storage: Storage directory /usr/hadoop/dfs/name has been successfully formatted.
    14/02/17 15:54:11 INFO namenode.FSImage: Saving image file /usr/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    14/02/17 15:54:11 INFO namenode.FSImage: Image file /usr/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 195 bytes saved in 0 seconds.
    14/02/17 15:54:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    14/02/17 15:54:11 INFO util.ExitUtil: Exiting with status 0
    14/02/17 15:54:11 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at hadoop0/192.168.220.131
    ************************************************************/

    6.2. Start Hadoop

    Go to /usr/hadoop/sbin and run:

    ./start-all.sh
    [mlx@hadoop0 sbin]$ ./start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [hadoop0]
    hadoop0: starting namenode, logging to /usr/hadoop/logs/hadoop-mlx-namenode-hadoop0.out
    192.168.220.134: starting datanode, logging to /usr/hadoop/logs/hadoop-mlx-datanode-hadoop2.out
    192.168.220.133: starting datanode, logging to /usr/hadoop/logs/hadoop-mlx-datanode-hadoop1.out
    Starting secondary namenodes [hadoop0]
    hadoop0: starting secondarynamenode, logging to /usr/hadoop/logs/hadoop-mlx-secondarynamenode-hadoop0.out
    starting yarn daemons
    starting resourcemanager, logging to /usr/hadoop/logs/yarn-mlx-resourcemanager-hadoop0.out
    192.168.220.134: starting nodemanager, logging to /usr/hadoop/logs/yarn-mlx-nodemanager-hadoop2.out
    192.168.220.133: starting nodemanager, logging to /usr/hadoop/logs/yarn-mlx-nodemanager-hadoop1.out
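
    As the deprecation warning above suggests, the same daemons can be started with the two separate scripts instead (run from /usr/hadoop/sbin):

    ./start-dfs.sh     # HDFS: namenode, datanodes, secondary namenode
    ./start-yarn.sh    # YARN: resourcemanager, nodemanagers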

    6.3. Check that everything started

    On hadoop0, run jps:

    [mlx@hadoop0 sbin]$ jps
    11696 Jps
    11140 NameNode
    11450 ResourceManager
    11315 SecondaryNameNode

    On hadoop1, run jps:

    [mlx@hadoop1 ~]$ jps
    6917 Jps
    6168 NodeManager
    6062 DataNode
    [mlx@hadoop1 ~]$ 

    On hadoop2, run jps:

    [mlx@hadoop2 ~]$ jps
    6536 DataNode
    6742 Jps
    6641 NodeManager
    [mlx@hadoop2 ~]$ 

    On hadoop0, run hadoop dfsadmin -report:

    [mlx@hadoop0 sbin]$ hadoop dfsadmin -report
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
    
    Configured Capacity: 37073182720 (34.53 GB)
    Present Capacity: 28097224704 (26.17 GB)
    DFS Remaining: 28097175552 (26.17 GB)
    DFS Used: 49152 (48 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    
    -------------------------------------------------
    Datanodes available: 2 (2 total, 0 dead)
    
    Live datanodes:
    Name: 192.168.220.134:50010 (hadoop2)
    Hostname: hadoop2
    Decommission Status : Normal
    Configured Capacity: 18536591360 (17.26 GB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 4417183744 (4.11 GB)
    DFS Remaining: 14119383040 (13.15 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 76.17%
    Last contact: Mon Feb 17 15:55:36 CST 2014
    
    
    Name: 192.168.220.133:50010 (hadoop1)
    Hostname: hadoop1
    Decommission Status : Normal
    Configured Capacity: 18536591360 (17.26 GB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 4558774272 (4.25 GB)
    DFS Remaining: 13977792512 (13.02 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 75.41%
    Last contact: Mon Feb 17 15:55:36 CST 2014

    7. Maintaining Hadoop

    7.1. Reformat HDFS

    First go to /usr/hadoop/sbin on hadoop0 and run stop-all.sh.

    Then run the following on all three machines (a loop-based sketch follows the commands):

    rm -rf /usr/hadoop/tmp
    rm -rf /usr/hadoop/dfs
    mkdir /usr/hadoop/tmp
    mkdir /usr/hadoop/dfs
    mkdir /usr/hadoop/dfs/name
    mkdir /usr/hadoop/dfs/data
    rm -rf /tmp/hadoop*
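
    Since passwordless SSH is already set up (section 4), the cleanup can also be driven from hadoop0 in one loop; a sketch, assuming the same paths and the mlx user on every node:

    for h in hadoop0 hadoop1 hadoop2; do
        ssh mlx@$h 'rm -rf /usr/hadoop/tmp /usr/hadoop/dfs /tmp/hadoop* && mkdir -p /usr/hadoop/tmp /usr/hadoop/dfs/name /usr/hadoop/dfs/data'
    done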

    Then, on hadoop0, run:

    hadoop namenode -format

    7.2. Starting Hadoop after a reboot

    Go to /usr/hadoop/sbin and run start-all.sh.

    7.3. Stopping Hadoop

    Go to /usr/hadoop/sbin and run stop-all.sh.

    7.4. Check the logs (the best way to troubleshoot)

    The logs are under /usr/hadoop/logs; open the .log files there to see the log messages.

    7.5. Kill a job

    hadoop job -kill jobid
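
    For example, list the running jobs first to find the job ID (the ID below is made up):

    hadoop job -list
    hadoop job -kill job_1392618838162_0001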

    7.6. Run a job

    hadoop jar matrix.jar MartrixMultiplication /input/M.data /output
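
    A sketch of a full run, assuming M.data is a local input file and matrix.jar / MartrixMultiplication are the user's own jar and driver class from the command above:

    hadoop fs -mkdir -p /input
    hadoop fs -put M.data /input/M.data
    hadoop jar matrix.jar MartrixMultiplication /input/M.data /output
    hadoop fs -cat /output/part-*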

    Original post: https://www.cnblogs.com/xysmlx/p/3552847.html