  • Hadoop Cluster Installation

    I. Steps for installing and configuring fully distributed mode:

        1. Install the JDK;  2. Configure the hosts file;  3. Create a Hadoop user account;  4. Set up passwordless SSH;

        5. Download and extract the Hadoop package;  6. Configure the namenode by editing the site files;  7. Configure hadoop-env.sh;

        8. Configure the masters and slaves files;  9. Copy Hadoop to each node;  10. Format the namenode;

        11. Start Hadoop;  12. Use jps to verify that the daemons started successfully.

      1. Install the JDK (do this on every node)

     ---- First, extract the archive ----
     [root@localhost ~]# tar -zxvf jdk-7u9-linux-i586.tar.gz 

     ---- Rename/move the directory ----
     [root@localhost ~]# mv jdk1.7.0_09 /jdk1.7

     ---- Add the following lines to /etc/profile ----
     [root@localhost ~]# vi /etc/profile

     export JAVA_HOME=/jdk1.7
     export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
     export PATH=$JAVA_HOME/bin:$PATH

     ---- Verify that JDK 1.7 is installed correctly ----
     [root@localhost ~]# java -version
     java version "1.7.0_09"
     Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
     Java HotSpot(TM) Client VM (build 23.5-b02, mixed mode)
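
     One step not shown in the transcript above: after editing /etc/profile, the new variables do not take effect in the current shell until the file is reloaded (or the user logs in again). A minimal check, assuming a bash login shell:

     ---- Reload the profile and confirm the variables ----
     [root@localhost ~]# source /etc/profile
     [root@localhost ~]# echo $JAVA_HOME
     /jdk1.7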

       2. Configure the hosts file (do this on every node)

    [root@localhost 123]# cat /etc/hosts
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1        localhost.localdomain localhost
    ::1        localhost6.localdomain6 localhost6

    192.168.1.151 node1
    192.168.1.152 node2
    192.168.1.153 node3
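
     A quick sanity check, not part of the original steps: confirm that each hostname resolves and is reachable from every node (the example assumes the three hosts listed above).

     [root@localhost ~]# ping -c 1 node1
     [root@localhost ~]# ping -c 1 node2
     [root@localhost ~]# ping -c 1 node3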

       3. Create the Hadoop user account (do this on every node)

    [root@localhost ~]# useradd jack
    [root@localhost ~]# passwd jack
    Changing password for user jack.
    New UNIX password: 
    BAD PASSWORD: it is too short
    Retype new UNIX password: 
    passwd: all authentication tokens updated successfully.
    [root@localhost ~]# id jack
    uid=500(jack) gid=500(jack) groups=500(jack)

       4. Set up passwordless SSH login

     [jack@node1 ~]$ ssh-keygen -t rsa
     Generating public/private rsa key pair.
     Enter file in which to save the key (/home/jack/.ssh/id_rsa): 
     Created directory '/home/jack/.ssh'.
     Enter passphrase (empty for no passphrase): 
     Enter same passphrase again: 
     Your identification has been saved in /home/jack/.ssh/id_rsa.
     Your public key has been saved in /home/jack/.ssh/id_rsa.pub.
     The key fingerprint is:
     65:22:5b:af:69:09:7b:8f:8b:35:f6:b8:69:8c:f0:a1 jack@node1

     [jack@node2 ~]$ ssh-keygen -t rsa
     Generating public/private rsa key pair.
     Enter file in which to save the key (/home/jack/.ssh/id_rsa): 
     Created directory '/home/jack/.ssh'.
     Enter passphrase (empty for no passphrase): 
     Enter same passphrase again: 
     Your identification has been saved in /home/jack/.ssh/id_rsa.
     Your public key has been saved in /home/jack/.ssh/id_rsa.pub.
     The key fingerprint is:
     ab:18:29:89:57:82:f8:cc:3c:ed:47:05:b2:15:43:56 jack@node2

     [jack@node3 ~]$ ssh-keygen -t rsa
     Generating public/private rsa key pair.
     Enter file in which to save the key (/home/jack/.ssh/id_rsa): 
     Created directory '/home/jack/.ssh'.
     Enter passphrase (empty for no passphrase): 
     Enter same passphrase again: 
     Your identification has been saved in /home/jack/.ssh/id_rsa.
     Your public key has been saved in /home/jack/.ssh/id_rsa.pub.
     The key fingerprint is:
     11:9f:7c:81:e2:dd:c8:44:1d:8a:24:15:28:bc:06:78 jack@node3

     [jack@node1 ~]$ cd .ssh/
     [jack@node1 .ssh]$ cat id_rsa.pub > authorized_keys

     [jack@node2 ~]$ cd .ssh/
     [jack@node2 .ssh]$ scp id_rsa.pub node1:/home/jack/
     The authenticity of host 'node1 (192.168.1.151)' can't be established.
     RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
     Are you sure you want to continue connecting (yes/no)? yes
     Warning: Permanently added 'node1,192.168.1.151' (RSA) to the list of known hosts.
     jack@node1's password: 
     id_rsa.pub                                                                            100%  392     0.4KB/s   00:00  

     [jack@node1 .ssh]$ cat /home/jack/id_rsa.pub >> authorized_keys 

     [jack@node3 ~]$ cd .ssh/
     [jack@node3 .ssh]$ scp id_rsa.pub node1:/home/jack/
     The authenticity of host 'node1 (192.168.1.151)' can't be established.
     RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
     Are you sure you want to continue connecting (yes/no)? yes
     Warning: Permanently added 'node1,192.168.1.151' (RSA) to the list of known hosts.
     jack@node1's password: 
     id_rsa.pub                                                                            100%  392     0.4KB/s   00:00  

     [jack@node1 .ssh]$ cat /home/jack/id_rsa.pub >> authorized_keys 

     [jack@node1 .ssh]$ ls
     authorized_keys  id_rsa  id_rsa.pub
     [jack@node1 .ssh]$ rm id_rsa.pub 
     [jack@node1 .ssh]$ scp authorized_keys node2:/home/jack/.ssh/
     The authenticity of host 'node2 (192.168.1.152)' can't be established.
     RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
     Are you sure you want to continue connecting (yes/no)? yes
     Warning: Permanently added 'node2,192.168.1.152' (RSA) to the list of known hosts.
     jack@node2's password: 
     authorized_keys                                                                       100% 1176     1.2KB/s   00:00    
     [jack@node1 .ssh]$ scp authorized_keys node3:/home/jack/.ssh/
     The authenticity of host 'node3 (192.168.1.153)' can't be established.
     RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
     Are you sure you want to continue connecting (yes/no)? yes
     Warning: Permanently added 'node3,192.168.1.153' (RSA) to the list of known hosts.
     jack@node3's password: 
     authorized_keys                                                                       100% 1176     1.2KB/s   00:00    
     [jack@node1 .ssh]$ chmod 400 authorized_keys 

     [jack@node2 .ssh]$ rm id_rsa.pub 
     [jack@node2 .ssh]$ chmod 400 authorized_keys 

     [jack@node3 .ssh]$ rm id_rsa.pub 
     [jack@node3 .ssh]$ chmod 400 authorized_keys 
     [jack@node3 .ssh]$ ssh node2
     The authenticity of host 'node2 (192.168.1.152)' can't be established.
     RSA key fingerprint is 51:ac:0e:ec:9c:ec:60:ac:53:19:20:bc:e4:a6:95:64.
     Are you sure you want to continue connecting (yes/no)? yes
     Warning: Permanently added 'node2,192.168.1.152' (RSA) to the list of known hosts.
     Last login: Wed May 15 21:57:50 2013 from 192.168.1.104
     [jack@node2 ~]$ 
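
     Note: on systems where ssh-copy-id is available, the manual copying of id_rsa.pub shown above can be replaced by a single command per node; the sketch below is an alternative, not what was used here. Either way, a final passwordless login test from each node confirms the setup.

     ---- Alternative: push a key directly (if ssh-copy-id is installed) ----
     [jack@node2 ~]$ ssh-copy-id jack@node1

     ---- Verify: this should return the remote date without asking for a password ----
     [jack@node1 ~]$ ssh node2 date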

       5. Download and extract the Hadoop package

    [jack@node1 ~]$ tar -zxvf hadoop-0.20.2.tar.gz 

    [root@node1 jack]# mv hadoop-0.20.2 /hadoop-0.20.2
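
     If Hadoop is later started as the jack user (as in the steps below), the extracted tree should be writable by jack, since logs and pid files are written under it. A small addition, assuming the layout above:

     [root@node1 /]# chown -R jack:jack /hadoop-0.20.2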

      6. Configure the namenode by editing the site files

        core-site.xml: configuration for Hadoop Core, such as I/O settings shared by HDFS and MapReduce.

        hdfs-site.xml: configuration for the HDFS daemons: the namenode, the secondary namenode, and the datanodes.

        mapred-site.xml: configuration for the MapReduce daemons: the jobtracker and the tasktrackers.

     ---- fs.default.name: the URI (protocol, hostname, and port) of the namenode. Every machine in the cluster needs to know the namenode's address: datanodes register with it so their data can be served, and client programs use this URI to contact the namenode and obtain the block lists of files. ----
     ---- hadoop.tmp.dir: the base directory that many other Hadoop paths are derived from. If the namenode and datanode storage locations are not configured in hdfs-site.xml, they default to subdirectories of this path. ----

     [jack@node1 conf]$ cat core-site.xml 
     <?xml version="1.0"?>
     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

     <!-- Put site-specific property overrides in this file. -->

     <configuration>
       <property>
         <name>fs.default.name</name>
         <value>hdfs://192.168.1.151:9000</value>
       </property>
       <property>
         <name>hadoop.tmp.dir</name>
         <value>/temp</value>
       </property>
     </configuration>

     ---- dfs.replication: the number of replicas kept for each file block. For a real deployment it is usually set to 3 (there is no hard upper limit, but more replicas add little benefit and use more space); fewer than three replicas can reduce data reliability (a failure may lead to data loss). Here it is set to 2 because this cluster has only two datanodes. ----
     ---- dfs.data.dir: the local filesystem path where a datanode stores its data blocks. The path does not have to be identical on every datanode, since each machine's environment may differ, but using the same path everywhere makes administration simpler. By default it falls back to a directory under hadoop.tmp.dir, which is only suitable for testing because data can easily be lost, so it is best to override it. ----
     ---- dfs.name.dir: the local filesystem path where the namenode stores the filesystem metadata. It is used only by the namenode; datanodes do not need it. The warning above about relying on a /temp-style default applies here as well. ----

     [jack@node1 conf]$ cat hdfs-site.xml 
     <?xml version="1.0"?>
     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

     <!-- Put site-specific property overrides in this file. -->

     <configuration>
       <property>
         <name>dfs.name.dir</name>
         <value>/user/hdfs/name</value>
       </property>
       <property>
         <name>dfs.data.dir</name>
         <value>/user/hdfs/data</value>
       </property>
       <property>
         <name>dfs.replication</name>
         <value>2</value>
       </property>
     </configuration>

     [jack@node1 conf]$ cat mapred-site.xml 
     <?xml version="1.0"?>
     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

     <!-- Put site-specific property overrides in this file. -->

     <configuration>
       <property>
         <name>mapred.job.tracker</name>
         <value>192.168.1.151:9001</value>
       </property>
     </configuration>

     ---- Create the directories referenced above and give ownership to the jack user ----
     [root@node1 ~]# mkdir /temp
     [root@node1 ~]# mkdir -p /user/hdfs/data
     [root@node1 ~]# mkdir -p /user/hdfs/name
     [root@node1 ~]# chown -R jack:jack /temp/
     [root@node1 ~]# chown -R jack:jack /user/
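
     The same storage paths are also used on the datanodes, since hadoop.tmp.dir and dfs.data.dir refer to local paths on every node, so it is worth creating them there as well with the right ownership. A sketch, assuming node2 and node3 use the same layout as node1:

     [root@node2 ~]# mkdir -p /temp /user/hdfs/data
     [root@node2 ~]# chown -R jack:jack /temp /user
     [root@node3 ~]# mkdir -p /temp /user/hdfs/data
     [root@node3 ~]# chown -R jack:jack /temp /user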

      7. Configure hadoop-env.sh

       hadoop-env.sh: sets the environment variables used by the Hadoop scripts.

     [jack@node1 conf]$ vi hadoop-env.sh 

     # Set Hadoop-specific environment variables here.

     # The only required environment variable is JAVA_HOME.  All others are
     # optional.  When running a distributed configuration it is best to
     # set JAVA_HOME in this file, so that it is correctly defined on
     # remote nodes.

     # The java implementation to use.  Required.
     export JAVA_HOME=/jdk1.7

     # Extra Java CLASSPATH elements.  Optional.
     # export HADOOP_CLASSPATH=

       8. Configure the masters and slaves files

        masters: lists the machines that run the secondary namenode.

        slaves: lists the machines that run the datanode and tasktracker daemons.

    [root@node1 conf]# cat masters 
    192.168.1.151
    [root@node1 conf]# cat slaves 
    192.168.1.152
    192.168.1.153
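
     Because /etc/hosts already maps node1, node2, and node3, hostnames could be used in these files instead of raw IP addresses, which keeps the configuration readable if addresses change. An equivalent sketch:

     [root@node1 conf]# cat masters
     node1
     [root@node1 conf]# cat slaves
     node2
     node3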

       9. Copy Hadoop to each node

    [root@node1 /]# scp -r hadoop-0.20.2 node3:/hadoop-0.20.2
    
    [root@node1 /]# scp -r hadoop-0.20.2 node2:/hadoop-0.20.2
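
     If the site files or hadoop-env.sh are edited again later, only the conf directory needs to be pushed back out rather than the whole tree; for example (same paths as above):

     [root@node1 /]# scp -r /hadoop-0.20.2/conf node2:/hadoop-0.20.2/
     [root@node1 /]# scp -r /hadoop-0.20.2/conf node3:/hadoop-0.20.2/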

       10. Format the namenode

     [jack@node1 /]$ cd hadoop-0.20.2/bin
     [jack@node1 bin]$ ./hadoop namenode -format
     13/05/05 20:02:03 INFO namenode.NameNode: STARTUP_MSG: 
     /************************************************************
     STARTUP_MSG: Starting NameNode
     STARTUP_MSG:   host = node1/192.168.1.151
     STARTUP_MSG:   args = [-format]
     STARTUP_MSG:   version = 0.20.2
     STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
     ************************************************************/
     Re-format filesystem in /user/hdfs/name ? (Y or N) Y
     13/05/05 20:02:06 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
     13/05/05 20:02:06 INFO namenode.FSNamesystem: supergroup=supergroup
     13/05/05 20:02:06 INFO namenode.FSNamesystem: isPermissionEnabled=true
     13/05/05 20:02:06 INFO common.Storage: Image file of size 94 saved in 0 seconds.
     13/05/05 20:02:06 INFO common.Storage: Storage directory /user/hdfs/name has been successfully formatted.
     13/05/05 20:02:06 INFO namenode.NameNode: SHUTDOWN_MSG: 
     /************************************************************
     SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.1.151
     ************************************************************/

       11. Start Hadoop

    [jack@node1 bin]$ ./start-all.sh 
    starting namenode, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-namenode-node1.out
    192.168.1.153: starting datanode, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-datanode-node3.out
    192.168.1.152: starting datanode, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-datanode-node2.out
    192.168.1.151: starting secondarynamenode, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-node1.out
    starting jobtracker, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-node1.out
    192.168.1.152: starting tasktracker, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-node2.out
    192.168.1.153: starting tasktracker, logging to /hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-node3.out
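
     Besides jps (next step), the cluster state can be checked with dfsadmin and with the built-in web interfaces; in this Hadoop version the NameNode UI listens on port 50070 and the JobTracker UI on port 50030. A quick check, assuming the addresses above:

     [jack@node1 bin]$ ./hadoop dfsadmin -report
     ---- Web interfaces (open in a browser) ----
     http://192.168.1.151:50070/    (NameNode / HDFS status)
     http://192.168.1.151:50030/    (JobTracker / MapReduce status)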

       12. Use jps to verify that the daemons started successfully

     [jack@node1 bin]$ jps
     4375 NameNode
     4696 Jps
     4531 SecondaryNameNode
     4592 JobTracker

     [jack@node3 ~]$ jps
     4435 Jps
     4373 TaskTracker
     4275 DataNode

     [jack@node2 /]$ jps
     3934 TaskTracker
     3994 Jps
     3836 DataNode
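
     If any expected daemon is missing from the jps output, the corresponding log under /hadoop-0.20.2/logs usually explains why. For example (the file name below follows the naming pattern shown in the start-up output above):

     [jack@node2 ~]$ tail -n 50 /hadoop-0.20.2/logs/hadoop-root-datanode-node2.log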

     Note: root was used for parts of this installation because of an earlier problem setting up passwordless SSH login; after troubleshooting it worked. A separate Hadoop account named echo was also created later, which is why the HDFS listings below show echo as the file owner.

    Run a quick test on the newly installed Hadoop cluster:

     [jack@node1 ~]$ mkdir input
     [jack@node1 ~]$ cd input/
     [jack@node1 input]$ echo "hello world" > test1.txt
     [jack@node1 input]$ echo "hello hadoop" > test2.txt
     [jack@node1 input]$ cat test1.txt 
     hello world
     [jack@node1 input]$ cat test2.txt 
     hello hadoop
     [jack@node1 input]$ cd /hadoop-0.20.2/
     [jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -put /home/jack/input in
     [jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -ls in
     Found 2 items
     -rw-r--r--   1 echo supergroup         12 2013-05-06 15:23 /user/jack/in/test1.txt
     -rw-r--r--   1 echo supergroup         13 2013-05-06 15:23 /user/jack/in/test2.txt
     [jack@node1 hadoop-0.20.2]$ ls
     bin        CHANGES.txt  docs                    hadoop-0.20.2-examples.jar  ivy      librecordio  NOTICE.txt  webapps
     build.xml  conf         hadoop-0.20.2-ant.jar   hadoop-0.20.2-test.jar      ivy.xml  LICENSE.txt  README.txt
     c++        contrib      hadoop-0.20.2-core.jar  hadoop-0.20.2-tools.jar     lib      logs         src
     [jack@node1 hadoop-0.20.2]$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
     13/05/06 15:24:01 INFO input.FileInputFormat: Total input paths to process : 2
     13/05/06 15:24:02 INFO mapred.JobClient: Running job: job_201305061516_0001
     13/05/06 15:24:03 INFO mapred.JobClient:  map 0% reduce 0%
     13/05/06 15:24:30 INFO mapred.JobClient:  map 50% reduce 0%
     13/05/06 15:24:46 INFO mapred.JobClient:  map 50% reduce 16%
     13/05/06 15:24:51 INFO mapred.JobClient:  map 100% reduce 16%
     13/05/06 15:25:02 INFO mapred.JobClient:  map 100% reduce 100%
     13/05/06 15:25:04 INFO mapred.JobClient: Job complete: job_201305061516_0001
     13/05/06 15:25:04 INFO mapred.JobClient: Counters: 17
     13/05/06 15:25:04 INFO mapred.JobClient:   Job Counters 
     13/05/06 15:25:04 INFO mapred.JobClient:     Launched reduce tasks=1
     13/05/06 15:25:04 INFO mapred.JobClient:     Launched map tasks=2
     13/05/06 15:25:04 INFO mapred.JobClient:     Data-local map tasks=2
     13/05/06 15:25:04 INFO mapred.JobClient:   FileSystemCounters
     13/05/06 15:25:04 INFO mapred.JobClient:     FILE_BYTES_READ=55
     13/05/06 15:25:04 INFO mapred.JobClient:     HDFS_BYTES_READ=25
     13/05/06 15:25:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=180
     13/05/06 15:25:04 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
     13/05/06 15:25:04 INFO mapred.JobClient:   Map-Reduce Framework
     13/05/06 15:25:04 INFO mapred.JobClient:     Reduce input groups=3
     13/05/06 15:25:04 INFO mapred.JobClient:     Combine output records=4
     13/05/06 15:25:04 INFO mapred.JobClient:     Map input records=2
     13/05/06 15:25:04 INFO mapred.JobClient:     Reduce shuffle bytes=61
     13/05/06 15:25:04 INFO mapred.JobClient:     Reduce output records=3
     13/05/06 15:25:04 INFO mapred.JobClient:     Spilled Records=8
     13/05/06 15:25:04 INFO mapred.JobClient:     Map output bytes=41
     13/05/06 15:25:04 INFO mapred.JobClient:     Combine input records=4
     13/05/06 15:25:04 INFO mapred.JobClient:     Map output records=4
     13/05/06 15:25:04 INFO mapred.JobClient:     Reduce input records=4
     [jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -ls
     Found 2 items
     drwxr-xr-x   - echo supergroup          0 2013-05-06 15:23 /user/jack/in
     drwxr-xr-x   - echo supergroup          0 2013-05-06 15:25 /user/jack/out
     [jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -ls ./out
     Found 2 items
     drwxr-xr-x   - echo supergroup          0 2013-05-06 15:24 /user/jack/out/_logs
     -rw-r--r--   1 echo supergroup         25 2013-05-06 15:24 /user/jack/out/part-r-00000
     [jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -cat ./out/*
     hadoop    1
     hello    2
     world    1
     cat: Source must be a file.
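
     The final "cat: Source must be a file." message is expected: the ./out/* glob also matches the _logs directory, which cannot be printed with -cat; the word counts themselves came out correctly. To read only the result file, or to copy it to the local filesystem, something along these lines works (the local file name is just an example):

     [jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -cat ./out/part-r-00000
     [jack@node1 hadoop-0.20.2]$ bin/hadoop dfs -get ./out/part-r-00000 /home/jack/wordcount-result.txt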

     Reference: http://hadoop.apache.org/docs/r0.19.1/cn/cluster_setup.html
