  • Setting up HDFS

    Required on all of the nodes (NameNode and DataNodes):

    1 Hadoop depends on Java and SSH

    1. Java 1.5.x or later must be installed. Here the installation directory is /usr/java/jdk1.7.0

    1 Download a suitable JDK 

    // This is the RPM package for 64-bit Linux systems 

     http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.rpm 

     

    2 Install the JDK 

    rpm -ivh jdk-7-linux-x64.rpm 

     

    3 Verify Java 

    [root@hadoop1 ~]# java -version 

    java version "1.7.0" 

    Java(TM) SE Runtime Environment (build 1.7.0-b147) 

    Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode) 

    [root@hadoop1 ~]# ls /usr/java/ 

    default  jdk1.7.0  latest 

     

    4 Configure the Java environment variables 

    # vim /etc/profile    // add the following lines to the profile file: 

     

    #add for hadoop 

    export JAVA_HOME=/usr/java/jdk1.7.0 

    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 

    export PATH=$PATH:$JAVA_HOME/bin 

     

    // make the environment variables take effect 

    source /etc/profile 
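    To confirm the variables took effect in the current shell, a quick check (a minimal sketch):

    echo $JAVA_HOME    # should print /usr/java/jdk1.7.0
    java -version      # should report version 1.7.0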

     

    5 Copy /etc/profile to every DataNode 
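    For example, using the hostnames mapped in /etc/hosts in step 3 below (a sketch; run as root, since /etc/profile is owned by root):

    for h in datanode1 datanode2 datanode3; do
        scp /etc/profile root@$h:/etc/profile
    done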

    1. SSH must be installed, and sshd must always be running, so that the Hadoop scripts can manage the remote Hadoop daemons.
    2. To check whether SSH is installed, run the commands:

               which ssh

               which sshd

               which ssh-keygen

              If none of these three commands returns an empty result, SSH is already installed.
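    If any of them does come back empty, SSH can be installed with the distribution's package manager; on an RPM-based system such as the one above, something along these lines should work:

    yum install -y openssh-server openssh-clients
    service sshd start
    chkconfig sshd on    # keep sshd running across reboots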

    2 Create a common Hadoop account

    All of the nodes should have the same user name, which can be added with the following commands:

        useradd hadoop
        passwd hadoop

     3 Configure hostnames in /etc/hosts

      tail -n 4 /etc/hosts

        192.168.57.75  namenode

        192.168.57.76  datanode1

        192.168.57.78  datanode2

        192.168.57.79  datanode3
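    Once /etc/hosts has been updated on every node, a quick way to confirm that all of the names resolve and the machines can reach each other is a loop such as:

    for h in namenode datanode1 datanode2 datanode3; do
        ping -c 1 $h
    done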

    Required on the NameNode:

    1. Generate an SSH key pair.

    On the NameNode, run the following command to generate an RSA key pair:

      ssh-keygen -t rsa

    The following is excerpted from another source:

    You can configure key-based (passwordless) authentication.

    First generate the key pair with the command ssh-keygen -t rsa

         While it runs, just press Enter at each prompt to accept the defaults. This generates the files id_rsa and id_rsa.pub, which by default are placed under /root/.ssh/; the .ssh directory is hidden, so you have to show hidden files to see it.

         Create a .ssh directory under /home/admin, copy the id_rsa.pub file into /home/admin/.ssh, and rename it to authorized_keys.

         Copy the id_rsa file to a directory such as /home/id_rsa.

         Use the following command to test whether it is set up correctly:

         ssh -i /home/id_rsa admin@localhost

         You should be logged straight in, without being asked for a password.
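    On systems that ship OpenSSH's ssh-copy-id helper, the manual copying described in steps 2 and 3 below can be collapsed into a single command per slave node, for example:

    ssh-copy-id hadoop@datanode1    # appends the local public key to ~/.ssh/authorized_keys on datanode1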

    2. View the generated public key:

    more ~/.ssh/id_rsa.pub

    3. Copy the public key to each of the slave nodes.

           1. On the master node, run:  scp ~/.ssh/id_rsa.pub hadoop@<datanode-ip>:~/master_key   This copies the generated public key file from the NameNode into the file "~/master_key" on the DataNode.

            2. On the slave node (DataNode), install that file as the authorized key:

        mkdir ~/.ssh
        chmod 700 ~/.ssh
        mv ~/master_key ~/.ssh/authorized_keys
        chmod 600 ~/.ssh/authorized_keys

    4. Log in to the slave nodes from the master node:  ssh <datanode-ip>
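    To check passwordless login to every slave node in one pass, a loop like the following should print each DataNode's hostname without ever prompting for a password:

    for h in datanode1 datanode2 datanode3; do
        ssh $h hostname
    done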

    Hadoop configuration (this needs to be done on all of the nodes, apart from a few special commands, which are marked as such)

    // note: perform these operations as the hadoop user
    1 The configuration directory
    [hadoop@hadoop1 ~]$ pwd
    /home/hadoop
    [hadoop@hadoop1 ~]$ ll
    total 59220
    lrwxrwxrwx  1 hadoop hadoop       17 Feb  1 16:59 hadoop -> hadoop-0.20.203.0
    drwxr-xr-x 12 hadoop hadoop     4096 Feb  1 17:31 hadoop-0.20.203.0
    -rw-r--r--  1 hadoop hadoop 60569605 Feb  1 14:24 hadoop-0.20.203.0rc1.tar.gz
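    A layout like the one above can be produced by unpacking the release tarball and pointing a version-independent symlink at it, roughly:
    tar -zxvf hadoop-0.20.203.0rc1.tar.gz
    ln -s hadoop-0.20.203.0 hadoop    # lets configs and scripts refer to ~/hadoop regardless of version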
     
     
    2 Configure hadoop-env.sh, pointing it at the Java installation
    vim hadoop/conf/hadoop-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0
     
    3 Configure core-site.xml // identifies the filesystem's NameNode
     
    [hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
     
    <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
    </property>
     
    </configuration>

    hadoop.tmp.dir is the base setting that the Hadoop filesystem depends on; many other paths are derived from it. It defaults to a location under /tmp/{$user}, but storing data under /tmp is unsafe, because a single Linux reboot can delete the files there.
    Edit conf/core-site.xml and add the following property: 

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/had/hadoop/data</value>
       <description>A base for other temporary directories.</description>
    </property>
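    If hadoop.tmp.dir is set as above, make sure the directory exists and is owned by the hadoop user before formatting HDFS (run as root; /home/had/hadoop/data is just the example value from the snippet):

    mkdir -p /home/had/hadoop/data
    chown -R hadoop:hadoop /home/had/hadoop/data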


    4 Configure mapred-site.xml // identifies the master node running the JobTracker (mapred is short for map-reduce)
     
    [hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
     
    <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
    </property>
     
    </configuration>
     
    5 Configure hdfs-site.xml // sets the number of HDFS replicas
     
    [hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
     
    <property>
    <name>dfs.replication</name>
    <value>3</value>
    </property>
     
    </configuration>
     
    6 Configure the masters and slaves files
    [hadoop@hadoop1 ~]$ cat hadoop/conf/masters
    namenode
    [hadoop@hadoop1 ~]$ cat hadoop/conf/slaves
    datanode1
    datanode2
    datanode3
     
    7 Copy the hadoop directory to all of the nodes (DataNodes)
    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/
    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/
    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop
     
    8 Format HDFS (this is done on the NameNode only, and should be done before starting the daemons for the first time)
    [hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format
    12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = hadoop1.test.com/127.0.0.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 0.20.203.0
    STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
    ************************************************************/
    Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N)  Y  // type Y here
    12/02/02 11:31:17 INFO util.GSet: VM type       = 64-bit
    12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB
    12/02/02 11:31:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152
    12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop
    12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup
    12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
    12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
    12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
    12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
    12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at hadoop1.test.com/127.0.0.1
    ************************************************************/
    [hadoop@hadoop1 hadoop]$
     
    9 Start the Hadoop daemons
    [hadoop@hadoop1 hadoop]$ bin/start-all.sh
    starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out
    datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out
    datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out
    datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out
    starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out
    datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out
    datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out
    datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out
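    The matching script stops all of the daemons on every node when needed:
    [hadoop@hadoop1 hadoop]$ bin/stop-all.sh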
     
    10 Verify
    //namenode
    [hadoop@hadoop1 logs]$ jps
    2883 JobTracker
    3002 Jps
    2769 NameNode
     
    //datanode
    [hadoop@hadoop2 ~]$ jps
    2743 TaskTracker
    2670 DataNode
    2857 Jps
     
    [hadoop@hadoop3 ~]$ jps
    2742 TaskTracker
    2856 Jps
    2669 DataNode
     
    [hadoop@hadoop4 ~]$ jps
    2742 TaskTracker
    2852 Jps
    2659 DataNode
     
    The Hadoop monitoring web page:
    http://<namenode-ip>:50070/dfshealth.jsp
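    Besides jps and the web page, a quick smoke test confirms the cluster is really serving files (the /test.xml path here is arbitrary):
    [hadoop@hadoop1 hadoop]$ bin/hadoop dfsadmin -report                      # should list 3 live datanodes
    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -put conf/core-site.xml /test.xml  # write a file into HDFS
    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -cat /test.xml                     # read it back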
