  • Setting up HDFS

    Steps that must be performed on every node (NameNode and DataNodes):

    1 Note that Hadoop depends on Java and SSH.

    1. Java 1.5.x or later must be installed. Here the install directory is /usr/java/jdk1.7.0.

    1 Download a suitable JDK 

    //This is the RPM package for 64-bit Linux systems 

     http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.rpm 

     

    2 Install the JDK 

    rpm -ivh jdk-7-linux-x64.rpm 

     

    3 Verify Java 

    [root@hadoop1 ~]# java -version 

    java version "1.7.0" 

    Java(TM) SE Runtime Environment (build 1.7.0-b147) 

    Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode) 

    [root@hadoop1 ~]# ls /usr/java/ 

    default  jdk1.7.0  latest 

     

    4 Configure the Java environment variables 

    #vim /etc/profile //add the following lines to the profile file: 

     

    #add for hadoop 

    export JAVA_HOME=/usr/java/jdk1.7.0 

    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/ 

    export PATH=$PATH:$JAVA_HOME/bin 

     

    //make the environment variables take effect 

    source /etc/profile 

     

    5 Copy /etc/profile to the DataNodes 
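
    For example, the profile can be pushed to every DataNode in one loop. This is only a sketch: it assumes the datanode1 to datanode3 host names configured in /etc/hosts below, and that root SSH access to each node is available.

    # Copy the updated profile to each DataNode and confirm Java is visible there
    for node in datanode1 datanode2 datanode3; do
        scp /etc/profile root@${node}:/etc/profile
        ssh root@${node} '. /etc/profile && java -version'
    done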

    1. ssh must be installed and sshd must be kept running, so that the Hadoop scripts can manage the remote Hadoop daemons.
    2. To check whether SSH is installed, run the following commands (a scripted version of the check follows this list):

                                   which ssh

               which sshd

               which ssh-keygen

              If none of the three commands returns empty output, SSH is already installed.
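
              The same check can be scripted; a minimal sketch:

              # Report any of the three tools that is missing from the PATH
              for cmd in ssh sshd ssh-keygen; do
                  which "$cmd" > /dev/null || echo "missing: $cmd"
              done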

    2 Create a common Hadoop account

    1. All nodes should use the same username; it can be added with the following commands (a sketch that covers every node follows this list):
    2. useradd hadoop
    3. passwd hadoop
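
    A sketch for creating the account on every node in one pass; it assumes root SSH access to each node, and passwd will prompt interactively for the new password on each one:

    # Create the hadoop user on each node (run from any machine with root SSH access)
    for ip in 192.168.57.75 192.168.57.76 192.168.57.78 192.168.57.79; do
        ssh -t root@${ip} 'useradd hadoop && passwd hadoop'
    done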

     3 Configure host names in /etc/hosts (a sketch for pushing the entries to every node follows the listing)

      tail -n 4 /etc/hosts

        192.168.57.75  namenode

        192.168.57.76  datanode1

        192.168.57.78  datanode2

        192.168.57.79  datanode3
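
    If the same entries have to be present on every node, they can be appended in one loop; a sketch only, assuming root SSH access and that the entries are not already in the target files:

    # Append the cluster host entries to /etc/hosts on each node
    for ip in 192.168.57.75 192.168.57.76 192.168.57.78 192.168.57.79; do
        printf '%s\n' '192.168.57.75  namenode' '192.168.57.76  datanode1' \
            '192.168.57.78  datanode2' '192.168.57.79  datanode3' |
            ssh root@${ip} 'cat >> /etc/hosts'
    done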

    Steps required on the NameNode:

    1. Generate an SSH key pair.

    On the NameNode, run the following command to generate an RSA key pair:

    Run the command:

      ssh-keygen -t rsa

    The following is excerpted from another source:

    You can set up key-based (passwordless) authentication.

    First generate the key with the command ssh-keygen -t rsa

         After running it, just press Enter through all the prompts; the key files id_rsa and id_rsa.pub are generated under /root/.ssh/ by default. The .ssh directory is hidden, so enable showing hidden files to see it.

         Create a .ssh directory under /home/admin, copy the id_rsa.pub file into /home/admin/.ssh, and rename it to authorized_keys.

         

         Copy the id_rsa file to another location, such as /home/id_rsa.

         Test whether it is set up correctly with the following command:

         ssh  -i  /home/id_rsa admin@localhost

         You should be logged in directly, without being asked for a password.

    2. View the generated public key:

    more /home/root/.ssh/id_rsa.pub

    3. Copy the public key to each slave node.

           1. On the master node, run: scp /home/root/.ssh/id_rsa.pub hadoop@<DataNode IP address>:~/master_key   This copies the generated public key file from the NameNode to the "~/master_key" file on the DataNode.

            2. On the slave node (DataNode), register the file as an authorized key:

        mkdir ~/.ssh

        chmod 700 ~/.ssh

        mv ~/master_key ~/.ssh/authorized_keys

        chmod 600 ~/.ssh/authorized_keys

    4. From the master node, log in to a slave node: ssh <IP address>
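
    Steps 1 to 4 can be combined into a single pass from the NameNode. A sketch only, run as the hadoop user and using the host names defined in /etc/hosts above; it will still ask for the hadoop password once per node while the key is being copied:

    # Generate a key with an empty passphrase and push it to every DataNode
    ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""
    for node in datanode1 datanode2 datanode3; do
        scp ~/.ssh/id_rsa.pub hadoop@${node}:~/master_key
        ssh hadoop@${node} 'mkdir -p ~/.ssh && chmod 700 ~/.ssh &&
            cat ~/master_key >> ~/.ssh/authorized_keys &&
            chmod 600 ~/.ssh/authorized_keys && rm ~/master_key'
    done
    ssh hadoop@datanode1 hostname   # should print the host name without asking for a password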

    Hadoop configuration (this must be done on all nodes, apart from a few special commands, which are noted)

     Hadoop configuration
    //Note: perform these operations as the hadoop user
    1 Set up the directory layout
    [hadoop@hadoop1 ~]$ pwd
    /home/hadoop
    [hadoop@hadoop1 ~]$ ll
    total 59220
    lrwxrwxrwx  1 hadoop hadoop       17 Feb  1 16:59 hadoop -> hadoop-0.20.203.0
    drwxr-xr-x 12 hadoop hadoop     4096 Feb  1 17:31 hadoop-0.20.203.0
    -rw-r--r--  1 hadoop hadoop 60569605 Feb  1 14:24 hadoop-0.20.203.0rc1.tar.gz
     
     
    2 Configure hadoop-env.sh and point it at the Java installation
    vim hadoop/conf/hadoop-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0
     
    3 Configure core-site.xml //points the file system at the namenode
     
    [hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
     
    <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
    </property>
     
    </configuration>

    hadoop.tmp.dir is the base location that the Hadoop file system relies on; many other paths are derived from it. Its default value is under /tmp/hadoop-${user.name}, but storing data under /tmp is not safe, because those files may be removed whenever Linux reboots.
    Edit conf/core-site.xml and add the following property (a sketch for preparing the directory follows the property block): 

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/had/hadoop/data</value>
       <description>A base for other temporary directories.</description>
    </property>
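
    Before formatting, it is worth making sure the directory named in hadoop.tmp.dir exists and belongs to the hadoop user on every node; a sketch using the example path above:

    # Run as root on each node
    mkdir -p /home/had/hadoop/data
    chown -R hadoop:hadoop /home/had/hadoop/data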


    4 Configure mapred-site.xml //points at the master node running the jobtracker (mapred is short for map-reduce)
     
    [hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
     
    <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
    </property>
     
    </configuration>
     
    5 Configure hdfs-site.xml //sets the HDFS replication count
     
    [hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
     
    <property>
    <name>dfs.replication</name>
    <value>3</value>
    </property>
     
    </configuration>
     
    6 Configure the masters and slaves files
    [hadoop@hadoop1 ~]$ cat hadoop/conf/masters
    namenode
    [hadoop@hadoop1 ~]$ cat hadoop/conf/slaves
    datanode1
    datanode2
    datanode3
     
    7 Copy the hadoop directory to all nodes (datanodes)
    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/
    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/
    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop
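
    The three commands can also be written as a loop; a minimal sketch, run as the hadoop user on the NameNode:

    # Distribute the configured hadoop directory to every DataNode
    for node in datanode1 datanode2 datanode3; do
        scp -r ~/hadoop hadoop@${node}:/home/hadoop/
    done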
     
    8 Format HDFS (this only needs to be done on the NameNode, and should be done before the first start)
    [hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format
    12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = hadoop1.test.com/127.0.0.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 0.20.203.0
    STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
    ************************************************************/
    Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N)  Y  //type Y here
    12/02/02 11:31:17 INFO util.GSet: VM type       = 64-bit
    12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB
    12/02/02 11:31:17 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152
    12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop
    12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup
    12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
    12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
    12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
    12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
    12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at hadoop1.test.com/127.0.0.1
    ************************************************************/
    [hadoop@hadoop1 hadoop]$
     
    9 Start the Hadoop daemons
    [hadoop@hadoop1 hadoop]$ bin/start-all.sh
    starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out
    datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out
    datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out
    datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out
    starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out
    datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out
    datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out
    datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out
     
    10 Verify
    //namenode
    [hadoop@hadoop1 logs]$ jps
    2883 JobTracker
    3002 Jps
    2769 NameNode
     
    //datanode
    [hadoop@hadoop2 ~]$ jps
    2743 TaskTracker
    2670 DataNode
    2857 Jps
     
    [hadoop@hadoop3 ~]$ jps
    2742 TaskTracker
    2856 Jps
    2669 DataNode
     
    [hadoop@hadoop4 ~]$ jps
    2742 TaskTracker
    2852 Jps
    2659 DataNode
     
    Hadoop monitoring web page:
    http://<NameNode IP address>:50070/dfshealth.jsp
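
    Beyond jps and the web page, a quick functional check can be run from the hadoop directory on the NameNode; a sketch in which the /test path and the file written are arbitrary examples:

    # Report cluster capacity and the live DataNodes
    bin/hadoop dfsadmin -report
    # Write a small file into HDFS and list it back
    bin/hadoop fs -mkdir /test
    bin/hadoop fs -put conf/core-site.xml /test/
    bin/hadoop fs -ls /test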
