  • Hadoop Distributed Cluster Deployment

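    Hadoop 2.x deployment modes
    * Local Mode
    * Distributed Mode
      * Pseudo-distributed
        A single machine runs all the daemons, including the
        slave daemons DataNode and NodeManager.
      * Fully distributed
        There are multiple slave nodes running the
        DataNodes and NodeManagers.
        They are listed in the configuration file
        $HADOOP_HOME/etc/hadoop/slaves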
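
In fully distributed mode the slaves file lists one worker host name per line. A minimal sketch for the three-node cluster described in these notes, assuming every node runs a DataNode and a NodeManager and that HADOOP_HOME points at the installation; write_slaves is an illustrative helper, not a Hadoop command:

```shell
#!/bin/sh
# write_slaves DIR HOST... : write one worker host name per line to DIR/slaves.
# (Hypothetical helper, not part of Hadoop.)
write_slaves() {
    conf_dir=$1
    shift
    printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Only touch a real installation if HADOOP_HOME is set and looks valid.
if [ -n "${HADOOP_HOME:-}" ] && [ -d "$HADOOP_HOME/etc/hadoop" ]; then
    write_slaves "$HADOOP_HOME/etc/hadoop" \
        hadoop-senior.ibeifeng.com \
        hadoop-senior02.ibeifeng.com \
        hadoop-senior03.ibeifeng.com
fi
```

The same file is read by both the HDFS and the YARN start scripts, which is why it appears under both components in the configuration list.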

    ================================================================
    Three machines
                hadoop-senior     hadoop-senior02   hadoop-senior03
    IP          192.168.217.131   192.168.217.132   192.168.217.133
    Memory      1.5 GB            1 GB              1 GB
    CPUs        1                 1                 1

    Host name mapping
    Add the following entries to /etc/hosts on all three machines:
    192.168.217.131 hadoop-senior.ibeifeng.com hadoop-senior
    192.168.217.132 hadoop-senior02.ibeifeng.com hadoop-senior02
    192.168.217.133 hadoop-senior03.ibeifeng.com hadoop-senior03
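
Before going further it can help to confirm that every node really has all three entries. The sketch below parses a hosts file directly instead of relying on a resolver; lookup_ip and check_hosts are hypothetical helper names, and the host names are the ones from the mapping above:

```shell
#!/bin/sh
# lookup_ip HOSTS_FILE NAME : print the IP of the first line in HOSTS_FILE
# that lists NAME as a host name or alias. (Hypothetical helper.)
lookup_ip() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# check_hosts [HOSTS_FILE] : report the mapping of each cluster node.
# Returns non-zero if any node is missing.
check_hosts() {
    hosts_file=${1:-/etc/hosts}
    status=0
    for host in hadoop-senior hadoop-senior02 hadoop-senior03; do
        ip=$(lookup_ip "$hosts_file" "$host.ibeifeng.com")
        if [ -n "$ip" ]; then
            echo "$host.ibeifeng.com -> $ip"
        else
            echo "$host.ibeifeng.com -> NOT FOUND"
            status=1
        fi
    done
    return $status
}
```

Running check_hosts without arguments on each machine checks the local /etc/hosts.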

    =====================================================================
                hadoop-senior     hadoop-senior02   hadoop-senior03
    HDFS        NameNode                            SecondaryNameNode
                DataNode          DataNode          DataNode
    YARN                          ResourceManager
                NodeManager       NodeManager       NodeManager
    MapReduce   JobHistoryServer

    Configuration files
    * hdfs
      * hadoop-env.sh
      * core-site.xml
      * hdfs-site.xml
      * slaves
    * yarn
      * yarn-env.sh
      * yarn-site.xml
      * slaves
    * mapreduce
      * mapred-env.sh
      * mapred-site.xml

    core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop-senior.ibeifeng.com:8020</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/app/hadoop-2.5.0/data/tmp</value>
        </property>
        <property>
            <name>fs.trash.interval</name>
            <value>420</value>
        </property>
    </configuration>

    hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop-senior03.ibeifeng.com:50090</value>
        </property>
    </configuration>

    mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop-senior.ibeifeng.com:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop-senior.ibeifeng.com:19888</value>
        </property>
    </configuration>

    yarn-site.xml

    <configuration>
    
    <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop-senior02.ibeifeng.com</value>
        </property>
    </configuration>
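
Once the same four site files are present on every node, each daemon group is started from the host that owns its master role in the layout above. The following is a sketch under stated assumptions rather than the author's procedure: passwordless SSH between the nodes, the same installation path on all of them (/opt/app/hadoop-2.5.0 is inferred from hadoop.tmp.dir), and the host names from /etc/hosts. run_on, start_cluster and the DRY_RUN switch are illustrative names.

```shell
#!/bin/sh
# Assumptions: passwordless SSH, identical install path on every node.
HADOOP_DIR=${HADOOP_DIR:-/opt/app/hadoop-2.5.0}
NN_HOST=hadoop-senior.ibeifeng.com      # NameNode and JobHistoryServer
RM_HOST=hadoop-senior02.ibeifeng.com    # ResourceManager

# run_on HOST COMMAND : run COMMAND inside $HADOOP_DIR on HOST,
# or only print the ssh command line when DRY_RUN=1. (Hypothetical helper.)
run_on() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ssh $1 \"cd $HADOOP_DIR && $2\""
    else
        ssh "$1" "cd $HADOOP_DIR && $2"
    fi
}

start_cluster() {
    # Format HDFS once, before the very first start only:
    #   run_on "$NN_HOST" "bin/hdfs namenode -format"
    # NameNode, DataNodes and SecondaryNameNode:
    run_on "$NN_HOST" "sbin/start-dfs.sh"
    # start-yarn.sh starts the ResourceManager on the local host,
    # so it has to run on the ResourceManager node:
    run_on "$RM_HOST" "sbin/start-yarn.sh"
    run_on "$NN_HOST" "sbin/mr-jobhistory-daemon.sh start historyserver"
}
```

Calling start_cluster with DRY_RUN=1 prints the three ssh command lines without contacting any host, which is a cheap way to review the plan first.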

    ======================================================================
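    After the cluster has been set up
    * Basic tests
      Check that the services start, that they are usable, and that simple applications run.
      * hdfs
        Read and write operations:
        bin/hdfs dfs -mkdir -p /user/beifeng/tmp/conf
        bin/hdfs dfs -put etc/hadoop/*-site.xml /user/beifeng/tmp/conf
        bin/hdfs dfs -text /user/beifeng/tmp/conf/core-site.xml
      * yarn
        Run a jar.
      * mapreduce
        bin/yarn jar share/hadoop/mapreduce/hadoop*example*.jar wordcount /user/beifeng/mapreduce/wordcount/input /user/beifeng/mapreduce/wordcount/output
    * Benchmark tests
      Measure the performance of the cluster.
      * hdfs
        Write data
        Read data
    * Cluster monitoring
      Cloudera
      Cloudera Manager
      * Deploys and installs the cluster
      * Monitors the cluster
      * Synchronizes configuration across the cluster
      * Alerting, and so on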
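
For the HDFS write and read benchmark, one common choice is the TestDFSIO job shipped in the MapReduce job client tests jar. The notes do not say which tool was used, so treat this as an assumption; the jar location, the defaults of 3 files and 100MB, and the hdfs_benchmark function name are illustrative.

```shell
#!/bin/sh
# hdfs_benchmark [NR_FILES] [FILE_SIZE]
# Run from the Hadoop installation directory. (Hypothetical helper.)
hdfs_benchmark() {
    nr_files=${1:-3}
    file_size=${2:-100MB}
    # The tests jar is matched with a glob, like the examples jar above.
    jar=$(ls share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar) || return 1

    # Write the files, read the same files back, then remove the test data.
    bin/yarn jar "$jar" TestDFSIO -write -nrFiles "$nr_files" -fileSize "$file_size" &&
    bin/yarn jar "$jar" TestDFSIO -read -nrFiles "$nr_files" -fileSize "$file_size" &&
    bin/yarn jar "$jar" TestDFSIO -clean
}
```

On a cluster this small (1 to 1.5 GB of memory per node) it is probably wise to start with a few small files, for example hdfs_benchmark 3 10MB, and increase from there.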

    =============================================================
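    The clocks of all cluster machines must be synchronized
    * Pick one machine to act as the time server
    * All other machines synchronize their clocks with it on a schedule,
      for example once every ten minutes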
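
The ten-minute schedule can be expressed as a cron job on each client. A minimal sketch, assuming hadoop-senior.ibeifeng.com is the time server and ntpdate is installed at /usr/sbin/ntpdate on the clients; add_sync_job is a hypothetical helper that keeps the crontab edit idempotent:

```shell
#!/bin/sh
# Assumptions: the time server is the first node, ntpdate lives in /usr/sbin.
SYNC_JOB='*/10 * * * * /usr/sbin/ntpdate hadoop-senior.ibeifeng.com'

# add_sync_job : read a crontab on stdin and write it to stdout with
# SYNC_JOB appended, unless an identical line is already present.
add_sync_job() {
    awk -v job="$SYNC_JOB" '
        $0 == job { found = 1 }
        { print }
        END { if (!found) print job }
    '
}
```

As root on hadoop-senior02 and hadoop-senior03 this would be applied with `crontab -l 2>/dev/null | add_sync_job | crontab -`. Running it a second time leaves the crontab unchanged.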

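    On the time server, check that the ntp package is installed:
    # rpm -qa|grep ntp

    Edit the NTP server configuration:
    # vi /etc/ntp.conf

    Edit /etc/sysconfig/ntpd so that the hardware clock is synchronized as well:
    # vi /etc/sysconfig/ntpd
    # Drop root to id 'ntp:ntp' by default.
    SYNC_HWCLOCK=yes
    OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"

    Start ntpd and enable it at boot:
    [root@hadoop-senior hadoop-2.5.0]# service ntpd status
    ntpd is stopped
    [root@hadoop-senior hadoop-2.5.0]# service ntpd start
    Starting ntpd: [ OK ]
    [root@hadoop-senior hadoop-2.5.0]# chkconfig ntpd on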

  • Original article: https://www.cnblogs.com/xdlaoliu/p/7304907.html