  • hadoop-2.6.5 cluster setup

    1. Change the hostname: vi /etc/sysconfig/network

    NETWORKING=yes
    HOSTNAME=node1

    2. Update the hostname mappings: vi /etc/hosts

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 // optional, can stay or be removed
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 // optional, can stay or be removed
    192.168.10.11 node1
    192.168.10.12 node2
    192.168.10.13 node3
    192.168.10.14 node4
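    A minimal sketch of pushing the same hostname mapping to the other nodes, assuming root SSH access to node2-node4 (passwords will be prompted until step 6 sets up key login):
      # run on node1 after editing /etc/hosts
      for host in node2 node3 node4; do
        scp /etc/hosts root@$host:/etc/hosts
      done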

    3. Set up time synchronization:

    1) yum install ntp // install it if the server does not have it
      1.1) chkconfig ntpd on // start ntpd automatically at boot
    2) ntpdate ntp.api.bz // sync once against a time server
    3) service ntpd start/stop/restart/reload
    4) Set up periodic synchronization: crontab -e
      */10 * * * * ntpdate time.nist.gov // sync every 10 minutes
      4.1) Check whether the cron service is set to start with: chkconfig --list | grep cron
        crond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
        With runlevels 2-5 enabled as above, the cron service starts automatically at boot
      4.2) Enable crond at boot: chkconfig crond on
      4.3) Common crontab options
        -e [UserName]: edit the crontab with a text editor (vi by default)
        -r [UserName]: remove the current crontab
        -l [UserName]: list the current crontab
        -v [UserName]: show the status of the user's cron jobs (not available on all systems)
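    A quick, hedged check that the synchronization above is actually working (ntpq ships with the ntp package; output format varies by version):
      service ntpd status      # ntpd should be running
      ntpq -p                  # the peer marked with '*' is the selected time source
      crontab -l               # confirm the */10 ntpdate entry was saved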

    4. Disable the firewall: chkconfig iptables off

    5. Disable SELinux: vi /etc/selinux/config

    SELINUX=disabled
    SELINUXTYPE=targeted
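    Steps 4 and 5 can also be done non-interactively; a minimal sketch using the same CentOS 6-era tools (chkconfig/iptables) as above:
      service iptables stop                                          # stop the firewall now
      chkconfig iptables off                                         # keep it disabled after reboot
      setenforce 0                                                   # SELinux permissive for the current session
      sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # persist the setting across reboots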

    6. Passwordless SSH login

    1)yum list | grep ssh
    2)yum install -y openssh-server openssh-clients
    3)service sshd start
    4)chkconfig sshd on
    5) ssh-keygen // generate a key pair
    6) ssh-copy-id node1 // after this, the current server can log in to node1 without a password
    Set up passwordless login from the namenode and resourcemanager servers to every server (namenode + datanodes), as sketched below.
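    A minimal sketch of that key distribution from node1 (repeat on whichever nodes run a resourcemanager), assuming the node1-node4 hostnames above; each ssh-copy-id prompts once for the target's password:
      ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # skip if the key already exists
      for host in node1 node2 node3 node4; do
        ssh-copy-id root@$host
      done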

    7. Fully distributed Hadoop cluster setup:

    1) Configuration files
      1.1 vi + /etc/profile
        #JAVA_HOME
        export JAVA_HOME=/opt/module/jdk1.8.0_171
        #HADOOP_HOME
        export HADOOP_HOME=/opt/module/hadoop-2.6.5
        export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
      1.2 hadoop-env.sh mapred-env.sh yarn-env.sh
        export JAVA_HOME=/opt/module/jdk1.8.0_171
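      A hedged sketch of applying that JAVA_HOME setting to all three env scripts at once, assuming they live under $HADOOP_HOME/etc/hadoop and the JDK path above:
        cd /opt/module/hadoop-2.6.5/etc/hadoop
        for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
          # rewrite an existing export, or append one if the file only has it commented out
          sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/module/jdk1.8.0_171|' "$f"
          grep -q '^export JAVA_HOME=' "$f" || echo 'export JAVA_HOME=/opt/module/jdk1.8.0_171' >> "$f"
        done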
      1.3 core-site.xml
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://node1:8020</value>
        </property>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/opt/data/hadoop</value>
        </property>
      1.4 hdfs-site.xml
        <property>
          <name>dfs.replication</name>
          <value>2</value>
        </property>
        <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>node2:50090</value>
        </property>
      1.5 slaves
        node2
        node3
        node4
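      Every node needs the same Hadoop directory and environment before formatting; a minimal sketch of distributing them from node1, assuming the same /opt/module layout everywhere:
        for host in node2 node3 node4; do
          scp -r /opt/module/hadoop-2.6.5 root@$host:/opt/module/
          scp /etc/profile root@$host:/etc/profile     # keep the environment variables in sync too
        done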
      1.6 Format the filesystem: ./bin/hdfs namenode -format
        View help: ./bin/hdfs namenode -h
      1.7 Start the cluster: ./sbin/start-dfs.sh
      1.8 View the web UI at IP:50070:
        node1:50070
      1.9 Help:
        hdfs
        hdfs dfs

        Create a directory: hdfs dfs -mkdir -p /user/root
        List a directory: hdfs dfs -ls /
        Upload a file: hdfs dfs -put hadoop-2.6.5.tar.gz /user/root
      1.10 Stop the cluster: ./sbin/stop-dfs.sh
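      While the cluster is up (between 1.7 and 1.10), a hedged way to confirm the daemons landed where expected (NameNode on node1, SecondaryNameNode on node2, DataNodes on node2-node4):
        for host in node1 node2 node3 node4; do
          echo "== $host =="
          ssh $host jps          # lists the Java daemons running on that node
        done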

    8. Hadoop HA setup

      1) Configuration files
        1.1 vi + /etc/profile
          #JAVA_HOME
          export JAVA_HOME=/opt/module/jdk1.8.0_171
          #HADOOP_HOME
          export HADOOP_HOME=/opt/module/hadoop-2.6.5
          #ZOOKEEPER_HOME
          export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.6
          export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
        1.2 hadoop-env.sh mapred-env.sh yarn-env.sh
          export JAVA_HOME=/opt/module/jdk1.8.0_171
        1.3 core-site.xml
          <property>
            <name>fs.defaultFS</name>
            <value>hdfs://mycluster</value>
          </property>
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/data/hadoop</value>
          </property>
          <property>
            <name>ha.zookeeper.quorum</name>
            <value>node2:2181,node3:2181,node4:2181</value>
          </property>
        1.4 hdfs-site.xml
          <property>
            <name>dfs.replication</name>
            <value>2</value>
          </property>
          <property>
            <name>dfs.nameservices</name>
            <value>mycluster</value>
          </property>
          <property>
            <name>dfs.ha.namenodes.mycluster</name>
            <value>nn1,nn2</value>
          </property>
          <property>
            <name>dfs.namenode.rpc-address.mycluster.nn1</name>
            <value>node1:8020</value>
          </property>
          <property>
            <name>dfs.namenode.rpc-address.mycluster.nn2</name>
            <value>node2:8020</value>
          </property>
          <property>
            <name>dfs.namenode.http-address.mycluster.nn1</name>
            <value>node1:50070</value>
          </property>
          <property>
            <name>dfs.namenode.http-address.mycluster.nn2</name>
            <value>node2:50070</value>
          </property>
          <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
          </property>
          <property>
            <name>dfs.client.failover.proxy.provider.mycluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
          </property>
          <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
          </property>
          <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <!-- if the key file is id_dsa, change the path below to id_dsa -->
            <value>/root/.ssh/id_rsa</value>
          </property>
          <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/opt/data/hadoop/journal</value>
          </property>
          <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
          </property>
        1.5 slaves
          node2
          node3
          node4
        1.6 ZooKeeper cluster setup
          zoo.cfg
          tickTime=2000
          dataDir=/opt/data/zookeeper
          clientPort=2181
          initLimit=5
          syncLimit=2
          server.1=node2:2888:3888
          server.2=node3:2888:3888
          server.3=node4:2888:3888
          /opt/data/zookeeper/myid contains 1, 2 and 3 respectively (matching the server.N ids above)
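          A minimal sketch of creating dataDir and myid on the three ZooKeeper nodes in one go, assuming root SSH access and the id assignment above (node2=1, node3=2, node4=3):
            id=1
            for host in node2 node3 node4; do
              ssh $host "mkdir -p /opt/data/zookeeper && echo $id > /opt/data/zookeeper/myid"
              id=$((id + 1))
            done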
        1.7 Run on every ZooKeeper node: zkServer.sh start
          Check whether it started successfully: zkServer.sh status
        1.8 Run on every journalnode host: hadoop-daemon.sh start journalnode // the journalnodes must be started before the Hadoop cluster
        1.9 Synchronize the edit logs
          If there is an existing cluster with a single namenode:
            hdfs namenode -initializeSharedEdits (run on the namenode that has already been formatted)
            hadoop-daemon.sh start namenode
            hdfs namenode -bootstrapStandby (run on the namenode that has not been formatted)
          If this is a new cluster:
            hdfs namenode -format
            hadoop-daemon.sh start namenode
            hdfs namenode -bootstrapStandby (run on the namenode that has not been formatted)
        1.10 Format the ZooKeeper state and start
          hdfs zkfc -formatZK (formatting on one of the namenode hosts is enough)
          hadoop-daemon.sh start zkfc (start on both zkfc (i.e. namenode) hosts), or simply start everything with start-dfs.sh
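        For reference, a hedged sketch of the full first-start order for a new HA cluster (node roles follow the layout above; run each line on the node named in its comment):
          # on node2/node3/node4:  zkServer.sh start
          # on node1/node2/node3:  hadoop-daemon.sh start journalnode
          hdfs namenode -format                # node1 only
          hadoop-daemon.sh start namenode      # node1
          # on node2:              hdfs namenode -bootstrapStandby
          hdfs zkfc -formatZK                  # once, on either namenode
          start-dfs.sh                         # then start namenodes, datanodes, journalnodes and zkfc together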

    9. YARN setup

    1) Configuration files
      mapred-site.xml
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
      yarn-site.xml
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.resourcemanager.ha.enabled</name>
          <value>true</value>
        </property>
        <property>
          <name>yarn.resourcemanager.cluster-id</name>
          <value>cluster1</value>
        </property>
        <property>
          <name>yarn.resourcemanager.ha.rm-ids</name>
          <value>rm1,rm2</value>
        </property>
        <property>
          <name>yarn.resourcemanager.hostname.rm1</name>
          <value>node3</value>
        </property>
        <property>
          <name>yarn.resourcemanager.hostname.rm2</name>
          <value>node4</value>
        </property>
        <property>
          <name>yarn.resourcemanager.zk-address</name>
          <value>node2:2181,node3:2181,node4:2181</value>
        </property>
    2) Start it
      start-yarn.sh (in this layout this only starts the nodemanagers)
      yarn-daemon.sh start resourcemanager (run on both resourcemanager nodes)
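      A quick, hedged check that ResourceManager HA came up (rm1/rm2 are the ids from yarn-site.xml above; one should report active and the other standby):
        yarn rmadmin -getServiceState rm1
        yarn rmadmin -getServiceState rm2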

    3) Test with wordcount
      hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /user/jqbai/test.txt /user/jqbai/wordcount
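      Once the job finishes, the result can be inspected like this (output path from the command above; the part file name may differ):
        hdfs dfs -ls /user/jqbai/wordcount
        hdfs dfs -cat /user/jqbai/wordcount/part-r-00000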

    10. Setting up a Windows development environment

    Add environment variables:
      1) HADOOP_USER_NAME=root
      2) HADOOP_HOME=D:\software\hadoop-2.6.5 (this one is Windows-specific)
