  • Setting Up a Hadoop Cluster in Docker

    Build a cluster test environment with Docker on a Tencent Cloud host.

    Environment

    1. Operating system: CentOS 7.2, 64-bit

    Network settings

    hostname        IP
    cluster-master  172.18.0.2
    cluster-slave1  172.18.0.3
    cluster-slave2  172.18.0.4
    cluster-slave3  172.18.0.5

    Installing Docker

    curl -sSL https://get.daocloud.io/docker | sh
    
    ##switch to a registry mirror
    ###for reference see http://www.jianshu.com/p/34d3b4568059
    curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://67e93489.m.daocloud.io
    
    ##enable start on boot
    systemctl enable docker
    systemctl start docker
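
    A quick check that the daemon actually came up (an optional step, not in the original):

    docker version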
    

    Pull the CentOS image

    docker pull daocloud.io/library/centos:latest
    

    Use docker images to check the downloaded image (docker ps lists containers, not images).

    Creating the containers

    To match the cluster layout, the containers need fixed IPs, so first create a Docker subnet with a fixed address range using the following command

    docker network create --subnet=172.18.0.0/16 netgroup
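
    To confirm the subnet was created, docker network inspect can be used (an optional check):

    docker network inspect netgroup | grep Subnet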
    

    Once the Docker subnet exists, containers with fixed IPs can be created

    #cluster-master
    #-p publishes container ports to the host; needed later to reach the web UIs
    docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-master -h cluster-master -p 18088:18088 -p 9870:9870 --net netgroup --ip 172.18.0.2 daocloud.io/library/centos /usr/sbin/init
    
    #cluster-slaves
    docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave1 -h cluster-slave1 --net netgroup --ip 172.18.0.3 daocloud.io/library/centos /usr/sbin/init
    
    docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave2 -h cluster-slave2 --net netgroup --ip 172.18.0.4 daocloud.io/library/centos /usr/sbin/init
    
    docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave3 -h cluster-slave3 --net netgroup --ip 172.18.0.5 daocloud.io/library/centos /usr/sbin/init
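
    To verify each container received its fixed IP, docker inspect with a Go template works (an optional check; the template path assumes the netgroup network created above):

    docker inspect -f '{{.NetworkSettings.Networks.netgroup.IPAddress}}' cluster-master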
    

    Open a shell inside the Docker container:

    docker exec -it cluster-master /bin/bash
    

    Installing OpenSSH and setting up passwordless login

    1. Install on cluster-master:

    #cluster-master additionally needs a config file change (special case)
    #cluster-master
    
    #install openssh
    [root@cluster-master /]# yum -y install openssh openssh-server openssh-clients
    
    [root@cluster-master /]# systemctl start sshd
    ####make ssh accept new host keys automatically
    ####on the master, have ssh logins auto-add entries to known_hosts
    [root@cluster-master /]# vi /etc/ssh/ssh_config
    #change the line StrictHostKeyChecking ask
    #to StrictHostKeyChecking no
    #then save
    [root@cluster-master /]# systemctl restart sshd
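
    If you prefer to script that edit, a sed one-liner along these lines should work (a sketch, assuming the stock CentOS ssh_config where the line ships commented out as "#   StrictHostKeyChecking ask"):

    sed -i 's/^#\s*StrictHostKeyChecking ask/StrictHostKeyChecking no/' /etc/ssh/ssh_config
    systemctl restart sshd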
    

    2. Install OpenSSH on each slave

    #install openssh
    [root@cluster-slave1 /]# yum -y install openssh openssh-server openssh-clients
    
    [root@cluster-slave1 /]# systemctl start sshd
    

    3. Distribute the cluster-master public key

    On the master, run
    ssh-keygen -t rsa
    and press Enter at every prompt. This creates the ~/.ssh directory containing id_rsa (the private key) and id_rsa.pub (the public key). Then redirect id_rsa.pub into the file authorized_keys.

    ssh-keygen -t rsa
    #press Enter at every prompt
    
    [root@cluster-master /]# cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
    

    Once the file is generated, use scp to distribute the public key to the slave hosts

    [root@cluster-master /]# ssh root@cluster-slave1 'mkdir ~/.ssh'
    [root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave1:~/.ssh
    [root@cluster-master /]# ssh root@cluster-slave2 'mkdir ~/.ssh'
    [root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave2:~/.ssh
    [root@cluster-master /]# ssh root@cluster-slave3 'mkdir ~/.ssh'
    [root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave3:~/.ssh
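
    The six commands above can also be written as a loop (equivalent, just more compact; mkdir -p makes the step idempotent):

    for node in cluster-slave1 cluster-slave2 cluster-slave3; do
        ssh root@$node 'mkdir -p ~/.ssh'
        scp ~/.ssh/authorized_keys root@$node:~/.ssh/
    done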
    

    After distributing, test that passwordless login works (ssh root@cluster-slave1).

    Installing Ansible

    [root@cluster-master /]# yum -y install epel-release
    [root@cluster-master /]# yum -y install ansible
    #this puts ansible's configuration under /etc/ansible
    

    Now edit Ansible's hosts file

    vi /etc/ansible/hosts
    
    [cluster]
    cluster-master
    cluster-slave1
    cluster-slave2
    cluster-slave3
    
    [master]
    cluster-master
    
    [slaves]
    cluster-slave1
    cluster-slave2
    cluster-slave3
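
    With the inventory in place, connectivity to all nodes can be verified with the ping module (an optional check):

    ansible cluster -m ping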
    
    

    Configuring /etc/hosts in the Docker containers

    Because /etc/hosts is rewritten every time a container starts, direct edits are lost after a restart. To make the containers regain the cluster hosts after restarting, we rewrite the hosts file after startup.
    Append the following to ~/.bashrc:

    :>/etc/hosts
    cat >>/etc/hosts<<EOF
    127.0.0.1   localhost
    172.18.0.2  cluster-master
    172.18.0.3  cluster-slave1
    172.18.0.4  cluster-slave2
    172.18.0.5  cluster-slave3
    EOF
    
    source ~/.bashrc
    

    This applies the change; /etc/hosts is now rewritten with the required content:

    [root@cluster-master ansible]# cat /etc/hosts
    127.0.0.1   localhost
    172.18.0.2  cluster-master
    172.18.0.3  cluster-slave1
    172.18.0.4  cluster-slave2
    172.18.0.5  cluster-slave3
    

    Use Ansible to distribute .bashrc to the cluster slaves

    ansible cluster -m copy -a "src=~/.bashrc dest=~/"
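
    To confirm the rewrite works on a slave, run a command over ssh; bash sources ~/.bashrc for non-interactive ssh sessions, so the hosts file is rewritten before cat runs (an optional check):

    ssh root@cluster-slave1 cat /etc/hosts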
    

    Software environment setup

    Download JDK 1.8 and extract it to /opt.

    Download Hadoop 3 to /opt, extract the package, and create a symlink

    tar -xzvf hadoop-3.2.0.tar.gz
    ln -s hadoop-3.2.0 hadoop
    

    Configure the Java and Hadoop environment variables

    Edit the ~/.bashrc file

    # hadoop
    export HADOOP_HOME=/opt/hadoop-3.2.0
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    
    #java
    export JAVA_HOME=/opt/jdk8
    export PATH=$JAVA_HOME/bin:$PATH
    

    Apply the file:

    source .bashrc
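
    A quick sanity check that both variables took effect (optional; assumes the JDK was unpacked to /opt/jdk8 as above):

    java -version
    hadoop version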
    

    Configure the files Hadoop needs at runtime

    cd $HADOOP_HOME/etc/hadoop/
    

    1. Edit core-site.xml

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
        <!-- file system properties -->
        <property>
            <name>fs.default.name</name>
            <value>hdfs://cluster-master:9000</value>
        </property>
        <property>
        <name>fs.trash.interval</name>
            <value>4320</value>
        </property>
    </configuration>
    

    2. Edit hdfs-site.xml

    <configuration>
    <property>
       <name>dfs.namenode.name.dir</name>
       <value>/home/hadoop/tmp/dfs/name</value>
     </property>
     <property>
       <name>dfs.datanode.data.dir</name>
       <value>/home/hadoop/data</value>
     </property>
     <property>
       <name>dfs.replication</name>
       <value>3</value>
     </property>
     <property>
       <name>dfs.webhdfs.enabled</name>
       <value>true</value>
     </property>
     <property>
       <name>dfs.permissions.superusergroup</name>
       <value>staff</value>
     </property>
     <property>
       <name>dfs.permissions.enabled</name>
       <value>false</value>
     </property>
     </configuration>
    

    3. Edit mapred-site.xml

    <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>cluster-master:9001</value>
    </property>
    <property>
      <name>mapreduce.jobtracker.http.address</name>
      <value>cluster-master:50030</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>cluster-master:10020</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>cluster-master:19888</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>/jobhistory/done</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>/jobhistory/done_intermediate</value>
    </property>
    <property>
      <name>mapreduce.job.ubertask.enable</name>
      <value>true</value>
    </property>
    </configuration>
    

    4. Edit yarn-site.xml

    <configuration>
        <property>
       <name>yarn.resourcemanager.hostname</name>
       <value>cluster-master</value>
     </property>
     <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
     </property>
     <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
     </property>
     <property>
       <name>yarn.resourcemanager.address</name>
       <value>cluster-master:18040</value>
     </property>
    <property>
       <name>yarn.resourcemanager.scheduler.address</name>
       <value>cluster-master:18030</value>
     </property>
     <property>
       <name>yarn.resourcemanager.resource-tracker.address</name>
       <value>cluster-master:18025</value>
     </property>
     <property>
       <name>yarn.resourcemanager.admin.address</name>
       <value>cluster-master:18141</value>
     </property>
    <property>
       <name>yarn.resourcemanager.webapp.address</name>
       <value>cluster-master:18088</value>
     </property>
    <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
     </property>
    <property>
       <name>yarn.log-aggregation.retain-seconds</name>
       <value>86400</value>
     </property>
    <property>
       <name>yarn.log-aggregation.retain-check-interval-seconds</name>
       <value>86400</value>
     </property>
    <property>
       <name>yarn.nodemanager.remote-app-log-dir</name>
       <value>/tmp/logs</value>
     </property>
    <property>
       <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
       <value>logs</value>
     </property>
    </configuration>
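
    Two details this write-up skips but that Hadoop 3 typically needs before start-all.sh will launch daemons on the slaves: the worker list lives in $HADOOP_HOME/etc/hadoop/workers, and the start scripts refuse to run as root unless the daemon users are declared. A sketch (the root user and the /opt/jdk8 path are this tutorial's choices):

    cat > $HADOOP_HOME/etc/hadoop/workers <<EOF
    cluster-slave1
    cluster-slave2
    cluster-slave3
    EOF

    # append to $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/opt/jdk8
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root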
    

    Pack up Hadoop for distribution to the slaves

    tar -cvf hadoop-dis.tar hadoop hadoop-3.2.0
    

    Use ansible-playbook to distribute .bashrc and hadoop-dis.tar to the slave hosts

    ---
    - hosts: cluster
      tasks:
        - name: copy .bashrc to slaves
          copy: src=~/.bashrc dest=~/
          notify:
            - exec source
        - name: copy hadoop-dis.tar to slaves
          unarchive: src=/opt/hadoop-dis.tar dest=/opt
    
      handlers:
        - name: exec source
          shell: source ~/.bashrc
    

    Save the above YAML as hadoop-dis.yaml and run

    ansible-playbook hadoop-dis.yaml
    

    hadoop-dis.tar is automatically extracted to /opt on the slave hosts.

    Starting Hadoop

    Format the namenode (hadoop namenode -format is deprecated in Hadoop 3 in favor of hdfs namenode -format, but both work)

    hadoop namenode -format
    

    If the output contains wording like "successfully formatted", the format succeeded.

    Start the cluster

    cd $HADOOP_HOME/sbin
    start-all.sh
    

    After starting, use jps to check whether the daemons came up.
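
    To check every node at once, jps can be run through Ansible (an optional check; the full path assumes the /opt/jdk8 layout above). On the master you would expect NameNode, SecondaryNameNode, and ResourceManager; on the slaves, DataNode and NodeManager.

    ansible cluster -m shell -a '/opt/jdk8/bin/jps'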

    Note:
    In practice the datanode service on the slaves did not start. Inspecting the directory tree on the slaves showed that the directories named in the config files had not been created, e.g. in core-site.xml:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    

    and in hdfs-site.xml:

    <property>
       <name>dfs.namenode.name.dir</name>
       <value>/home/hadoop/tmp/dfs/name</value>
     </property>
     <property>
       <name>dfs.datanode.data.dir</name>
       <value>/home/hadoop/data</value>
     </property>
    

    Create these directories by hand on each node, then delete them on the master together with the logs directory under $HADOOP_HOME, and reformat the namenode (an Ansible shortcut for the directory creation is sketched after the command):

    hadoop namenode -format
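
    For the directory-creation step, the Ansible file module can handle all nodes at once (a sketch; the paths are the ones from the configs above, and state=directory creates parent directories like mkdir -p):

    ansible cluster -m file -a "path=/home/hadoop/tmp/dfs/name state=directory"
    ansible cluster -m file -a "path=/home/hadoop/data state=directory"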
    

    Start the cluster services again:

    start-all.sh
    

    Check the slave nodes again; the node services should now be running.

    Verifying the services

    Visit

    http://host:18088
    http://host:9870
    

    to check that the services are up (18088 is the YARN ResourceManager web UI, 9870 the HDFS NameNode web UI).
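
    The same check can be scripted with curl from inside the master container (optional; uses the in-cluster hostname, and assumes curl is present in the image):

    curl -sI http://cluster-master:9870 | head -n 1
    curl -sI http://cluster-master:18088 | head -n 1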

    Reposted from: https://www.jianshu.com/p/d7fa21504784
