  • Hadoop cluster deployment

    A note up front: these notes are somewhat disorganized. When you hit a problem, just solve it and move on.

    1. Confirm the IPs of the servers to deploy to

    0 1 2 3 stand for the four IPs

    You also need one additional server to act as the remote control machine.

    2. On the control machine, run ssh-keygen to generate the local key files (skip this step if they already exist). For example, for user test the key files live under /home/test/.ssh/.
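    For example, a minimal non-interactive invocation (the RSA key type and empty passphrase here are my assumptions; adjust to your own policy):

    ssh-keygen -t rsa -N '' -f /home/test/.ssh/id_rsa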

    Ansible must be installed on the control machine.

    Configure the Ansible install source (the EPEL repo):

    wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-6.repo
    

     Install Ansible:

    yum -y install ansible

     Prepare the Ansible host inventory file: hosts

    Contents:

    [hadoop_host]
    0ip ansible_ssh_user=test
    1ip ansible_ssh_user=test
    2ip ansible_ssh_user=test 
    3ip ansible_ssh_user=test
    

    Make sure the network from the control machine to the Hadoop cluster servers is OK; at a minimum, SSH must work.
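    A quick way to verify this is Ansible's ping module; a sketch using the inventory above (-k prompts for the SSH password, since keys have not been distributed yet):

    ansible -i ./hosts hadoop_host -m ping -k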

    With the prerequisites in place, move on to the initialization work, which covers: passwordless sudo, allowing sudo in remote (non-tty) sessions, and ulimit system tuning.

    1. First, let the test user run sudo without a password.
    This assumes the test user is in the wheel group.
    ansible -i ./hosts hadoop_host -m shell -a " sed 's/^# %wheel.*NOPASSWD: ALL/%wheel ALL=(ALL) NOPASSWD: ALL/' -i /etc/sudoers" -s --ask-sudo-pass
    (--ask-sudo-pass prompts for the sudo password; -k would prompt for the SSH login password)
    After you enter the password, this command lets the test user run sudo on the remote servers without a password.
    2. Allow the test user to run sudo from a remote (non-tty) session.
    ansible -i ./hosts hadoop_host -m shell -a " sed -i '/Defaults.*requiretty/a Defaults: test !requiretty' /etc/sudoers" -s --ask-sudo-pass
    3. Tune the ulimit parameters.
    ansible -i ./hosts hadoop_host -m shell -a " sed -i '$ a fs.file-max = 65535' /etc/sysctl.conf && sudo sed -i 's/1024/65535/' /etc/security/limits.d/90-nproc.conf && sudo sed -i '$ a * soft nofile 65535\n* hard nofile 65535' /etc/security/limits.conf " -s --ask-sudo-pass
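    Note that the sysctl.conf change only takes effect after sysctl -p (or a reboot), and the limits changes apply to new login sessions. A quick spot check (my sketch):

    ansible -i ./hosts hadoop_host -m shell -a "sysctl -p && ulimit -n" -s --ask-sudo-pass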

    Next, set up passwordless SSH from the control machine to every server in the Hadoop cluster.

    See this earlier post:

     http://www.cnblogs.com/jackchen001/p/6381270.html
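     If you would rather not follow the linked post, a minimal sketch of pushing the control machine's public key out by hand (the 0ip..3ip placeholders stand for the real addresses):

     for ip in 0ip 1ip 2ip 3ip; do ssh-copy-id -i /home/test/.ssh/id_rsa.pub test@$ip; done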

    Install the JDK and configure the Java environment variables on the Hadoop cluster

     This assumes the passwordless SSH channel is already working.
     1. Generate the JDK environment variable file:
     echo '
     export JAVA_HOME=/usr/java/latest/
     export PATH=$JAVA_HOME/bin:$PATH ' >> java.sh
     2. Install the JDK:
     ansible -i ./hosts hadoop_host -m yum -a "name=jdk state=present" -s
     3. Push the JDK environment variable file:
     ansible -i ./hosts hadoop_host -m copy -a "src=java.sh dest=/etc/profile.d/" -s
     4. Change the ownership of the Java install directory:
      ansible -i ./hosts hadoop_host -m shell -a "chown -R hadoop.hadoop /usr/java/" -s
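     A quick sanity check that the JDK landed on every node (assuming the package installs under /usr/java/latest, as the java.sh file above expects):

     ansible -i ./hosts hadoop_host -m shell -a "source /etc/profile && java -version" -s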

    A good article introducing Ansible modules:

    http://breezey.blog.51cto.com/2400275/1555530/

    Hosts mapping file for the Hadoop cluster
    Generate the hosts file:
    echo '0 master
    1 slave1
    2 slave2
    3 slave3' >> /tmp/hosts
    Push it to the Hadoop cluster servers:
    ansible -i ./hosts hadoop_host -m copy -a "src=/tmp/hosts dest=/etc/hosts" -s
    Change the hostnames:
    ansible -i ./hosts hadoop_host -m shell -a "sed -i 's/.localdomain//g' /etc/sysconfig/network && service network restart " -s
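    To verify the mapping and the new hostnames took effect, something along these lines:

    ansible -i ./hosts hadoop_host -m shell -a "hostname && cat /etc/hosts" -s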

    Download and configure Hadoop

    Download the Hadoop tarball:
     ansible -i ./hosts hadoop_host -m get_url -a "url=http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz dest=/opt/" -s
    If you run this download command, every node in the cluster downloads the tarball, which wastes network bandwidth.
    It is better to download and configure Hadoop once on the control machine and then push it out to the cluster servers.
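    A sketch of staging it once on the control machine (assuming /opt as the staging area, matching the push command below):

    wget -P /opt/ http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz
    tar -xzf /opt/hadoop-3.0.0-alpha2.tar.gz -C /opt/
    mv /opt/hadoop-3.0.0-alpha2 /opt/hadoop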
    The push command is as follows:
    ansible -i ./hosts hadoop_host -m copy -a "src=/opt/hadoop dest=/opt/ owner=hadoop group=hadoop mode=0755" -s

    Hadoop environment variable configuration

    Generate the Hadoop environment variable file:
    echo '
    export HADOOP_HOME=/opt/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/opt/hadoop/lib/native/"
    export HADOOP_COMMON_LIB_NATIVE_DIR="/opt/hadoop/lib/native/"
    ' >> hadoop.sh
    
    Push the Hadoop environment file to the cluster:
    ansible -i ./hosts hadoop_host -m copy -a "src=hadoop.sh dest=/etc/profile.d/" -s
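    To confirm the variables are picked up on the nodes, a hedged check (assuming the Hadoop tree was already pushed to /opt/hadoop in the previous step):

    ansible -i ./hosts hadoop_host -m shell -a "source /etc/profile && hadoop version"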

    The most important piece: the hadoop user must be able to SSH between the cluster servers without a password and run commands there.

    On the control machine
    Create the hadoop user and set its password:
    http://www.cnblogs.com/jackchen001/p/6381270.html
    On the control machine:
    ansible -i ./hosts hadoop_host -m shell -a "mkdir -p /home/hadoop/.ssh && ssh-keygen -q -t rsa -N '' -f /home/hadoop/.ssh/id_rsa && chown -R hadoop: /home/hadoop/.ssh" -s
    Once the hadoop user has been created, its password set, and the passwordless SSH steps completed,
    run the rsync_key playbook on every cluster server
    to ensure the hadoop user can freely SSH and run commands between the cluster servers (a rough ad-hoc equivalent is sketched after this block).
    Let the hadoop user use sudo:
    ansible -i ./hosts hadoop_host -m  shell -a "sed -i '$ a %hadoop ALL=(ALL) NOPASSWD: ALL ' /etc/sudoers" -s
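    I do not reproduce the rsync_key playbook here, but a rough ad-hoc equivalent (my sketch, assuming every node already has a key pair for the hadoop user) is to collect each node's public key on the control machine and push back a merged authorized_keys:

    ansible -i ./hosts hadoop_host -m fetch -a "src=/home/hadoop/.ssh/id_rsa.pub dest=keys/" -s
    cat keys/*/home/hadoop/.ssh/id_rsa.pub > authorized_keys
    ansible -i ./hosts hadoop_host -m copy -a "src=authorized_keys dest=/home/hadoop/.ssh/authorized_keys owner=hadoop group=hadoop mode=0600" -s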

    With that done, the next step is to upload the Hadoop configuration files.

    http://hadoop.apache.org/docs/current/ (the official Hadoop documentation)

    The finished Hadoop configuration files are as follows:

    core-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
    </property>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master:9000</value>
    </property>
    <property> 
      <name>dfs.name.dir</name>           
      <value>/opt/hadoop/name</value> 
    </property>
    
    </configuration>
    hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
    <property>
        <name>dfs.replication</name>  
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>  
        <value>file:/opt/hadoop/name1,/opt/hadoop/name2</value>
    </property>
    
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/data1,/opt/hadoop/data2</value>
    </property>
    
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:9001</value>
    </property>
    
    
    </configuration>
    mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
    
        <property>  
            <name>mapreduce.framework.name</name>  
            <value>yarn</value>  
        </property>
    
    <property>
        <name>mapred.job.tracker</name>  
        <value>master:9001</value>
    </property>
    <property>
        <name>mapred.system.dir</name>  
        <value>/opt/hadoop/mapred_system</value>
    </property>
    <property>
        <name>mapred.local.dir</name>  
        <value>/opt/hadoop/mapred_local</value>
    </property>
    
    <property>
            <name>mapreduce.application.classpath</name>
            <value>
                    /opt/hadoop/etc/hadoop,
                    /opt/hadoop/lib/native/*,
                    /opt/hadoop/share/hadoop/common/*,
                    /opt/hadoop/share/hadoop/common/lib/*,
                    /opt/hadoop/share/hadoop/hdfs/*,
                    /opt/hadoop/share/hadoop/hdfs/lib/*,
                    /opt/hadoop/share/hadoop/mapreduce/*,
                    /opt/hadoop/share/hadoop/mapreduce/lib/*,
                    /opt/hadoop/share/hadoop/yarn/*,
                    /opt/hadoop/share/hadoop/yarn/lib/*
            </value>
    </property>
    
    </configuration>
    yarn-site.xml
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    
        <property>  
            <name>yarn.nodemanager.aux-services</name>  
            <value>mapreduce_shuffle</value>  
        </property>  
        <property>  
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>  
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>  
        </property>  
        <property>  
            <name>yarn.resourcemanager.resource-tracker.address</name>  
            <value>master:8025</value>  
        </property>  
        <property>  
            <name>yarn.resourcemanager.scheduler.address</name>  
            <value>master:8030</value>  
        </property>  
        <property>  
            <name>yarn.resourcemanager.address</name>  
            <value>master:8040</value>  
        </property>
        <property>  
            <name>yarn.resourcemanager.admin.address</name>  
            <value>master:8033</value>  
        </property>
        <property>  
            <name>yarn.resourcemanager.webapp.address</name>  
            <value>master:8034</value>  
        </property>
    
    </configuration>
    Contents of the master file:
    master
    Contents of the slaves file:
    slave1
    slave2
    slave3
    Contents of the workers file:
    slave1
    slave2
    slave3
    Add to hadoop-env.sh:
    export JAVA_HOME=/usr/java/latest

    Edit all of the configuration files above on the control machine, then push them to the cluster servers.
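    For example, if the edited files sit in a local conf/ directory (a name I am assuming here), the push could look like:

    ansible -i ./hosts hadoop_host -m copy -a "src=conf/ dest=/opt/hadoop/etc/hadoop/ owner=hadoop group=hadoop" -s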

    I got halfway through writing this up and confused myself. How embarrassing!!!
