  • [Original] Big Data Basics: Hadoop (2): a minimal, self-contained (tarball-only) deployment of HDFS and YARN

    Environment: a 3-node cluster

    192.168.0.1
    192.168.0.2
    192.168.0.3

    1 Configure passwordless SSH between the servers for the root user

    Reference: https://www.cnblogs.com/barneywill/p/10271679.html

    2 Install ansible

    Reference: https://www.cnblogs.com/barneywill/p/10263278.html
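The ansible commands below all target a host group named all-servers. The reference above covers installation; for completeness, a minimal sketch of an inventory defining that group (the group name is the one used in this post, the file path is ansible's default; adjust both to your setup):

```ini
# /etc/ansible/hosts -- inventory defining the all-servers group used throughout
[all-servers]
192.168.0.1
192.168.0.2
192.168.0.3
```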

    3 Create a hadoop user on every server and configure passwordless SSH between the servers for that user

    Reference: https://www.cnblogs.com/barneywill/p/10271679.html

    4 Sync /etc/hosts

    # : > /tmp/hosts
    # echo "192.168.0.1 node0" >> /tmp/hosts
    # echo "192.168.0.2 node1" >> /tmp/hosts
    # echo "192.168.0.3 node2" >> /tmp/hosts

    # ansible all-servers -m copy -a "src=/tmp/hosts dest=/tmp"
    # ansible all-servers -m shell -a "cat /tmp/hosts >> /etc/hosts && cat /etc/hosts"
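An equivalent, less error-prone way to build /tmp/hosts is a single heredoc, which avoids any mix-up between truncate (>) and append (>>):

```shell
# Write all three entries to /tmp/hosts in one shot
cat > /tmp/hosts <<'EOF'
192.168.0.1 node0
192.168.0.2 node1
192.168.0.3 node2
EOF
```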

    5 Copy the Hadoop tarball to every server and unpack it

    # ansible all-servers -m copy -a 'src=/src/path/to/hadoop-2.6.5.tar.gz dest=/dest/path/to/'
    # ansible all-servers -m shell -a 'tar xvf /dest/path/to/hadoop-2.6.5.tar.gz -C /app/path'

    6 Prepare directories: tmp, namenode, datanode

    # ansible all-servers -m shell -a 'mkdir -p /data/hadoop/tmp /data/hadoop/hdfs/namenode /data/hadoop/hdfs/datanode && chown -R hadoop:hadoop /data/hadoop'

    7 Prepare the configuration files

    slaves

    node0
    node1
    node2
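One more file worth touching here: start-dfs.sh launches the remote daemons over SSH in non-login shells, which may not source /etc/bashrc, so it is safer to also hard-code JAVA_HOME in etc/hadoop/hadoop-env.sh (the JDK path below is the one used in step 9; adjust to yours):

```shell
# etc/hadoop/hadoop-env.sh -- set JAVA_HOME explicitly so SSH-launched daemons find the JDK
export JAVA_HOME=/app/path/jdk1.8.0_141/
```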

    core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node0:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/tmp</value>
      </property>
    </configuration>

    hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/hadoop/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/hadoop/hdfs/datanode</value>
      </property>
    </configuration>

    mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>

    yarn-site.xml

    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node0</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    </configuration>

    1) By default each NodeManager offers 8 GB of memory and 8 vcores to YARN; adjust the amounts as follows:

      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>8</value>
      </property>

    2) Enable log aggregation:

      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>

    3) If containers are killed for exceeding their virtual memory limit, with an error like:

    2019-02-25 17:54:19,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=48342,containerID=container_1551078668160_0012_02_000001] is running beyond virtual memory limits. Current usage: 380.9 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.

    add one of the following: either disable the virtual memory check

      <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
      </property>

    or raise the virtual-to-physical memory ratio (default 2.1)

      <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>4</value>
      </property>
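The numbers in the log message above follow directly from: virtual memory limit = container physical memory × yarn.nodemanager.vmem-pmem-ratio. A quick check of the arithmetic for a 1 GB container:

```shell
# Default ratio 2.1 -> the "2.1 GB" limit the killed container exceeded
awk 'BEGIN { printf "%.1f MB\n", 1024 * 2.1 }'
# Ratio raised to 4 -> 4 GB of virtual memory headroom per 1 GB container,
# comfortably above the 2.5 GB the killed container was actually using
awk 'BEGIN { printf "%.0f MB\n", 1024 * 4 }'
```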

    8 Push the configuration to every server

    # ansible all-servers -m copy -a 'src=/path/to/config/ dest=/app/path/hadoop-2.6.5/etc/hadoop/'

    9 Sync environment variables

    # : > /tmp/profile
    # echo 'export HADOOP_HOME=/app/path/hadoop-2.6.5' >> /tmp/profile
    # echo 'export JAVA_HOME=/app/path/jdk1.8.0_141/' >> /tmp/profile
    # echo 'export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH' >> /tmp/profile
    # ansible all-servers -m copy -a 'src=/tmp/profile dest=/tmp/'
    # ansible all-servers -m shell -a 'cat /tmp/profile >> /etc/bashrc'

    10 Start HDFS

    # su - hadoop
    $ /app/path/hadoop-2.6.5/bin/hdfs namenode -format
    $ /app/path/hadoop-2.6.5/sbin/start-dfs.sh
    $ hdfs dfsadmin -report

    11 Start YARN

    # su - hadoop
    $ /app/path/hadoop-2.6.5/sbin/start-yarn.sh
    $ yarn node -list
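As an end-to-end smoke test on a running cluster, the examples jar shipped inside the 2.6.5 tarball can submit a small MapReduce job through YARN (the path assumes the /app/path install prefix used above):

```shell
# Submit the bundled pi estimator: 2 map tasks, 10 samples each
hadoop jar /app/path/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar pi 2 10
```

If the job completes and prints an estimate of pi, HDFS, YARN, and the shuffle service are all wired up correctly.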

  • Original article: https://www.cnblogs.com/barneywill/p/10428098.html