zoukankan      html  css  js  c++  java
  • Hadoop内存分配

     9. Determine HDP Memory Configuration Settings

    Two methods can be used to determine YARN and MapReduce memory configuration settings:

    The HDP utility script is the recommended method for calculating HDP memory configuration settings, but information about manually calculating YARN and MapReduce memory configuration settings is also provided for reference.

     9.1. Use the HDP Utility Script to Calculate Memory Configuration Settings

    This section describes how to use the hdp-configuration-utils.py Python script to calculate YARN, MapReduce, Hive, and Tez memory allocation settings based on the node hardware specifications. The hdp-configuration-utils.py script is included in the HDP companion files.

    Running the Script

    To run the hdp-configuration-utils.py script, execute the following command from the folder containing the script:

    python hdp-configuration-utils.py <options>

    With the following options:

    Option Description
    -c CORES The number of cores on each host.
    -m MEMORY The amount of memory on each host in GB.
    -d DISKS The number of disks on each host.
    -k HBASE "True" if HBase is installed, "False" if not.

    Note: You can also use the -h or --help option to display a Help message that describes the options.

    Example

    Running the following command:

    python hdp-configuration-utils.py -c 16 -m 64 -d 4 -k True

    Would return:

     Using cores=16 memory=64GB disks=4 hbase=True
     Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4
     Num Container=8
     Container Ram=6144MB
     Used Ram=48GB
     Unused Ram=16GB
     yarn.scheduler.minimum-allocation-mb=6144
     yarn.scheduler.maximum-allocation-mb=49152
     yarn.nodemanager.resource.memory-mb=49152
     mapreduce.map.memory.mb=6144
     mapreduce.map.java.opts=-Xmx4096m
     mapreduce.reduce.memory.mb=6144
     mapreduce.reduce.java.opts=-Xmx4096m
     yarn.app.mapreduce.am.resource.mb=6144
     yarn.app.mapreduce.am.command-opts=-Xmx4096m
     mapreduce.task.io.sort.mb=1792
     tez.am.resource.memory.mb=6144
     tez.am.java.opts=-Xmx4096m
     hive.tez.container.size=6144
     hive.tez.java.opts=-Xmx4096m
     hive.auto.convert.join.noconditionaltask.size=1342177000

     9.2. Manually Calculate YARN and MapReduce Memory Configuration Settings

    This section describes how to manually configure YARN and MapReduce memory allocation settings based on the node hardware specifications.

    YARN takes into account all of the available compute resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, cpu etc.).

    In a Hadoop cluster, it is vital to balance the usage of memory (RAM), processors (CPU cores) and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, allowing for two Containers per disk and per core gives the best balance for cluster utilization.

    When determining the appropriate YARN and MapReduce memory configurations for a cluster node, start with the available hardware resources. Specifically, note the following values on each node:

    • RAM (Amount of memory)

    • CORES (Number of CPU cores)

    • DISKS (Number of disks)

    The total available RAM for YARN and MapReduce should take into account the Reserved Memory. Reserved Memory is the RAM needed by system processes and other Hadoop processes (such as HBase).

    Reserved Memory = Reserved for stack memory + Reserved for HBase Memory (If HBase is on the same node)

    Use the following table to determine the Reserved Memory per node.

    Reserved Memory Recommendations

    Total Memory per Node Recommended Reserved System Memory Recommended Reserved HBase Memory
    4 GB 1 GB 1 GB
    8 GB 2 GB 1 GB
    16 GB 2 GB 2 GB
    24 GB 4 GB 4 GB
    48 GB 6 GB 8 GB
    64 GB 8 GB 8 GB
    72 GB 8 GB 8 GB
    96 GB 12 GB 16 GB
    128 GB 24 GB 24 GB
    256 GB 32 GB 32 GB
    512 GB 64 GB 64 GB

    The next calculation is to determine the maximum number of containers allowed per node. The following formula can be used:

    # of containers = min (2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)

    Where MIN_CONTAINER_SIZE is the minimum container size (in RAM). This value is dependent on the amount of RAM available -- in smaller memory nodes, the minimum container size should also be smaller. The following table outlines the recommended values:

    Total RAM per Node Recommended Minimum Container Size
    Less than 4 GB 256 MB
    Between 4 GB and 8 GB 512 MB
    Between 8 GB and 24 GB 1024 MB
    Above 24 GB 2048 MB

    The final calculation is to determine the amount of RAM per container:

    RAM-per-container = max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers))

    With these calculations, the YARN and MapReduce configurations can be set:

    Configuration File Configuration Setting Value Calculation
    yarn-site.xml yarn.nodemanager.resource.memory-mb = containers * RAM-per-container
    yarn-site.xml yarn.scheduler.minimum-allocation-mb = RAM-per-container
    yarn-site.xml yarn.scheduler.maximum-allocation-mb = containers * RAM-per-container
    mapred-site.xml mapreduce.map.memory.mb = RAM-per-container
    mapred-site.xml         mapreduce.reduce.memory.mb = 2 * RAM-per-container
    mapred-site.xml mapreduce.map.java.opts = 0.8 * RAM-per-container
    mapred-site.xml mapreduce.reduce.java.opts = 0.8 * 2 * RAM-per-container
    yarn-site.xml (check) yarn.app.mapreduce.am.resource.mb = 2 * RAM-per-container
    yarn-site.xml (check) yarn.app.mapreduce.am.command-opts = 0.8 * 2 * RAM-per-container

    Note: After installation, both yarn-site.xml and mapred-site.xml are located in the /etc/hadoop/conf folder.

    Examples

    Cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks.

    Reserved Memory = 6 GB reserved for system memory + (if HBase) 8 GB for HBase

    Min container size = 2 GB

    If there is no HBase:

    # of containers = min (2*12, 1.8* 12, (48-6)/2) = min (24, 21.6, 21) = 21

    RAM-per-container = max (2, (48-6)/21) = max (2, 2) = 2

    Configuration Value Calculation
    yarn.nodemanager.resource.memory-mb = 21 * 2 = 42*1024 MB
    yarn.scheduler.minimum-allocation-mb = 2*1024 MB
    yarn.scheduler.maximum-allocation-mb = 21 * 2 = 42*1024 MB
    mapreduce.map.memory.mb = 2*1024 MB
    mapreduce.reduce.memory.mb          = 2 * 2 = 4*1024 MB
    mapreduce.map.java.opts = 0.8 * 2 = 1.6*1024 MB
    mapreduce.reduce.java.opts = 0.8 * 2 * 2 = 3.2*1024 MB
    yarn.app.mapreduce.am.resource.mb = 2 * 2 = 4*1024 MB
    yarn.app.mapreduce.am.command-opts = 0.8 * 2 * 2 = 3.2*1024 MB

    If HBase is included:

    # of containers = min (2*12, 1.8* 12, (48-6-8)/2) = min (24, 21.6, 17) = 17

    RAM-per-container = max (2, (48-6-8)/17) = max (2, 2) = 2

    Configuration Value Calculation
    yarn.nodemanager.resource.memory-mb = 17 * 2 = 34*1024 MB
    yarn.scheduler.minimum-allocation-mb = 2*1024 MB
    yarn.scheduler.maximum-allocation-mb = 17 * 2 = 34*1024 MB
    mapreduce.map.memory.mb = 2*1024 MB
    mapreduce.reduce.memory.mb          = 2 * 2 = 4*1024 MB
    mapreduce.map.java.opts = 0.8 * 2 = 1.6*1024 MB
    mapreduce.reduce.java.opts = 0.8 * 2 * 2 = 3.2*1024 MB
    yarn.app.mapreduce.am.resource.mb = 2 * 2 = 4*1024 MB
    yarn.app.mapreduce.am.command-opts = 0.8 * 2 * 2 = 3.2*1024 MB

    Notes:

    1. Changing yarn.scheduler.minimum-allocation-mb without also changing yarn.nodemanager.resource.memory-mb, or changing yarn.nodemanager.resource.memory-mb without also changing yarn.scheduler.minimum-allocation-mb changes the number of containers per node.

    2. If your installation has high RAM but not many disks/cores, you can free up RAM for other tasks by lowering both yarn.scheduler.minimum-allocation-mb and yarn.nodemanager.resource.memory-mb.

     9.2.1. Configuring MapReduce Memory Settings on YARN

    MapReduce runs on top of YARN and utilizes YARN Containers to schedule and execute its Map and Reduce tasks. When configuring MapReduce resource utilization on YARN, there are three aspects to consider:

    • The physical RAM limit for each Map and Reduce task.

    • The JVM heap size limit for each task.

    • The amount of virtual memory each task will receive.

    You can define a maximum amount of memory for each Map and Reduce task. Since each Map and Reduce task will run in a separate Container, these maximum memory settings should be equal to or greater than the YARN minimum Container allocation.

    For the example cluster used in the previous section (48 GB RAM, 12 disks, and 12 cores), the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. Therefore we will assign 4 GB for Map task Containers, and 8 GB for Reduce task Containers.

    In mapred-site.xml:

    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>

    Each Container will run JVMs for the Map and Reduce tasks. The JVM heap sizes should be set to values lower than the Map and Reduce Containers, so that they are within the bounds of the Container memory allocated by YARN.

    In mapred-site.xml:

    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3072m</value>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx6144m</value>

    The preceding settings configure the upper limit of the physical RAM that Map and Reduce tasks will use. The virtual memory (physical + paged memory) upper limit for each Map and Reduce task is determined by the virtual memory ratio each YARN Container is allowed. This ratio is set with the following configuration property, with a default value of 2.1:

    In yarn-site.xml:

    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>

    With the preceding settings on our example cluster, each Map task will receive the following memory allocations:

    • Total physical RAM allocated = 4 GB

    • JVM heap space upper limit within the Map task Container = 3 GB

    • Virtual memory upper limit = 4*2.1 = 8.2 GB

    With MapReduce on YARN, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Map and Reduce tasks as needed by each job. In our example cluster, with the above configurations, YARN will be able to allocate up to 10 Mappers (40/4) or 5 Reducers (40/8) on each node (or some other combination of Mappers and Reducers within the 40 GB per node limit).

  • 相关阅读:
    Freemarker-2.3.22 Demo
    Freemarker-2.3.22 Demo
    Freemarker-2.3.22 Demo
    Freemarker-2.3.22 Demo
    Oracle PLSQL Demo
    Oracle PLSQL Demo
    Oracle PLSQL Demo
    Oracle PLSQL Demo
    Oracle PLSQL Demo
    Oracle PLSQL Demo
  • 原文地址:https://www.cnblogs.com/notebook2011/p/7163576.html
Copyright © 2011-2022 走看看