  • Understanding "Movable" in Memory Management

    Zones in the kernel

    The kernel defines the following zones:

    enum zone_type {
    #ifdef CONFIG_ZONE_DMA
        /*
         * ZONE_DMA is used when there are devices that are not able
         * to do DMA to all of addressable memory (ZONE_NORMAL). Then we
         * carve out the portion of memory that is needed for these devices.
         * The range is arch specific.
         *
         * Some examples
         *
         * Architecture     Limit
         * ---------------------------
         * parisc, ia64, sparc  <4G
         * s390         <2G
         * arm          Various
         * alpha        Unlimited or 0-16MB.
         *
         * i386, x86_64 and multiple other arches
         *          <16M.
         */
        ZONE_DMA,
    #endif
    #ifdef CONFIG_ZONE_DMA32
        /*
         * x86_64 needs two ZONE_DMAs because it supports devices that are
         * only able to do DMA to the lower 16M but also 32 bit devices that
         * can only do DMA areas below 4G.
         */
        ZONE_DMA32,
    #endif
        /*
         * Normal addressable memory is in ZONE_NORMAL. DMA operations can be
         * performed on pages in ZONE_NORMAL if the DMA devices support
         * transfers to all addressable memory.
         */
        ZONE_NORMAL,
    #ifdef CONFIG_HIGHMEM
        /*
         * A memory area that is only addressable by the kernel through
         * mapping portions into its own address space. This is for example
         * used by i386 to allow the kernel to address the memory beyond
         * 900MB. The kernel will set up special mappings (page
         * table entries on i386) for each page that the kernel needs to
         * access.
         */
        ZONE_HIGHMEM,
    #endif
        ZONE_MOVABLE,
        __MAX_NR_ZONES
    };
    
    
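    • ZONE_DMA
      Some devices cannot perform DMA to the full range of addressable memory, so this zone is carved out specifically for allocations that serve such restricted DMA. On x86, for example, it covers 0-16M.
    • ZONE_NORMAL
      The NORMAL zone is the directly mapped region.
    • ZONE_HIGHMEM
      The high-memory zone. Memory allocated from it can be accessed only after the kernel has mapped it. 64-bit architectures generally do not need a high-memory zone, because the address space is large enough to map all physical memory.
    • ZONE_MOVABLE
      This zone is a special case. It exists mainly to support memory hotplug, so MOVABLE means "removable"; it also means "migratable".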

    In short, migratable pages are not necessarily all in ZONE_MOVABLE, but every page in ZONE_MOVABLE must be migratable. The contents of /proc/pagetypeinfo give a concrete example:

    xie:/proc # cat pagetypeinfo                                                 
    Page block order: 10
    Pages per block:  1024
    
    Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10 
    Node    0, zone      DMA, type    Unmovable     76     50     24     20     27     25     19      3      1      2      0 
    Node    0, zone      DMA, type      Movable    117     35     28    172    281     93     49     21      7      4      4 
    Node    0, zone      DMA, type  Reclaimable      0      3      1      0      0      0      0      1      0      1      0 
    Node    0, zone      DMA, type          CMA   3380   1798    856    386    152     55     21      8      4      0      0 
    Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0 
    Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
    Node    0, zone   Normal, type    Unmovable    521    654    531    286    132     52     15      2      1      4      0 
    Node    0, zone   Normal, type      Movable      1      8     21     21      1      1      5      3      1      0      0 
    Node    0, zone   Normal, type  Reclaimable     18     24      1      1      0      0      1      0      1      0      0 
    Node    0, zone   Normal, type          CMA      9      0      1      6      2      0      1      0      0      0      0 
    Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0 
    Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
    Node    0, zone  Movable, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0 
    Node    0, zone  Movable, type      Movable    963    649    188     48     24    112     49     21      8      3     50 
    Node    0, zone  Movable, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0 
    Node    0, zone  Movable, type          CMA      0      0      0      0      0      0      0      0      0      0      0 
    Node    0, zone  Movable, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0 
    Node    0, zone  Movable, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
    
    Number of blocks type     Unmovable      Movable  Reclaimable          CMA   HighAtomic      Isolate 
    Node 0, zone      DMA          123          310           18           61            0            0 
    Node 0, zone   Normal          406          310           43            9            0            0 
    Node 0, zone  Movable            0          256            0            0            0            0 
    
    Number of mixed blocks    Unmovable      Movable  Reclaimable          CMA   HighAtomic      Isolate 
    Node 0, zone      DMA            0           61            0            0            0            0 
    Node 0, zone   Normal            0           11            3            0            0            0 
    Node 0, zone  Movable            0            0            0            0            0            0 
    

    As the output shows, the Movable zone contains no pages of type Unmovable, only pages of type Movable.
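
    To make the "Free pages count" rows easier to read, the sketch below totals one row: a free block of order n holds 2^n pages, so the row total is the sum of count << order. The function name free_pages_in_row and the NR_ORDERS constant are hypothetical names for this illustration; the row format is assumed to match the listing above (orders 0 through 10).

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of order columns; the listing above shows orders 0 through 10. */
#define NR_ORDERS 11

/*
 * Parse one "Free pages count" row of /proc/pagetypeinfo, for example:
 *   Node    0, zone  Movable, type      Movable    963    649 ...
 * Stores the zone and migrate-type names (buffers of at least 16 bytes)
 * and returns the total number of free pages in the row, or -1 if the
 * line does not have the expected shape.
 */
long free_pages_in_row(const char *line, char *zone, char *type)
{
    int node;
    int consumed = 0;
    long total = 0;

    if (sscanf(line, "Node %d, zone %15[^,], type %15s%n",
               &node, zone, type, &consumed) != 3)
        return -1;

    const char *p = line + consumed;
    for (int order = 0; order < NR_ORDERS; order++) {
        char *end;
        unsigned long count = strtoul(p, &end, 10);

        if (end == p)                     /* fewer columns than expected */
            return -1;
        total += (long)(count << order);  /* each block holds 2^order pages */
        p = end;
    }
    return total;
}
```

    Applied to the "zone Movable, type Movable" row of the listing above, the function returns 67973 free pages. Rows from the "Number of blocks" tables do not match the pattern and yield -1.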

    The ZONE_MOVABLE zone

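    This zone is tied mainly to memory hotplug. Memory hotplug was designed with the following three considerations in mind:
    1. Logical memory hotplug: support for virtual machines, so that usable memory can be assigned to a VM on demand
    2. Physical memory hotplug: support for NUMA servers, where memory that is not needed can be set offline to reduce power consumption
    3. Reducing memory fragmentation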

    Every page stored in this zone is migratable, and the zone can be used only by allocation requests that carry both __GFP_HIGHMEM and __GFP_MOVABLE, for example:

    #define GFP_HIGHUSER_MOVABLE    (GFP_HIGHUSER | __GFP_MOVABLE)
    
    #define GFP_USER    (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
    #define GFP_HIGHUSER    (GFP_USER | __GFP_HIGHMEM)
    
    

    Take care not to confuse the allocation flag __GFP_MOVABLE with the zone ZONE_MOVABLE; the two do not correspond one to one.

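    • __GFP_MOVABLE is an attribute of the allocated page: it says the page can be migrated. Some pages outside ZONE_MOVABLE can be migrated too, for example cache pages.
    • ZONE_MOVABLE is a zone and is tied to memory hotplug. Its pages must of course be migratable for hotplug to work.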

    The allocation flag __GFP_MOVABLE

    #define __GFP_DMA   ((__force gfp_t)___GFP_DMA)
    #define __GFP_HIGHMEM   ((__force gfp_t)___GFP_HIGHMEM)
    #define __GFP_DMA32 ((__force gfp_t)___GFP_DMA32)
    #define __GFP_MOVABLE   ((__force gfp_t)___GFP_MOVABLE)  /* Page is movable */
    #define GFP_ZONEMASK    (__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE)
    

    These allocation flags are called zone modifiers. They indicate which zone an allocation should preferably come from.

    bit       result
    =================
    0x0    => NORMAL
    0x1    => DMA or NORMAL
    0x2    => HIGHMEM or NORMAL
    0x3    => BAD (DMA+HIGHMEM)
    0x4    => DMA32 or DMA or NORMAL
    0x5    => BAD (DMA+DMA32)
    0x6    => BAD (HIGHMEM+DMA32)
    0x7    => BAD (HIGHMEM+DMA32+DMA)
    0x8    => NORMAL (MOVABLE+0)
    0x9    => DMA or NORMAL (MOVABLE+DMA)
    0xa    => MOVABLE (Movable is valid only if HIGHMEM is set too)
    0xb    => BAD (MOVABLE+HIGHMEM+DMA)
    0xc    => DMA32 (MOVABLE+DMA32)
    0xd    => BAD (MOVABLE+DMA32+DMA)
    0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
    0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
    

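    Four bits encode the combination. At most one of the low three bits (__GFP_DMA/__GFP_HIGHMEM/__GFP_DMA32) may be set, while __GFP_MOVABLE can be combined with any one of them. The four bits give 16 combinations in total, 8 of which are valid. The result for each combination is stored, at an offset derived from the combination, in a table of type long.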

    GFP_ZONE_TABLE:
    
    |BAD|BAD|BAD|DMA32|BAD|MOVABLE|......|NORMAL|
    
    

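    Each result is shifted by an offset derived from the bit combination above and stored in the zone table, so the zone to use can be looked up quickly from the combination. As this shows, __GFP_MOVABLE represents an allocation policy and is not a direct match for ZONE_MOVABLE. As described in the previous section, memory is allocated from ZONE_MOVABLE only when __GFP_HIGHMEM and __GFP_MOVABLE are both set.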
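
    The lookup described above can be modelled in user space roughly as follows. This is a sketch rather than the kernel's code: every M_ and model_ name is hypothetical, all five zones are assumed to be configured (so each table entry needs 3 bits), and a bad combination returns -1 where the kernel would trigger a debug check instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zone indices in the order of enum zone_type with every option enabled. */
enum model_zone { M_DMA, M_DMA32, M_NORMAL, M_HIGHMEM, M_MOVABLE };

/* Zone-modifier bits, matching the "bit => result" table above. */
#define M_GFP_DMA      0x1u
#define M_GFP_HIGHMEM  0x2u
#define M_GFP_DMA32    0x4u
#define M_GFP_MOVABLE  0x8u
#define M_GFP_ZONEMASK 0xfu

/* Five zones need 3 bits per entry; 16 entries need 48 bits in total. */
#define M_ZONES_SHIFT 3

#define M_ENTRY(zone, bits) ((uint64_t)(zone) << ((bits) * M_ZONES_SHIFT))

/* Only the eight valid combinations get an entry; the rest stay zero. */
static const uint64_t model_zone_table =
      M_ENTRY(M_NORMAL,  0)
    | M_ENTRY(M_DMA,     M_GFP_DMA)
    | M_ENTRY(M_HIGHMEM, M_GFP_HIGHMEM)
    | M_ENTRY(M_DMA32,   M_GFP_DMA32)
    | M_ENTRY(M_NORMAL,  M_GFP_MOVABLE)
    | M_ENTRY(M_DMA,     M_GFP_MOVABLE | M_GFP_DMA)
    | M_ENTRY(M_MOVABLE, M_GFP_MOVABLE | M_GFP_HIGHMEM)
    | M_ENTRY(M_DMA32,   M_GFP_MOVABLE | M_GFP_DMA32);

/* A combination is bad when more than one of the low three bits is set. */
bool model_zone_bits_bad(unsigned int flags)
{
    unsigned int low = flags & (M_GFP_DMA | M_GFP_HIGHMEM | M_GFP_DMA32);

    return (low & (low - 1)) != 0;
}

/* Highest zone the allocation may use, or -1 for a bad combination. */
int model_gfp_zone(unsigned int flags)
{
    unsigned int bits = flags & M_GFP_ZONEMASK;

    if (model_zone_bits_bad(bits))
        return -1;
    return (int)((model_zone_table >> (bits * M_ZONES_SHIFT))
                 & ((1u << M_ZONES_SHIFT) - 1));
}
```

    With these definitions, model_gfp_zone(M_GFP_MOVABLE) yields M_NORMAL, and only model_gfp_zone(M_GFP_MOVABLE | M_GFP_HIGHMEM) yields M_MOVABLE, which is the point the paragraph above makes.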

    The zone fallback order is MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA
    

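    So an allocation is not necessarily satisfied from the zone that the passed-in flags select: if that zone has no suitable memory, the allocator falls back through the zones in this order until it finds memory that meets the request.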
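
    The fallback order can be illustrated with a deliberately simplified model. The real allocator walks per-node zonelists and checks watermarks; here a hypothetical free_pages[] array stands in for all of that, and model_pick_zone is a name made up for this sketch.

```c
/* Zones ordered as in enum zone_type, lowest first. */
enum model_zone { M_DMA, M_DMA32, M_NORMAL, M_HIGHMEM, M_MOVABLE, M_NR_ZONES };

/*
 * Walk from the preferred zone down towards M_DMA and return the first
 * zone that still has at least `need` free pages, or -1 if none does.
 * free_pages[] is a hypothetical per-zone counter used only in this sketch.
 */
int model_pick_zone(enum model_zone preferred,
                    const unsigned long free_pages[M_NR_ZONES],
                    unsigned long need)
{
    for (int z = (int)preferred; z >= 0; z--) {
        if (free_pages[z] >= need)
            return z;
    }
    return -1;
}
```

    A request that prefers M_MOVABLE may therefore end up in M_NORMAL or lower, but a request that prefers M_NORMAL never moves up into M_HIGHMEM or M_MOVABLE.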

    How to enable ZONE_MOVABLE

    - For all memory hotplug
        Memory model -> Sparse Memory  (CONFIG_SPARSEMEM)
        Allow for memory hot-add       (CONFIG_MEMORY_HOTPLUG)
    
    - To enable memory removal, the followings are also necessary
        Allow for memory hot remove    (CONFIG_MEMORY_HOTREMOVE)
        Page Migration                 (CONFIG_MIGRATION)
    
    - For ACPI memory hotplug, the followings are also necessary
        Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY)
        This option can be kernel module.
    
    - As a related configuration, if your box has a feature of NUMA-node hotplug
      via ACPI, then this option is necessary too.
        ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu)
        (CONFIG_ACPI_CONTAINER).
        This option can be kernel module too.
    
    1) When kernelcore=YYYY boot option is used,
       Size of memory not for movable pages (not for offline) is YYYY.
       Size of memory for movable pages (for offline) is TOTAL-YYYY.
    
    2) When movablecore=ZZZZ boot option is used,
       Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ.
       Size of memory for movable pages (for offline) is ZZZZ.
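
    The arithmetic behind the two boot options, as stated in the text above, can be written out as a small sketch. It ignores how the kernel spreads the memory across nodes and rounds the boundaries; the struct and function names are hypothetical, and clamping a request that exceeds the total is an assumption of this sketch.

```c
#include <stdint.h>

struct model_split {
    uint64_t kernel_bytes;   /* memory not available for movable pages */
    uint64_t movable_bytes;  /* memory reserved for movable pages */
};

/* Assumption of this sketch: a request larger than the total is clamped. */
static uint64_t clamp_to_total(uint64_t value, uint64_t total)
{
    return value > total ? total : value;
}

/* kernelcore=YYYY: YYYY is not movable, TOTAL - YYYY is movable. */
struct model_split split_with_kernelcore(uint64_t total, uint64_t kernelcore)
{
    uint64_t k = clamp_to_total(kernelcore, total);

    return (struct model_split){ .kernel_bytes = k,
                                 .movable_bytes = total - k };
}

/* movablecore=ZZZZ: ZZZZ is movable, TOTAL - ZZZZ is not movable. */
struct model_split split_with_movablecore(uint64_t total, uint64_t movablecore)
{
    uint64_t m = clamp_to_total(movablecore, total);

    return (struct model_split){ .kernel_bytes = total - m,
                                 .movable_bytes = m };
}
```

    For a machine with 8 GiB in total, kernelcore=6G and movablecore=2G describe the same split in this model.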
    

    The kernel exposes sysfs nodes for controlling memory hotplug:

    % echo online > /sys/devices/system/memory/memoryXXX/state
    

    Brings the memory block online.

    % echo online_movable > /sys/devices/system/memory/memoryXXX/state
    

    Brings the memory block online in ZONE_MOVABLE.

    % echo online_kernel > /sys/devices/system/memory/memoryXXX/state
    

    Brings the memory block online in ZONE_NORMAL.
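
    The same writes can be issued from a program. The sketch below builds the path for a given block number and writes one of the three state strings named above; the helper names are hypothetical, and the sysfs root is passed in as a parameter so the helpers can be tried against an ordinary directory first.

```c
#include <stdio.h>
#include <string.h>

/* The three state strings named in this article. */
static const char *const model_states[] = {
    "online", "online_movable", "online_kernel",
};

/* Returns 1 if `state` is one of the strings above, 0 otherwise. */
int is_known_state(const char *state)
{
    for (size_t i = 0; i < sizeof(model_states) / sizeof(model_states[0]); i++)
        if (strcmp(state, model_states[i]) == 0)
            return 1;
    return 0;
}

/* Build <root>/memory<block>/state; returns 0 on success, -1 if truncated. */
int memory_state_path(char *buf, size_t len, const char *root,
                      unsigned int block)
{
    int n = snprintf(buf, len, "%s/memory%u/state", root, block);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write `state` to the file at `path`; returns 0 on success, -1 on error. */
int write_memory_state(const char *path, const char *state)
{
    if (!is_known_state(state))
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(state, f) >= 0;
    if (fclose(f) != 0)   /* buffered data is flushed here, so errors surface now */
        ok = 0;
    return ok ? 0 : -1;
}
```

    A caller would pass "/sys/devices/system/memory" as the root and needs sufficient privileges for the write to succeed.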

    How the size of ZONE_MOVABLE is determined

    Let us first look at what happens when the memory zones are initialized.
    On a NUMA-enabled system, the handling is as follows:

    zone_sizes_init->free_area_init_nodes->find_zone_movable_pfns_for_nodes:
    /*
     * If movable_node is specified, ignore kernelcore and movablecore
     * options.
     */
    if (movable_node_is_enabled()) {
        for_each_memblock(memory, r) {
            if (!memblock_is_hotpluggable(r))
                continue;
    
            nid = r->nid;
    
            usable_startpfn = PFN_DOWN(r->base);
            zone_movable_pfn[nid] = zone_movable_pfn[nid] ?
                min(usable_startpfn, zone_movable_pfn[nid]) :
                usable_startpfn;
        }
    
        goto out2;
    }
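
    The loop above can be replayed in user space on a small array of regions. This is a model only: struct model_region stands in for a memblock region, the hotpluggable field stands in for the MEMBLOCK_HOTPLUG flag, and 4 KiB pages and a four-node limit are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumes 4 KiB pages */
#define MODEL_MAX_NODES  4    /* arbitrary limit for this sketch */

struct model_region {
    uint64_t base;        /* physical start address */
    uint64_t size;        /* length in bytes */
    int nid;              /* NUMA node id */
    bool hotpluggable;    /* stands in for the MEMBLOCK_HOTPLUG flag */
};

/*
 * For every node, record the lowest start PFN among its hotpluggable
 * regions. As in the kernel excerpt, a stored value of 0 is treated as
 * "not set yet".
 */
void model_find_movable_pfns(const struct model_region *regions, int count,
                             uint64_t zone_movable_pfn[MODEL_MAX_NODES])
{
    for (int n = 0; n < MODEL_MAX_NODES; n++)
        zone_movable_pfn[n] = 0;

    for (int i = 0; i < count; i++) {
        const struct model_region *r = &regions[i];

        if (!r->hotpluggable)
            continue;
        if (r->nid < 0 || r->nid >= MODEL_MAX_NODES)
            continue;

        uint64_t start_pfn = r->base >> MODEL_PAGE_SHIFT;  /* like PFN_DOWN */
        uint64_t cur = zone_movable_pfn[r->nid];

        zone_movable_pfn[r->nid] =
            (cur != 0 && cur < start_pfn) ? cur : start_pfn;
    }
}
```

    In this model, ZONE_MOVABLE on each node starts at the lowest hotpluggable address of that node, and a node without hotpluggable memory keeps the value 0.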
    
    

    When the corresponding property is set in the device tree (dts), the matching memblock flag is set:

    int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
                         int depth, void *data)
    {
        bool hotpluggable;

        /* Excerpt: the declarations and setup of reg and endp are not shown. */

        hotpluggable = of_get_flat_dt_prop(node, "hotpluggable", NULL);

        while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
            u64 base, size;

            base = dt_mem_next_cell(dt_root_addr_cells, &reg);
            size = dt_mem_next_cell(dt_root_size_cells, &reg);

            if (size == 0)
                continue;
            pr_debug(" - %llx ,  %llx\n", (unsigned long long)base,
                 (unsigned long long)size);

            early_init_dt_add_memory_arch(base, size);

            /* Only ranges of a node with the "hotpluggable" property are marked. */
            if (!hotpluggable)
                continue;

            if (early_init_dt_mark_hotplug_memory_arch(base, size))
                pr_warn("failed to mark hotplug range 0x%llx - 0x%llx\n",
                    base, base + size);
        }

        return 0;
    }
    
    int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)
    {
        return memblock_mark_hotplug(base, size);
    }
    
    int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
    {
        return memblock_setclr_flag(base, size, 1, MEMBLOCK_HOTPLUG);
    }  
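
    What marking a range amounts to can be sketched on a plain array. This is a simplified model, not the memblock implementation: the names are hypothetical, and only regions that lie entirely inside the range are flagged, whereas the kernel's memblock code also handles regions that the range covers only partly.

```c
#include <stdint.h>

#define MODEL_MEMBLOCK_HOTPLUG 0x1u   /* illustrative flag bit */

struct model_region {
    uint64_t base;        /* physical start address */
    uint64_t size;        /* length in bytes */
    unsigned int flags;   /* bitmask of MODEL_* flags */
};

/*
 * Set MODEL_MEMBLOCK_HOTPLUG on every region that lies entirely inside
 * [base, base + size) and return the number of regions marked.
 * Partially covered regions are left untouched in this sketch.
 */
int model_mark_hotplug(struct model_region *regions, int count,
                       uint64_t base, uint64_t size)
{
    uint64_t end = base + size;
    int marked = 0;

    for (int i = 0; i < count; i++) {
        struct model_region *r = &regions[i];
        uint64_t r_end = r->base + r->size;

        if (r->base >= base && r_end <= end) {
            r->flags |= MODEL_MEMBLOCK_HOTPLUG;
            marked++;
        }
    }
    return marked;
}
```

    Regions flagged this way are the ones that the movable_node branch shown earlier would place in ZONE_MOVABLE.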

    from: https://blog.csdn.net/rikeyone/article/details/86498298
  • Original post: https://www.cnblogs.com/aspirs/p/12781693.html