  • jemalloc management structures (arena and bin)

    The arena is jemalloc's top-level management structure. A process can have more than one arena, and the maximum number of (automatically managed) arenas is recorded in the static variable narenas_auto.

    Pointers to all of the arenas in the process can be obtained from the static array arenas (je_arenas in the symbol table):

    (gdb) p  narenas_auto
    $359 = 2
    (gdb) p *je_arenas@2
    $360 = {0x7f93e02200, 0x7f93f12280}

    So the process currently has at most 2 arenas, located at 0x7f93e02200 and 0x7f93f12280 respectively.
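
    Incidentally, roughly the same information is available at runtime through jemalloc's mallctl interface, without a debugger. A minimal sketch, assuming jemalloc is the linked allocator and its header is available (in a prefixed build the call is je_mallctl):

    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    int main(void)
    {
        unsigned narenas;
        size_t sz = sizeof(narenas);

        /* "arenas.narenas" is the current limit on the number of arenas. */
        if (mallctl("arenas.narenas", &narenas, &sz, NULL, 0) == 0)
            printf("narenas = %u\n", narenas);
        return (0);
    }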

    The arena structure is declared as follows:

    typedef struct arena_s arena_t;
    
    struct arena_s {
        /* This arena's index within the arenas array. */
        unsigned        ind;
    
        /*
         * Number of threads currently assigned to this arena.  This field is
         * synchronized via atomic operations.
         */
        unsigned        nthreads;
    
        /*
         * There are three classes of arena operations from a locking
         * perspective:
         * 1) Thread assignment (modifies nthreads) is synchronized via atomics.
         * 2) Bin-related operations are protected by bin locks.
         * 3) Chunk- and run-related operations are protected by this mutex.
         */
        malloc_mutex_t        lock;
    
        arena_stats_t        stats;
        /*
         * List of tcaches for extant threads associated with this arena.
         * Stats from these are merged incrementally, and at exit if
         * opt_stats_print is enabled.
         */
        ql_head(tcache_t)    tcache_ql;
    
        uint64_t        prof_accumbytes;
    
        /*
         * PRNG state for cache index randomization of large allocation base
         * pointers.
         */
        uint64_t        offset_state;
    
        dss_prec_t        dss_prec;
    
        /*
         * In order to avoid rapid chunk allocation/deallocation when an arena
         * oscillates right on the cusp of needing a new chunk, cache the most
         * recently freed chunk.  The spare is left in the arena's chunk trees
         * until it is deleted.
         *
         * There is one spare chunk per arena, rather than one spare total, in
         * order to avoid interactions between multiple threads that could make
         * a single spare inadequate.
         */
        arena_chunk_t        *spare;
    
        /* Minimum ratio (log base 2) of nactive:ndirty. */
        ssize_t            lg_dirty_mult;
    
        /* True if a thread is currently executing arena_purge_to_limit(). */
        bool            purging;
    
        /* Number of pages in active runs and huge regions. */
        size_t            nactive;
    
        /*
         * Current count of pages within unused runs that are potentially
         * dirty, and for which madvise(... MADV_DONTNEED) has not been called.
         * By tracking this, we can institute a limit on how much dirty unused
         * memory is mapped for each arena.
         */
        size_t            ndirty;
    
        /*
         * Unused dirty memory this arena manages.  Dirty memory is conceptually
         * tracked as an arbitrarily interleaved LRU of dirty runs and cached
         * chunks, but the list linkage is actually semi-duplicated in order to
         * avoid extra arena_chunk_map_misc_t space overhead.
         *
         *   LRU-----------------------------------------------------------MRU
         *
         *        /-- arena ---\
         *        |            |
         *        |            |
         *        |------------|                             /- chunk -\
         *   ...->|chunks_cache|<--------------------------->|  /----\ |<--...
         *        |------------|                             |  |node| |
         *        |            |                             |  |    | |
         *        |            |    /- run -\    /- run -\   |  |    | |
         *        |            |    |       |    |       |   |  |    | |
         *        |            |    |       |    |       |   |  |    | |
         *        |------------|    |-------|    |-------|   |  |----| |
         *   ...->|runs_dirty  |<-->|rd     |<-->|rd     |<---->|rd  |<----...
         *        |------------|    |-------|    |-------|   |  |----| |
         *        |            |    |       |    |       |   |  |    | |
         *        |            |    |       |    |       |   |  \----/ |
         *        |            |    \-------/    \-------/   |         |
         *        |            |                             |         |
         *        |            |                             |         |
         *        \------------/                             \---------/
         */
        arena_runs_dirty_link_t    runs_dirty;
        extent_node_t        chunks_cache;
    
        /*
         * Approximate time in seconds from the creation of a set of unused
         * dirty pages until an equivalent set of unused dirty pages is purged
         * and/or reused.
         */
        ssize_t            decay_time;
        /* decay_time / SMOOTHSTEP_NSTEPS. */
        nstime_t        decay_interval;
        /*
         * Time at which the current decay interval logically started.  We do
         * not actually advance to a new epoch until sometime after it starts
         * because of scheduling and computation delays, and it is even possible
         * to completely skip epochs.  In all cases, during epoch advancement we
         * merge all relevant activity into the most recently recorded epoch.
         */
        nstime_t        decay_epoch;
        /* decay_deadline randomness generator. */
        uint64_t        decay_jitter_state;
        /*
         * Deadline for current epoch.  This is the sum of decay_interval and
         * per epoch jitter which is a uniform random variable in
         * [0..decay_interval).  Epochs always advance by precise multiples of
         * decay_interval, but we randomize the deadline to reduce the
         * likelihood of arenas purging in lockstep.
         */
        nstime_t        decay_deadline;
        /*
         * Number of dirty pages at beginning of current epoch.  During epoch
         * advancement we use the delta between decay_ndirty and ndirty to
         * determine how many dirty pages, if any, were generated, and record
         * the result in decay_backlog.
         */
        size_t            decay_ndirty;
        /*
         * Memoized result of arena_decay_backlog_npages_limit() corresponding
         * to the current contents of decay_backlog, i.e. the limit on how many
         * pages are allowed to exist for the decay epochs.
         */
        size_t            decay_backlog_npages_limit;
        /*
         * Trailing log of how many unused dirty pages were generated during
         * each of the past SMOOTHSTEP_NSTEPS decay epochs, where the last
         * element is the most recent epoch.  Corresponding epoch times are
         * relative to decay_epoch.
         */
        size_t            decay_backlog[SMOOTHSTEP_NSTEPS];
    
        /* Extant huge allocations. */
        ql_head(extent_node_t)    huge;
        /* Synchronizes all huge allocation/update/deallocation. */
        malloc_mutex_t        huge_mtx;
    
        /*
         * Trees of chunks that were previously allocated (trees differ only in
         * node ordering).  These are used when allocating chunks, in an attempt
         * to re-use address space.  Depending on function, different tree
         * orderings are needed, which is why there are two trees with the same
         * contents.
         */
        extent_tree_t        chunks_szad_cached;
        extent_tree_t        chunks_ad_cached;
        extent_tree_t        chunks_szad_retained;
        extent_tree_t        chunks_ad_retained;
    
        malloc_mutex_t        chunks_mtx;
        /* Cache of nodes that were allocated via base_alloc(). */
        ql_head(extent_node_t)    node_cache;
        malloc_mutex_t        node_cache_mtx;
    
        /* User-configurable chunk hook functions. */
        chunk_hooks_t        chunk_hooks;
    
        /* bins is used to store trees of free regions. */
        arena_bin_t        bins[NBINS];
    
        /*
         * Quantized address-ordered trees of this arena's available runs.  The
         * trees are used for first-best-fit run allocation.
         */
        arena_run_tree_t    runs_avail[1]; /* Dynamically sized. */
    };

    We will not look at the other members for now; here we focus on bins, an array of arena_bin_t. Its size is NBINS (36 in this build), corresponding to 36 region/run size classes.

    binind is the index into bins; each binind corresponds to one fixed region size. The mapping is:

    usize = index2size(binind);
    
    size_t index2size(szind_t index)
    {
        return (index2size_lookup(index));
    }
    
    size_t index2size_lookup(szind_t index)
    {
        size_t ret = (size_t)index2size_tab[index];
        return (ret);
    }

    Here index2size_tab is a static lookup table that records the index-to-size mapping.
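
    To make the mapping concrete, the first few small size classes typically look as follows on a 64-bit build with a 16-byte quantum. These values are only illustrative (the real table, index2size_tab, is generated from the build-time size-class configuration), but note that binind 5 maps to 80-byte regions, which matches the arena_bin_info dump below:

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative values only; jemalloc reads them from index2size_tab. */
        static const size_t small_class_bytes[] = {
            8,                /* binind 0    */
            16, 32, 48, 64,   /* binind 1..4 */
            80, 96, 112, 128, /* binind 5..8 */
        };
        size_t n = sizeof(small_class_bytes) / sizeof(small_class_bytes[0]);

        for (size_t i = 0; i < n; i++)
            printf("binind %zu -> %zu bytes\n", i, small_class_bytes[i]);
        return (0);
    }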

    Given a binind, we can also obtain further information about the corresponding bin from the static array arena_bin_info:

    (gdb) p je_arena_bin_info[5]
    $375 = {
      reg_size = 80, 
      redzone_size = 0, 
      reg_interval = 80, 
      run_size = 20480, 
      nregs = 256, 
      bitmap_info = {
        nbits = 256, 
        ngroups = 4
      }, 
      reg0_offset = 0
    }

    From this bin_info we can see that for the bin at index 5, the region size (reg_size) is 80 bytes, the run size (run_size) is 20 KiB, each run holds 256 regions (nregs), and 4 bitmap groups (ngroups) are needed to track them.
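
    These fields are tied together by simple arithmetic: a 20480-byte run divided into 80-byte regions yields 256 regions, and tracking 256 regions at 64 bits per bitmap group takes 4 groups. A minimal sketch of that check (plain C, not jemalloc code):

    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        size_t run_size = 20480;                /* run_size for binind 5    */
        size_t reg_interval = 80;               /* reg_interval == reg_size */
        size_t nregs = run_size / reg_interval; /* regions per run          */
        size_t ngroups = (nregs + 63) / 64;     /* 64-bit bitmap groups     */

        assert(nregs == 256 && ngroups == 4);
        printf("nregs = %zu, ngroups = %zu\n", nregs, ngroups);
        return (0);
    }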

    arena_bin_t is declared as follows:

    typedef struct arena_bin_s arena_bin_t;
    
    struct arena_bin_s {
        /*
         * All operations on runcur, runs, and stats require that lock be
         * locked.  Run allocation/deallocation are protected by the arena lock,
         * which may be acquired while holding one or more bin locks, but not
         * vise versa.
         */
        malloc_mutex_t        lock;
    
        /*
         * Current run being used to service allocations of this bin's size
         * class.
         */
        arena_run_t        *runcur;
    
        /*
         * Tree of non-full runs.  This tree is used when looking for an
         * existing run when runcur is no longer usable.  We choose the
         * non-full run that is lowest in memory; this policy tends to keep
         * objects packed well, and it can also help reduce the number of
         * almost-empty chunks.
         */
        arena_run_tree_t    runs;
    
        /* Bin statistics. */
        malloc_bin_stats_t    stats;
    };

    runcur is the run currently used to service allocations for this bin's size class.

    runs is a red-black tree linking all of this bin's non-full runs in the arena, i.e. the runs serving the same region size; when runcur can no longer be used, an available run is taken from runs.

    stats holds the statistics for this bin's runs and regions.

    Let's look at the arena_bin_t at index 5 of bins in a running process:

    (gdb) p (*je_arenas[0])->bins[5]
    $365 = {
      lock = {
        lock = {
          __private = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
        }
      }, 
      runcur = 0x7f68408ad0, 
      runs = {
        rbt_root = 0x7f78e06e38
      }, 
      stats = {
        nmalloc = 236529, 
        ndalloc = 229379, 
        nrequests = 1181919, 
        curregs = 7150, 
        nfills = 60225, 
        nflushes = 42510, 
        nruns = 64, 
        reruns = 5402, 
        curruns = 31
      }
    }

    The run pointed to by runcur contains:

    (gdb)  p /x *(*je_arenas[0])->bins[5].runcur
    $373 = {
      binind = 0x5, 
      nfree = 0x87, 
      bitmap = {0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd, 0x0, 0x0, 0x0, 0x0}
    }

    Here nfree is the number of free regions left in this run: 0x87, i.e. 135.

    The relevant code on the allocation path is shown below (abridged): if the bin's runcur still has free regions, a region is carved out of it; otherwise arena_bin_malloc_hard() finds or creates another run.

    static void *
    arena_malloc_small(tsd_t *tsd, arena_t *arena, szind_t binind, bool zero)
    {
        void *ret;
        arena_bin_t *bin;
        size_t usize;
        arena_run_t *run;
    
        assert(binind < NBINS);
        bin = &arena->bins[binind];
        usize = index2size(binind);
    
        malloc_mutex_lock(&bin->lock);
        if ((run = bin->runcur) != NULL && run->nfree > 0)
            ret = arena_run_reg_alloc(run, &arena_bin_info[binind]);
        else
            ret = arena_bin_malloc_hard(arena, bin);
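
        /*
         * ... the rest of the function (error handling, stats updates,
         * junk/zero fill, unlock and return) is omitted here ...
         */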

    Each bit in bitmap records the state of one region: 0 means the region is in use, 1 means it is free.

    Since ngroups in arena_bin_info[5] is 4, only the first 4 elements of the bitmap array are meaningful for a run with binind 5.

    That is, the effective bitmap of the runcur shown above is:

    0x0, 0xeff7200000000000, 0xfffdeffffff7fdf7, 0xfffffffffdeffffd,

    In binary:

    0000000000000000000000000000000000000000000000000000000000000000
    1110111111110111001000000000000000000000000000000000000000000000
    1111111111111101111011111111111111111111111101111111110111110111
    1111111111111111111111111111111111111101111011111111111111111101

    Counting the 1 bits gives 135 in total, i.e. there are 135 (0x87) free regions, which matches nfree.
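
    That count is easy to verify mechanically. A minimal sketch using the four bitmap groups dumped above (plain C with the GCC/Clang popcount builtin, not jemalloc code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* The four valid bitmap groups from the runcur dump above. */
        uint64_t groups[4] = {
            0x0000000000000000ULL,
            0xeff7200000000000ULL,
            0xfffdeffffff7fdf7ULL,
            0xfffffffffdeffffdULL,
        };
        int free_regs = 0;

        for (int i = 0; i < 4; i++)
            free_regs += __builtin_popcountll(groups[i]);
        printf("free regions = %d (0x%x)\n", free_regs, free_regs);
        return (0); /* prints: free regions = 135 (0x87) */
    }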

    The code that actually carves a region out of a run is arena_run_reg_alloc():

    JEMALLOC_INLINE_C void *
    arena_run_reg_alloc(arena_run_t *run, arena_bin_info_t *bin_info)
    {
        void *ret;
        size_t regind;
        arena_chunk_map_misc_t *miscelm;
        void *rpages;
    
        regind = (unsigned)bitmap_sfu(run->bitmap, &bin_info->bitmap_info);
        miscelm = arena_run_to_miscelm(run);
        rpages = arena_miscelm_to_rpages(miscelm);
        ret = (void *)((uintptr_t)rpages + (uintptr_t)bin_info->reg0_offset + (uintptr_t)(bin_info->reg_interval * regind));
        run->nfree--;
        return (ret);
    }

    bitmap_sfu() returns the index of the first 1 bit in the bitmap (the first free region) and clears that bit to 0, marking the region as used.
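
    jemalloc's real bitmap is a multi-level tree for speed, but the semantics can be sketched with a flat version (illustrative only, not jemalloc's implementation; it relies on the GCC/Clang count-trailing-zeros builtin):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Find the lowest 1 bit (a free region), clear it (mark the region
     * used) and return its index; (size_t)-1 means the run is full.
     */
    static size_t flat_sfu(uint64_t *groups, size_t ngroups)
    {
        for (size_t g = 0; g < ngroups; g++) {
            if (groups[g] != 0) {
                int bit = __builtin_ctzll(groups[g]);
                groups[g] &= ~(UINT64_C(1) << bit);
                return (g * 64 + bit);
            }
        }
        return ((size_t)-1);
    }

    int main(void)
    {
        uint64_t groups[2] = {0x0, 0x6}; /* bits 65 and 66 are free */

        printf("%zu\n", flat_sfu(groups, 2)); /* prints 65 */
        printf("%zu\n", flat_sfu(groups, 2)); /* prints 66 */
        return (0);
    }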

    Next, the run is mapped to its arena_chunk_map_misc_t (miscelm), and from the miscelm we get rpages, the starting address of the pages backing the run.

    Adding reg0_offset plus regind * reg_interval (equal to reg_size here, since redzone_size is 0) to rpages gives the actual address of the free region.

    Finally the run's nfree is decremented, and the allocation is complete.
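
    Putting the numbers for binind 5 together (reg0_offset = 0, reg_interval = 80), the address arithmetic looks like this. The rpages value below is a hypothetical run base address, used purely for illustration:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uintptr_t rpages = 0x7f68408000;  /* hypothetical run page base  */
        size_t reg0_offset = 0;           /* from arena_bin_info[5]      */
        size_t reg_interval = 80;         /* from arena_bin_info[5]      */
        size_t regind = 3;                /* as returned by bitmap_sfu() */
        void *ret;

        ret = (void *)(rpages + reg0_offset + reg_interval * regind);
        printf("region %zu -> %p\n", regind, ret); /* rpages + 240 */
        return (0);
    }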

    Understanding the miscelm/rpages lookup in the middle of arena_run_reg_alloc() requires familiarity with the chunk header, which is not covered here.

  • Original article: https://www.cnblogs.com/YYPapa/p/6914199.html