  • TCMalloc Source Code Study (Part 2)

    Replacing malloc and free in libc

     

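    • The replacement mechanism differs by platform. On Unix-like systems with glibc, TCMalloc uses weak aliases. Specifically, glibc defines these entry points as weak symbols, and gcc supports the alias attribute, so the replacement takes this general form:

    void* malloc(size_t size) __THROW __attribute__ ((alias ("tc_malloc")));

    As a result, every call to malloc is routed to the tc_malloc implementation.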
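    The alias mechanism can be tried in isolation. The following is a minimal sketch, not TCMalloc's code: demo_tc_malloc and demo_malloc are hypothetical names, and it assumes GCC or Clang targeting an ELF platform such as Linux, since the alias attribute is not available everywhere.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for tc_malloc: it simply forwards to the system
// allocator. extern "C" keeps the symbol name unmangled so that the
// string passed to alias() matches it.
extern "C" void* demo_tc_malloc(std::size_t size) {
  return std::malloc(size);
}

// demo_malloc becomes a second name for the same function body.
// The alias target is named by a string and must be defined in this
// translation unit.
extern "C" void* demo_malloc(std::size_t size)
    __attribute__((alias("demo_tc_malloc")));
```

    A call to demo_malloc(32) runs the body of demo_tc_malloc. TCMalloc applies the same idea to the real malloc symbol, which works because glibc's own definition is weak.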

     

    Small-object allocation: do_malloc_small

     

    Requests of at most kMaxSize (256K) are classified as small allocations and are handled by do_malloc_small, defined as follows:

     
    inline void* do_malloc_small(ThreadCache* heap, size_t size) {
      ASSERT(Static::IsInited());
      ASSERT(heap != NULL);
      size_t cl = Static::sizemap()->SizeClass(size);
      size = Static::sizemap()->class_to_size(cl);

      if ((FLAGS_tcmalloc_sample_parameter > 0) && heap->SampleAllocation(size)) {
        return DoSampledAllocation(size);
      } else {
        // The common case, and also the simplest.  This just pops the
        // size-appropriate freelist, after replenishing it if it's empty.
        return CheckedMallocResult(heap->Allocate(size, cl));
      }
    }

    sizemap rounds the requested size up to a nearby size class. It manages these mappings, and the path from the source size to the target size goes mainly through three maps:

     


      1. The ClassIndex mapping

      2. The mapping scheme is described in detail by a comment in the code:
      // Sizes <= 1024 have an alignment >= 8.  So for such sizes we have an
      // array indexed by ceil(size/8).  Sizes > 1024 have an alignment >= 128.
      // So for these larger sizes we have an array indexed by ceil(size/128).
      //
      // We flatten both logical arrays into one physical array and use
      // arithmetic to compute an appropriate index.  The constants used by
      // ClassIndex() were selected to make the flattening work.
      //
      // Examples:
      //   Size       Expression                      Index
      //   -------------------------------------------------------
      //   0          (0 + 7) / 8                     0
      //   1          (1 + 7) / 8                     1
      //   ...
      //   1024       (1024 + 7) / 8                  128
      //   1025       (1025 + 127 + (120<<7)) / 128   129
      //   ...
      //   32768      (32768 + 127 + (120<<7)) / 128  376

    In short: sizes <= 1024 bytes are rounded up to a multiple of 8, and sizes > 1024 bytes are rounded up to a multiple of 128.
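    The index arithmetic from that comment can be written as a small function. This is a sketch reconstructed from the expressions in the comment rather than a copy of the library source; the constant name kMaxSmallSize is an assumption.

```cpp
#include <cstddef>

// Largest size served by the 8-byte-granularity part of the table
// (the 1024 boundary comes from the comment above; the name is assumed).
constexpr std::size_t kMaxSmallSize = 1024;

// Maps a request size to an index into the flattened lookup array.
constexpr std::size_t ClassIndex(std::size_t s) {
  return s <= kMaxSmallSize
             ? (s + 7) >> 3                  // ceil(s / 8)
             : (s + 127 + (120 << 7)) >> 7;  // ceil(s / 128) + 120
}
```

    The offset of 120 makes the second range continue where the first one ends: size 1024 maps to index 128 and size 1025 maps to index 129.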

     

    class_array_ and class_to_size_

    • class_array_ and class_to_size_ are plain arrays, initialized in SizeMap::Init when the module is loaded:
    // Compute the size classes we want to use
    int sc = 1;   // Next size class to assign
    int alignment = kAlignment;
    CHECK_CONDITION(kAlignment <= kMinAlign);
    for (size_t size = kAlignment; size <= kMaxSize; size += alignment) {
      alignment = AlignmentForSize(size);
      CHECK_CONDITION((size % alignment) == 0);

      int blocks_to_move = NumMoveSize(size) / 4;
      size_t psize = 0;
      do {
        psize += kPageSize;
        // Allocate enough pages so leftover is less than 1/8 of total.
        // This bounds wasted space to at most 12.5%.
        while ((psize % size) > (psize >> 3)) {
          psize += kPageSize;
        }
        // Continue to add pages until there are at least as many objects in
        // the span as are needed when moving objects from the central
        // freelists and spans to the thread caches.
      } while ((psize / size) < (blocks_to_move));
      const size_t my_pages = psize >> kPageShift;

      if (sc > 1 && my_pages == class_to_pages_[sc-1]) {
        // See if we can merge this into the previous class without
        // increasing the fragmentation of the previous class.
        const size_t my_objects = (my_pages << kPageShift) / size;
        const size_t prev_objects = (class_to_pages_[sc-1] << kPageShift)
                                    / class_to_size_[sc-1];
        if (my_objects == prev_objects) {
          // Adjust last class to include this size
          class_to_size_[sc-1] = size;
          continue;
        }
      }

      // Add new class
      class_to_pages_[sc] = my_pages;
      class_to_size_[sc] = size;
      sc++;
    }

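    The entries of class_to_size_ are produced by repeatedly adding the alignment that applies at the current size. That alignment is computed by alignment = AlignmentForSize(size); whose code is: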

    int AlignmentForSize(size_t size) {
      int alignment = kAlignment;
      if (size > kMaxSize) {
        // Cap alignment at kPageSize for large sizes.
        alignment = kPageSize;
      } else if (size >= 128) {
        // Space wasted due to alignment is at most 1/8, i.e., 12.5%.
        alignment = (1 << LgFloor(size)) / 8;
      } else if (size >= kMinAlign) {
        // We need an alignment of at least 16 bytes to satisfy
        // requirements for some SSE types.
        alignment = kMinAlign;
      }
      // Maximum alignment allowed is page size alignment.
      if (alignment > kPageSize) {
        alignment = kPageSize;
      }
      CHECK_CONDITION(size < kMinAlign || alignment >= kMinAlign);
      CHECK_CONDITION((alignment & (alignment - 1)) == 0);
      return alignment;
    }

     

    LgFloor uses a binary search to find the position of the highest set bit in a value. The alignment rule can be reduced to the following formula:

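    alignment(size) = kAlignment (8)              if size < kMinAlign (16)
                      kMinAlign (16)              if kMinAlign <= size < 128
                      2^floor(log2(size)) / 8     if 128 <= size <= kMaxSize
                      kPageSize                   if size > kMaxSize
    (in every case the result is capped at kPageSize)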

    With this formula, class_to_size_[1] = 8, class_to_size_[2] = 16, class_to_size_[3] = 32, and so on.
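    To see where those first values come from, the alignment rule and the size loop can be run on their own. This is a simplified sketch: CandidateSizes is a hypothetical helper, kPageSize = 8192 is an assumed page size, and the step in SizeMap::Init that merges adjacent classes is left out, so the output is only the list of candidate sizes.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kAlignment = 8;
constexpr std::size_t kMinAlign = 16;
constexpr std::size_t kPageSize = 8192;  // assumed: 8 KiB pages
constexpr std::size_t kMaxSize = 256 * 1024;

// Position of the highest set bit, found by a binary search over the
// shift widths 16, 8, 4, 2, 1. Valid for values below 2^32.
int LgFloor(std::size_t n) {
  int log = 0;
  for (int i = 4; i >= 0; --i) {
    const int shift = 1 << i;
    const std::size_t x = n >> shift;
    if (x != 0) {
      n = x;
      log += shift;
    }
  }
  return log;
}

// Same branch structure as the AlignmentForSize listing above.
std::size_t AlignmentForSize(std::size_t size) {
  std::size_t alignment = kAlignment;
  if (size > kMaxSize) {
    alignment = kPageSize;
  } else if (size >= 128) {
    alignment = (std::size_t{1} << LgFloor(size)) / 8;
  } else if (size >= kMinAlign) {
    alignment = kMinAlign;
  }
  if (alignment > kPageSize) alignment = kPageSize;
  return alignment;
}

// Hypothetical helper: walks sizes the way the SizeMap::Init loop does,
// but records every candidate and skips the class-merging step.
std::vector<std::size_t> CandidateSizes(std::size_t limit) {
  std::vector<std::size_t> sizes;
  std::size_t alignment = kAlignment;
  for (std::size_t size = kAlignment; size <= limit; size += alignment) {
    alignment = AlignmentForSize(size);
    sizes.push_back(size);
  }
  return sizes;
}
```

    CandidateSizes(64) yields 8, 16, 32, 48, 64. The jump from 16 to 32 happens because the step applied after a size is the alignment computed for that size, and at 16 the alignment has already become kMinAlign.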

     

    class_array_ is initialized after class_to_size_:

    // Initialize the mapping arrays
    int next_size = 0;
    for (int c = 1; c < kNumClasses; c++) {
      const int max_size_in_class = class_to_size_[c];
      for (int s = next_size; s <= max_size_in_class; s += kAlignment) {
        class_array_[ClassIndex(s)] = c;
      }
      next_size = max_size_in_class + kAlignment;
    }

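    To sum up: ClassIndex generally buckets sizes at 8-byte granularity, class_to_size_ generally holds sizes aligned to 16 bytes, and class_array_ establishes the correspondence between the two.

    Take a concrete example: when an application calls malloc(25), how much memory does tcmalloc actually hand out?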

          ClassIndex                  class_array_        class_to_size_
    25 --------------> (25+7)/8=4 --------------> 3 --------------> 32

    The result is a 32-byte block.
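    The three-step lookup can be reproduced with a miniature table. In this sketch the class list is cut down to the first five sizes named in this article, only the 8-byte branch of ClassIndex is kept, and AllocatedSize is a hypothetical helper rather than a tcmalloc function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kAlignment = 8;
constexpr int kNumClasses = 6;  // hypothetical: class 0 unused, classes 1..5

// Truncated class_to_size_ table (illustration only).
constexpr std::array<std::size_t, kNumClasses> class_to_size = {
    0, 8, 16, 32, 48, 64};

// Only the "size <= 1024" branch is needed for this tiny table.
constexpr std::size_t ClassIndex(std::size_t s) { return (s + 7) >> 3; }

// Fills class_array with the same loop as the initialization code above.
// ClassIndex(64) is 8, so nine slots are enough.
std::array<unsigned char, 9> BuildClassArray() {
  std::array<unsigned char, 9> class_array{};
  std::size_t next_size = 0;
  for (int c = 1; c < kNumClasses; ++c) {
    const std::size_t max_size_in_class = class_to_size[c];
    for (std::size_t s = next_size; s <= max_size_in_class; s += kAlignment) {
      class_array[ClassIndex(s)] = static_cast<unsigned char>(c);
    }
    next_size = max_size_in_class + kAlignment;
  }
  return class_array;
}

// Hypothetical helper: size -> ClassIndex -> class_array -> class_to_size.
std::size_t AllocatedSize(std::size_t request) {
  assert(request <= 64);  // limit of the truncated table
  static const std::array<unsigned char, 9> class_array = BuildClassArray();
  return class_to_size[class_array[ClassIndex(request)]];
}
```

    AllocatedSize(25) follows the path shown above: index 4, class 3, 32 bytes. A request of 17 bytes also lands in class 3, because no 24-byte class exists in this table.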

    class_to_pages_ and num_objects_to_move_

    SizeMap holds two more maps: class_to_pages_ and num_objects_to_move_.

    class_to_pages_ is used by the central free list. It records how many pages a size class requests from the page heap each time. It is also initialized in SizeMap::Init:

    do {
      psize += kPageSize;
      // Allocate enough pages so leftover is less than 1/8 of total.
      // This bounds wasted space to at most 12.5%.
      while ((psize % size) > (psize >> 3)) {
        psize += kPageSize;
      }
      // Continue to add pages until there are at least as many objects in
      // the span as are needed when moving objects from the central
      // freelists and spans to the thread caches.
    } while ((psize / size) < (blocks_to_move));

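    The resulting page count is determined by two conditions:

    1) The span must hold at least blocks_to_move objects: the outer loop keeps adding pages while psize / size < blocks_to_move. Here blocks_to_move is NumMoveSize(size) / 4, a quarter of the number of objects moved in one batch between the central free list and a thread cache (the quantity that num_objects_to_move_ records).

    2) After the span is carved into objects, the leftover space must not exceed 1/8 of the span (psize % size <= psize / 8), so at most 12.5% of the span is wasted.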
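    The two conditions can be checked with a standalone version of the loop. This is a sketch: PagesForSize is a hypothetical wrapper, kPageShift = 13 (8 KiB pages) is an assumed value, and blocks_to_move is passed in by the caller instead of being derived from NumMoveSize.

```cpp
#include <cstddef>

constexpr std::size_t kPageShift = 13;  // assumed: 8 KiB pages
constexpr std::size_t kPageSize = std::size_t{1} << kPageShift;

// Returns the number of pages per span for objects of `size` bytes,
// using the same two loops as the listing above.
std::size_t PagesForSize(std::size_t size, std::size_t blocks_to_move) {
  std::size_t psize = 0;
  do {
    psize += kPageSize;
    // Condition 2: leftover must be at most 1/8 of the span.
    while ((psize % size) > (psize >> 3)) {
      psize += kPageSize;
    }
    // Condition 1: the span must hold at least blocks_to_move objects.
  } while ((psize / size) < blocks_to_move);
  return psize >> kPageShift;
}
```

    For example, PagesForSize(6144, 2) gives 3: one page leaves 2048 bytes unused, two pages leave 4096, and only three pages (24576 bytes) divide evenly into 6144-byte objects.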

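    Summary

    SizeMap gathers all of tcmalloc's size-related maps in one place, so allocation behavior can be fine-tuned by adjusting SizeMap. The open question is why the requested size is first mapped at 8-byte granularity, then at 16-byte granularity, and finally through two lookup tables. My first idea was to map the source size directly at 16-byte granularity, that is: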

    src size    index        dst size
    0           0            0
    1           1            16
    2           1            16
    n           (n+15)/16    (n+15)/16 * 16

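    This would be simpler and more intuitive to implement, and it would also achieve the goal. tcmalloc may have a deeper reason that I have not discovered.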
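    The flat 16-byte alternative from the table can be written out directly. FlatIndex and FlatSize are hypothetical names for the two columns; this sketch illustrates the proposal, not anything tcmalloc does.

```cpp
#include <cstddef>

// index column: (n + 15) / 16
constexpr std::size_t FlatIndex(std::size_t n) { return (n + 15) / 16; }

// dst size column: (n + 15) / 16 * 16
constexpr std::size_t FlatSize(std::size_t n) { return FlatIndex(n) * 16; }
```

    Under this scheme FlatIndex(256 * 1024) is 16384, so the number of distinct indices grows linearly with the largest small size. Whether that is why tcmalloc uses coarser classes for larger sizes is not something the code shown here states.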

  • Original post: https://www.cnblogs.com/persistentsnail/p/3446495.html