  • Linux Memory Management: the Slab Allocator

    What is the slab allocator?
    Reference: http://blog.csdn.net/vanbreaker/article/details/7664296
    The slab allocator is an important and fairly complex part of Linux memory management. It targets objects that are frequently allocated and freed, such as process descriptors. These objects tend to be small: allocating and freeing them straight from the buddy system would not only create a lot of internal fragmentation but would also be too slow. The slab allocator manages memory by object, grouping objects of the same type into a class (process descriptors, for instance, form one class). Whenever such an object is requested, the slab allocator hands out a unit of that size from a slab list; when the object is released, it is put back on that list instead of being returned to the buddy system. On allocation the slab allocator reuses the most recently freed object's memory, which is therefore likely to still be resident in the CPU cache.

    What problem does the slab allocator mainly solve?
    Reference: *Linux Kernel Development, 3rd Edition*
    Allocating and freeing data structures is one of the most common operations inside any kernel. To facilitate frequent allocations and deallocations of data, programmers often introduce free lists.
    A free list contains a block of available, already allocated, data structures. When code requires a new instance of a data structure, it can grab one of the structures off the free list rather than allocate the sufficient amount of memory and set it up for the data structure. Later, when the data structure is no longer needed, it is returned to the free list instead of deallocated. In this sense, the free list acts as an object cache, caching a frequently used type of object.

    One of the main problems with free lists in the kernel is that there exists no global control. When available memory is low, there is no way for the kernel to communicate to every free list that it should shrink the size of its cache to free up memory. The kernel has no understanding of the random free lists at all.
    To remedy this, and to consolidate code, the Linux kernel provides the slab layer (also called the slab allocator). The slab layer acts as a generic data-structure caching layer.
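
    To make the free-list idea concrete, here is a minimal userspace sketch (our own illustration, not kernel code) that caches a hypothetical struct foo on a singly linked free list:

    #include <stdlib.h>

    struct foo {
    	int data[32];
    	struct foo *next;	/* links an idle object into the free list */
    };

    static struct foo *free_list;	/* head of the free list */

    struct foo *foo_alloc(void)
    {
    	struct foo *f = free_list;

    	if (f) {			/* reuse a cached object... */
    		free_list = f->next;
    		return f;
    	}
    	return malloc(sizeof(*f));	/* ...or fall back to the allocator */
    }

    void foo_free(struct foo *f)
    {
    	f->next = free_list;		/* cache the object instead of freeing it */
    	free_list = f;
    }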

    In short: object caches are needed, and those caches need to be under global control; the slab layer also addresses memory fragmentation.

    As distinct from the buddy system: Linux uses the buddy system to tackle external fragmentation and the slab allocator to tackle internal fragmentation.

    How is the slab layer designed?
    A simple way to picture it: cache > slab > object.
    One cache corresponds to one type of object. Each cache may consist of several slabs, and a slab consists of one or more physically contiguous pages, usually a single page. Objects can be task_structs, inodes, and so on; the cache itself likewise needs a structure to describe it.
    (Figure: the relationship among caches, slabs, and objects.)

    What are the key data structures in the slab layer, and how do they relate to each other?
    Reference: linux-2.6.26/mm/slab.c. Only the parts of the source we care about are shown here; note the comments in the source.

    /*
     * struct slab
     *
     * Manages the objs in a slab. Placed either at the beginning of mem allocated
     * for a slab, or allocated from an general cache.
     * Slabs are chained into three list: fully used, partial, fully free slabs.
     */
    struct slab {
    	struct list_head list;		/* links the slab into one of kmem_list3's lists */
    	unsigned long colouroff;	/* offset of the first object from the slab start */
    	void *s_mem;		/* including colour offset */
    	unsigned int inuse;	/* num of objs active in slab */
    	kmem_bufctl_t free;	/* index of the first free object */
    	unsigned short nodeid;	/* NUMA node the slab's pages belong to */
    };
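
    The colouroff and s_mem fields imply the on-slab layout used for small-object caches; a rough (hedged) picture of one slab's pages:

    /*
     * On-slab layout for small-object caches (large-object caches keep
     * the descriptor off-slab instead):
     *
     * +--------+-------------+-----------------+-------+-------+-----+
     * | colour | struct slab | kmem_bufctl_t[] | obj 0 | obj 1 | ... |
     * +--------+-------------+-----------------+-------+-------+-----+
     * ^ slab start                             ^ s_mem = start + colouroff
     *
     * The "colour" padding staggers objects of different slabs across
     * CPU cache lines.
     */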
    
    //------------------------------------------------------
    /*
     * struct array_cache
     *
     * Purpose:
     * - LIFO ordering, to hand out cache-warm objects from _alloc
     * - reduce the number of linked list operations
     * - reduce spinlock operations
     *
     * The limit is stored in the per-cpu structure to reduce the data cache
     * footprint.
     *
     */
    struct array_cache {
    	unsigned int avail;		/* objects currently available in this array */
    	unsigned int limit;		/* maximum number of objects the array may hold */
    	unsigned int batchcount;	/* objects moved per refill or drain */
    	unsigned int touched;		/* set on alloc, checked by the cache reaper */
    };
    
    //------------------------------------------------------
    /*
     * The slab lists for all objects.
     */
    struct kmem_list3 {
    	struct list_head slabs_partial;	/* partial list first, better asm code */
    	struct list_head slabs_full;
    	struct list_head slabs_free;
    	unsigned long free_objects;
    	unsigned int free_limit;
    	unsigned int colour_next;	/* Per-node cache coloring */
    	spinlock_t list_lock;
    	struct array_cache *shared;	/* shared per node */
    	struct array_cache **alien;	/* on other nodes */
    	unsigned long next_reap;	/* updated without locking */
    	int free_touched;		/* updated without locking */
    };
    
    //------------------------------------------------------
    /*
     * struct kmem_cache
     *
     * manages a cache.
     */
    
    struct kmem_cache {
    /* 1) per-cpu data, touched during every alloc/free */
    	struct array_cache *array[NR_CPUS];
    /* 2) Cache tunables. Protected by cache_chain_mutex */
    	unsigned int batchcount;
    	unsigned int limit;
    	unsigned int shared;
    
    	unsigned int buffer_size;
    	u32 reciprocal_buffer_size;
    	
    	/* ... (fields omitted) ... */
    	
    	/*
    	 * We put nodelists[] at the end of kmem_cache, because we want to size
    	 * this array to nr_node_ids slots instead of MAX_NUMNODES
    	 * (see kmem_cache_init())
    	 * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
    	 * is statically defined, so we reserve the max number of nodes.
    	 */
    
    	/* ... (remaining fields omitted) ... */
    };
    

    Each cache is described by a kmem_cache.
    The array_cache structures in kmem_cache show that on an SMP system every CPU keeps its own per-CPU stash of objects for that cache: CPU1-CPU4, for example, each hold their own array of inode objects, while kmem_cache->nodelists[x]->shared provides an array that is shared per node.
    The kmem_list3 structure in kmem_cache contains three lists: slabs_full, slabs_partial, and slabs_free.
    These lists hold all the slabs of the cache.
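
    As a rough illustration of how these pieces fit together, the sketch below (our own, written against the 2.6.26 structures above; nodelists[] is one of the kmem_cache fields elided from the excerpt) counts the slabs on node 0's three lists. Real code would hold l3->list_lock while walking them:

    static void count_slabs_node0(struct kmem_cache *cachep)
    {
    	struct kmem_list3 *l3 = cachep->nodelists[0];
    	struct slab *slabp;
    	int nr_full = 0, nr_partial = 0, nr_free = 0;

    	list_for_each_entry(slabp, &l3->slabs_full, list)
    		nr_full++;
    	list_for_each_entry(slabp, &l3->slabs_partial, list)
    		nr_partial++;
    	list_for_each_entry(slabp, &l3->slabs_free, list)
    		nr_free++;

    	printk(KERN_INFO "%d full, %d partial, %d free slabs\n",
    	       nr_full, nr_partial, nr_free);
    }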

    How does the slab layer work?
    1. Initialization
    2. Creating a cache
    Reference: http://guojing.me/linux-kernel-architecture/posts/create-slab/

    3. Allocating an object
    Reference: http://blog.csdn.net/vanbreaker/article/details/7671211
    1) Look in the local per-CPU cache for a free object of the requested type (e.g. task_struct).
    2) If the local per-CPU cache has no object, take free objects from the slab lists in kmem_list3, refill the local cache, and then allocate.
    3) If no slab has a free object left, create a new slab and then allocate. (A simplified sketch of this allocation path follows.)
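
    The fast path of this sequence can be sketched as follows (a simplified, hedged rendering loosely modeled on ____cache_alloc() in 2.6.26 mm/slab.c, ignoring NUMA placement and statistics; in the real struct array_cache a flexible array member entry[], plus a lock, follows the fields shown above, and the cast below is a stand-in for ac->entry):

    static void *sketch_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
    {
    	void *objp;
    	struct array_cache *ac = cachep->array[smp_processor_id()];

    	if (likely(ac->avail)) {
    		/* Step 1: LIFO pop of the most recently freed, and thus
    		 * cache-warm, object from the per-CPU array. */
    		ac->touched = 1;
    		objp = ((void **)(ac + 1))[--ac->avail];
    	} else {
    		/* Steps 2 and 3: refill the per-CPU array from the node's
    		 * partial/free slab lists; if every slab is full, grow the
    		 * cache with a fresh slab from the buddy system. */
    		objp = cache_alloc_refill(cachep, flags);
    	}
    	return objp;
    }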

    4. The allocator interface
    Creating a cache:
    struct kmem_cache *kmem_cache_create(const char *name, size_t size, size_t align, unsigned long flags, void (*ctor)(struct kmem_cache *, void *))

    Destroying a cache:
    void kmem_cache_destroy(struct kmem_cache *cachep)

    Allocating an object:
    void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
    Freeing an object:
    void kmem_cache_free(struct kmem_cache *cachep, void *objp)
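
    Putting the interface together, a hypothetical user of the slab layer might look like the sketch below (invented names, minimal error handling):

    struct foo {
    	int a, b;
    };

    static struct kmem_cache *foo_cachep;

    static int foo_setup(void)
    {
    	/* One cache per object type; SLAB_HWCACHE_ALIGN aligns objects
    	 * to cache lines, and no constructor is needed here. */
    	foo_cachep = kmem_cache_create("foo", sizeof(struct foo),
    			0, SLAB_HWCACHE_ALIGN, NULL);
    	if (!foo_cachep)
    		return -ENOMEM;
    	return 0;
    }

    static void foo_use(void)
    {
    	struct foo *f = kmem_cache_alloc(foo_cachep, GFP_KERNEL);

    	if (!f)
    		return;
    	f->a = 1;
    	f->b = 2;
    	kmem_cache_free(foo_cachep, f);	/* back to the cache, not the buddy system */
    }

    static void foo_teardown(void)
    {
    	kmem_cache_destroy(foo_cachep);	/* all objects must already be freed */
    }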

    How is a task_struct object allocated?
    Reference: linux-2.6.26/kernel/fork.c

    The kernel keeps a global variable holding a pointer to the task_struct cache; the source is as follows:

    #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR
    # define alloc_task_struct()	kmem_cache_alloc(task_struct_cachep, GFP_KERNEL)
    # define free_task_struct(tsk)	kmem_cache_free(task_struct_cachep, (tsk))
    static struct kmem_cache *task_struct_cachep;
    #endif
    

    During kernel initialization, fork_init() creates this cache; the source is as follows:

    #ifndef __HAVE_ARCH_TASK_STRUCT_ALLOCATOR
    #ifndef ARCH_MIN_TASKALIGN
    #define ARCH_MIN_TASKALIGN	L1_CACHE_BYTES
    #endif
    	/* create a slab on which task_structs can be allocated */
    	task_struct_cachep =
    		kmem_cache_create("task_struct", sizeof(struct task_struct),
    			ARCH_MIN_TASKALIGN, SLAB_PANIC, NULL);
    #endif
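
    Note the SLAB_PANIC flag: the kernel cannot operate without process descriptors, so if this cache cannot be created at boot there is no point continuing, and the kernel panics rather than returning an error.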
    

    Every call to fork() must create a new process descriptor. The path is do_fork() → copy_process() → dup_task_struct() → alloc_task_struct(): each function calls the next, and alloc_task_struct() finally expands, via the macro given above, to kmem_cache_alloc(task_struct_cachep, GFP_KERNEL), completing the allocation of the process descriptor.
    After a task terminates, if it has no children waiting on it, its process descriptor is freed back to the cache; the free_task_struct() macro (also used on the error paths inside dup_task_struct()) expands to kmem_cache_free(task_struct_cachep, tsk).
    The relevant source follows; again, only the parts we care about are shown.

    /*
     *  Ok, this is the main fork-routine.
     *
     * It copies the process, and if successful kick-starts
     * it and waits for it to finish using the VM if required.
     */
    long do_fork(unsigned long clone_flags,
    	      unsigned long stack_start,
    	      struct pt_regs *regs,
    	      unsigned long stack_size,
    	      int __user *parent_tidptr,
    	      int __user *child_tidptr)
    {
    	struct task_struct *p;
    	int trace = 0;
    	long nr;
    
    	/*
    	 * We hope to recycle these flags after 2.6.26
    	 */
    	if (unlikely(clone_flags & CLONE_STOPPED)) {
    		static int __read_mostly count = 100;
    
    		if (count > 0 && printk_ratelimit()) {
    			char comm[TASK_COMM_LEN];
    
    			count--;
    			printk(KERN_INFO "fork(): process `%s' used deprecated "
    					"clone flags 0x%lx\n",
    				get_task_comm(comm, current),
    				clone_flags & CLONE_STOPPED);
    		}
    	}
    
    	if (unlikely(current->ptrace)) {
    		trace = fork_traceflag (clone_flags);
    		if (trace)
    			clone_flags |= CLONE_PTRACE;
    	}
    
    	p = copy_process(clone_flags, stack_start, regs, stack_size,
    			child_tidptr, NULL);
    	/*
    	 * Do this prior waking up the new thread - the thread pointer
    	 * might get invalid after that point, if the thread exits quickly.
    	 */
    	if (!IS_ERR(p)) {
    		struct completion vfork;
    
    		nr = task_pid_vnr(p);
    
    		if (clone_flags & CLONE_PARENT_SETTID)
    			put_user(nr, parent_tidptr);
    
    		if (clone_flags & CLONE_VFORK) {
    			p->vfork_done = &vfork;
    			init_completion(&vfork);
    		}
    
    		if ((p->ptrace & PT_PTRACED) || (clone_flags & CLONE_STOPPED)) {
    			/*
    			 * We'll start up with an immediate SIGSTOP.
    			 */
    			sigaddset(&p->pending.signal, SIGSTOP);
    			set_tsk_thread_flag(p, TIF_SIGPENDING);
    		}
    
    		if (!(clone_flags & CLONE_STOPPED))
    			wake_up_new_task(p, clone_flags);
    		else
    			__set_task_state(p, TASK_STOPPED);
    
    		if (unlikely (trace)) {
    			current->ptrace_message = nr;
    			ptrace_notify ((trace << 8) | SIGTRAP);
    		}
    
    		if (clone_flags & CLONE_VFORK) {
    			freezer_do_not_count();
    			wait_for_completion(&vfork);
    			freezer_count();
    			if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE)) {
    				current->ptrace_message = nr;
    				ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
    			}
    		}
    	} else {
    		nr = PTR_ERR(p);
    	}
    	return nr;
    }
    
    //------------------------------------------------------------
    /*
     * This creates a new process as a copy of the old one,
     * but does not actually start it yet.
     *
     * It copies the registers, and all the appropriate
     * parts of the process environment (as per the clone
     * flags). The actual kick-off is left to the caller.
     */
    static struct task_struct *copy_process(unsigned long clone_flags,
    					unsigned long stack_start,
    					struct pt_regs *regs,
    					unsigned long stack_size,
    					int __user *child_tidptr,
    					struct pid *pid)
    {
    	int retval;
    	struct task_struct *p;
    	int cgroup_callbacks_done = 0;
    
    	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
    		return ERR_PTR(-EINVAL);
    
    	/*
    	 * Thread groups must share signals as well, and detached threads
    	 * can only be started up within the thread group.
    	 */
    	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
    		return ERR_PTR(-EINVAL);
    
    	/*
    	 * Shared signal handlers imply shared VM. By way of the above,
    	 * thread groups also imply shared VM. Blocking this case allows
    	 * for various simplifications in other code.
    	 */
    	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
    		return ERR_PTR(-EINVAL);
    
    	retval = security_task_create(clone_flags);
    	if (retval)
    		goto fork_out;
    
    	retval = -ENOMEM;
    	p = dup_task_struct(current);
    	if (!p)
    		goto fork_out;
    
    	rt_mutex_init_task(p);
    	
    	/* ... (large section omitted) ... */
    fork_out:
    	return ERR_PTR(retval);
    }
    
    //------------------------------------------------------------
    static struct task_struct *dup_task_struct(struct task_struct *orig)
    {
    	struct task_struct *tsk;
    	struct thread_info *ti;
    	int err;
    
    	prepare_to_copy(orig);
    
    	tsk = alloc_task_struct();
    	if (!tsk)
    		return NULL;
    
    	ti = alloc_thread_info(tsk);
    	if (!ti) {
    		free_task_struct(tsk);
    		return NULL;
    	}
    
     	err = arch_dup_task_struct(tsk, orig);
    	if (err)
    		goto out;
    
    	tsk->stack = ti;
    
    	err = prop_local_init_single(&tsk->dirties);
    	if (err)
    		goto out;
    
    	setup_thread_stack(tsk, orig);
    
    #ifdef CONFIG_CC_STACKPROTECTOR
    	tsk->stack_canary = get_random_int();
    #endif
    
    	/* One for us, one for whoever does the "release_task()" (usually parent) */
    	atomic_set(&tsk->usage,2);
    	atomic_set(&tsk->fs_excl, 0);
    #ifdef CONFIG_BLK_DEV_IO_TRACE
    	tsk->btrace_seq = 0;
    #endif
    	tsk->splice_pipe = NULL;
    	return tsk;
    
    out:
    	free_thread_info(ti);
    	free_task_struct(tsk);
    	return NULL;
    }
    