  • kernel memory code learning notes

    mem alloc

    page

    Note:
    1. There are two kinds of pages: virtual pages and physical pages.
    2. The struct page is an abstraction of a physical memory page, not of virtual memory!
    
    
    struct page {
    	unsigned long	flags;		/* page status flags, e.g. is this a dirty page? */
    	atomic_t 	_count;		/* the page's reference count */
    	atomic_t 	_mapcount;
    	...
    	void		*virtual;	/* the physical page's virtual address, if it is mapped */
    };
    
    related functions:
    1. _count should only be read through page_count(); if page_count() returns 0, the page is not in use (see the sketch below).
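    A minimal sketch of that rule (the helper name my_page_is_unused is an
    assumption, not from these notes): read the reference count only through
    page_count() instead of touching _count directly.
    
    #include <linux/mm.h>
    
    static bool my_page_is_unused(struct page *page)
    {
    	return page_count(page) == 0;	/* zero references => nobody uses the page */
    }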
    
    

    zone

    Normally there are zones such as ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, and ZONE_DMA32.
     
    x86_64 has only two zones, ZONE_NORMAL and ZONE_DMA, because 64-bit addressing makes ZONE_HIGHMEM unnecessary.
    
    
    struct zone {
    	unsigned long 	watermark[NR_WMARK];
    	
    	...
    	
    	const char *name;	// zone name ("DMA", "Normal", "HighMem"), set at boot in mm/page_alloc.c
    	
    };
    

    alloc pages

    struct page * alloc_pages(gfp_t gfp_mask, unsigned int order);
    unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
    
    struct page * alloc_page(gfp_t gfp_mask);
    unsigned long __get_free_page(gfp_t gfp_mask);
    
    
    void * page_address(struct page *page);
    
    
    // Get a single page filled with zeros.
    unsigned long get_zeroed_page(gfp_t gfp_mask);
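    A minimal sketch of how these calls relate (the helper grab_two_pages is an
    assumption): alloc_pages() hands back a struct page, and page_address()
    converts it to a logical address, which is roughly what __get_free_pages()
    does in a single step.
    
    #include <linux/gfp.h>
    #include <linux/mm.h>
    
    static void *grab_two_pages(struct page **pagep)
    {
    	struct page *page;
    
    	page = alloc_pages(GFP_KERNEL, 1);	/* order 1 => 2^1 = 2 contiguous pages */
    	if (!page)
    		return NULL;
    
    	*pagep = page;
    	return page_address(page);		/* logical address of the first page */
    }
    
    The pages would later be returned with __free_pages(*pagep, 1).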
    
    

    free pages

    void __free_pages(struct page *page, unsigned int order);
    void free_pages(unsigned long addr, unsigned int order);
    void free_page(unsigned long addr);
    

    Example

    Get 8 pages

    unsigned long page;
    
    page = __get_free_pages(GFP_KERNEL, 3);	/* order 3 => 2^3 = 8 pages */
    if (!page) {
    	return -ENOMEM;
    }
    
    
    free_pages(page, 3);
    
    

    kmalloc()

    If you need whole pages (one, two, or more), __get_free_pages() is usually more suitable.

    kmalloc() is suitable for byte-sized allocations.

    struct dog *p;
    
    p = kmalloc(sizeof(struct dog), GFP_KERNEL);
    if (!p)
    	return -ENOMEM;	/* allocation failed */
    
    

    kfree()

    #include <linux/slab.h>
    
    void kfree(const void *ptr);
    
    
    char *buf;
    
    buf = kmalloc(BUF_SIZE, GFP_ATOMIC);
    if (!buf)
    	return -ENOMEM;	/* allocation failed, nothing to free */
    
    /* ... use buf ... */
    
    kfree(buf);
    

    gfp_mask flags

    1. Action modifiers
    2. Zone modifiers
    __GFP_DMA
    __GFP_DMA32	
    __GFP_HIGHMEM
    In fact these are the only zone modifiers; there is no __GFP_NORMAL flag,
    because allocations are satisfied from ZONE_NORMAL by default.
    
    
    You cannot pass __GFP_HIGHMEM to __get_free_pages() or kmalloc(), because both
    return a logical address rather than a struct page, and high memory may not be
    permanently mapped into the kernel's address space.
    
    Only alloc_pages() can allocate high memory. In practice, most allocations do not
    need a zone modifier at all; ZONE_NORMAL is enough (see the sketch below).
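    A minimal sketch of that rule (the function zero_one_high_page is an
    assumption): high memory is requested with alloc_pages() plus __GFP_HIGHMEM
    and mapped temporarily with kmap() before the kernel can touch it.
    
    #include <linux/gfp.h>
    #include <linux/highmem.h>
    #include <linux/mm.h>
    #include <linux/string.h>
    #include <linux/errno.h>
    
    static int zero_one_high_page(void)
    {
    	struct page *page;
    	void *vaddr;
    
    	page = alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, 0);	/* may come from ZONE_HIGHMEM */
    	if (!page)
    		return -ENOMEM;
    
    	vaddr = kmap(page);		/* map the page into the kernel's address space */
    	memset(vaddr, 0, PAGE_SIZE);
    	kunmap(page);			/* drop the temporary mapping */
    
    	__free_pages(page, 0);
    	return 0;
    }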
    

    vmalloc()

    void * vmalloc(unsigned long size);
    void vfree(const void *addr);
    

    vmalloc(): the returned region is contiguous in virtual address space, but the
    backing physical pages do not need to be contiguous.

    kmalloc(): guarantees that the allocated physical memory is contiguous, so the
    virtual addresses are naturally contiguous as well.

    Hardware often requires physically contiguous memory, because devices sit outside
    the kernel's memory management and know nothing about virtual addresses.

    Prefer kmalloc() over vmalloc():

    Although kmalloc() must find physically contiguous memory, it has advantages such
    as lower overhead (no extra page-table setup), so kmalloc() is preferred in most
    cases; vmalloc() is reserved for large allocations that only need to be virtually
    contiguous.
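    A minimal sketch of the vmalloc()/vfree() pairing (the buffer name and the
    1 MiB size are assumptions): a large, virtually contiguous buffer backed by
    physically scattered pages.
    
    #include <linux/vmalloc.h>
    #include <linux/errno.h>
    
    static char *big_buf;
    
    static int big_buf_init(void)
    {
    	big_buf = vmalloc(1 << 20);	/* 1 MiB, no physical contiguity required */
    	if (!big_buf)
    		return -ENOMEM;
    	return 0;
    }
    
    static void big_buf_exit(void)
    {
    	vfree(big_buf);			/* vfree(NULL) is a no-op, so this is always safe */
    }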

    slab

    There are two main structures in the slab subsystem:

    struct kmem_cache;
    struct slabinfo;
    

    Each slab is in one of three states:

    1. full
    2. partial
    3. empty
    A kmem_cache corresponds to one type of frequently used structure, such as
    "struct inode". The kernel allocates and frees many small structs very
    frequently, so Sun designed the SLAB concept to solve this problem. It is
    essentially a cache: memory is allocated ahead of time and reused like a pool.
    
    The relationship between "struct kmem_cache" and "struct slabinfo" is easy to
    misunderstand. We can introduce the term "a high cache", which corresponds to
    one "struct kmem_cache".
    
    
    "struct kmem_cache" corresponds to ONE type of "struct".
    
    A slab is a subset of the kmem_cache; each slab is a chunk of memory
    (one or more contiguous pages) carved up into objects.
    
    
    struct kmem_cache {
        unsigned int object_size;/* The original size of the object */
        unsigned int size;  /* The aligned/padded/added on size  */
        unsigned int align; /* Alignment as calculated */
        slab_flags_t flags; /* Active flags on the slab */
        const char *name;   /* Slab name for sysfs */
        int refcount;       /* Use counter */
        void (*ctor)(void *);   /* Called on object slot creation */
        struct list_head list;  /* List of all slab caches on the system */
    };
    
    

    How to create A High Cache?

    struct kmem_cache * kmem_cache_create(const char *name,
    											 size_t size,
    											 size_t align,
    											 unsigned long flags,
    											 void (*ctor)(void *));
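    A minimal sketch of a kmem_cache_create() call (struct dog and the names
    dog_cachep/dog_cache_init are assumptions, echoing the kmalloc() example
    above): one high cache for one object type.
    
    #include <linux/slab.h>
    #include <linux/errno.h>
    
    struct dog {
    	int tail_len;
    	int weight;
    };
    
    static struct kmem_cache *dog_cachep;
    
    static int dog_cache_init(void)
    {
    	dog_cachep = kmem_cache_create("dog_cache",
    				       sizeof(struct dog),
    				       0,			/* default alignment */
    				       SLAB_HWCACHE_ALIGN,	/* align objects to cache lines */
    				       NULL);			/* no constructor */
    	if (!dog_cachep)
    		return -ENOMEM;
    	return 0;
    }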
    

    How to destroy A High Cache?

    int kmem_cache_destroy(struct kmem_cache *cachep);
    
    * If you want to destroy this high cache, you must first make sure all of its
      slabs are empty (no objects still in use).
    * A return value of 0 means the destroy succeeded.
    * Usually called when a module is unloaded.
    

    How to allocate an object from A High Cache?

    A high cache contains one or more slabs. To allocate an object, the cache needs
    a slab that is not full (a partial or empty slab); if none exists, a new slab
    is allocated for the cache.

    void * kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags);
    
    To free an object and return it to its slab:
    
    void kmem_cache_free(struct kmem_cache *cachep, void *objp);
    
    This marks the object objp as unused in the cache.
    

    A SLAB EXAMPLE

    Let's analyse an example: struct task_struct.

    Well, this is a very famous struct, right?

    struct kmem_cache *task_struct_cachep;	/* kmem_cache naming convention:
    										   task_struct_cachep is a pointer
    										   to the kmem_cache that holds
    										   struct task_struct objects */
    
    task_struct_cachep = kmem_cache_create("task_struct", 
    											sizeof(struct task_struct),
    											ARCH_MIN_TASKALIGN,
    											SLAB_PANIC | SLAB_NOTRACK,
    											NULL);
    												
    As shown above, the return value of kmem_cache_create() is a pointer to the
    new kmem_cache struct.
    
    So we have created a high cache, referenced by task_struct_cachep, whose
    objects are of type struct task_struct.
    
    When fork() executes, a new struct task_struct must be created; the main
    work is done in do_fork():
    
    struct task_struct *tsk;
    
    tsk = kmem_cache_alloc(task_struct_cachep, GFP_KERNEL);
    if (!tsk) {
    	return NULL;
    }
    
    ...
    ...
    
    kmem_cache_free(task_struct_cachep, tsk);	/* return the object tsk to task_struct_cachep */
    
    
    int err;
    
    err = kmem_cache_destroy(task_struct_cachep);
    if (err) {
    	...
    }
    
    
    

    How the kernel abstracts memory

    A global struct page array: mem_map[]

    If you have 76 GB of memory, the page count is 76 * 1024 * 1024 KB / 4 KB = 19922944 pages, so the mem_map[] array has 19922944 entries.

    NODE

    1. In a NUMA system, each node is abstracted as struct pglist_data, usually referred to by its typedef name pg_data_t (see the structure below).
    2. The nodes are linked on the pgdat_list list via pg_data_t->node_next.
    typedef struct pglist_data {
        struct zone node_zones[MAX_NR_ZONES];
        struct zonelist node_zonelists[MAX_ZONELISTS];
        int nr_zones;
    #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
        struct page *node_mem_map;
    #ifdef CONFIG_PAGE_EXTENSION
        struct page_ext *node_page_ext;
    #endif
    #endif
    #ifndef CONFIG_NO_BOOTMEM
        struct bootmem_data *bdata;
    #endif
    #ifdef CONFIG_MEMORY_HOTPLUG
        /*
         * Must be held any time you expect node_start_pfn, node_present_pages
         * or node_spanned_pages stay constant.  Holding this will also
         * guarantee that any pfn_valid() stays that way.
         *
         * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
         * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG.
         *
         * Nests above zone->lock and zone->span_seqlock
         */
        spinlock_t node_size_lock;
    #endif
        unsigned long node_start_pfn;
        unsigned long node_present_pages; /* total number of physical pages */
        unsigned long node_spanned_pages; /* total size of physical page
                             range, including holes */
        int node_id;
        wait_queue_head_t kswapd_wait;
        wait_queue_head_t pfmemalloc_wait;
        struct task_struct *kswapd; /* Protected by
                           mem_hotplug_begin/end() */
        int kswapd_order;
        enum zone_type kswapd_classzone_idx;
        
        int kswapd_failures;        /* Number of 'reclaimed == 0' runs */
    
    #ifdef CONFIG_COMPACTION
        int kcompactd_max_order;
        enum zone_type kcompactd_classzone_idx;
        wait_queue_head_t kcompactd_wait;
        struct task_struct *kcompactd;
    #endif
    #ifdef CONFIG_NUMA_BALANCING
        /* Lock serializing the migrate rate limiting window */
        spinlock_t numabalancing_migrate_lock;
    
        /* Rate limiting time interval */
        unsigned long numabalancing_migrate_next_window;
    
        /* Number of pages migrated during the rate limiting time interval */
        unsigned long numabalancing_migrate_nr_pages;
    #endif
        /*
         * This is a per-node reserve of pages that are not available
         * to userspace allocations.
         */
        unsigned long       totalreserve_pages;
    
    #ifdef CONFIG_NUMA
        /*
         * zone reclaim becomes active if more unmapped pages exist.
         */
        unsigned long       min_unmapped_pages;
        unsigned long       min_slab_pages;
    #endif /* CONFIG_NUMA */
    
        /* Write-intensive fields used by page reclaim */
        ZONE_PADDING(_pad1_)
        spinlock_t      lru_lock;
    } pg_data_t;
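    A minimal sketch (the helper dump_online_nodes is an assumption): walk every
    online node with for_each_online_node()/NODE_DATA() and print the pg_data_t
    fields shown above.
    
    #include <linux/mmzone.h>
    #include <linux/nodemask.h>
    #include <linux/printk.h>
    
    static void dump_online_nodes(void)
    {
    	int nid;
    
    	for_each_online_node(nid) {
    		pg_data_t *pgdat = NODE_DATA(nid);
    
    		pr_info("node %d: start_pfn=%lu present=%lu spanned=%lu\n",
    			pgdat->node_id, pgdat->node_start_pfn,
    			pgdat->node_present_pages, pgdat->node_spanned_pages);
    	}
    }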
    
    

    mem reclaim

    mem writeback

    1. Memory caching (the page cache)
    2. Memory management

    Within the memory-caching mechanism, the important structures are:

    struct page {
    	unsigned long flags;
    	union {
    		struct address_space *mapping;
    	};
    	union {
    		pgoff_t index;
    	};
    	...
    };
    
    If the owner of a page in the page cache is a file, the address_space object is
    embedded in the i_data field of the VFS inode object.
    
    The i_mapping field always points to the address_space object of the owner of the
    pages that hold the inode's data, and the host field of the address_space object
    points back to its owner's inode.
    
    struct address_space {
    	struct inode *host;
    	struct radix_tree_root page_tree;
    	const struct address_space_operations *a_ops;
    	...
    };
    
    struct address_space_operations {
    	...
    };
    
    struct inode {
    	struct address_space *i_mapping;
    	struct address_space i_data;	/* embedded, not a pointer */
    	...
    };
    

    http://oenhan.com/linux-cache-writeback
