  • Virtual Memory in Operating Systems, with Implementation Code

    Virtual memory is a technique used by virtually all modern operating systems.

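    The basic idea of virtual memory is that each process has its own independent logical address space, divided into equal-sized blocks called pages. Each page is a contiguous range of addresses. From the process's point of view it appears to have a large amount of memory. Some of its pages correspond to blocks of physical memory (called page frames; a page and a page frame are usually the same size), while pages that are not loaded into memory reside on disk. Introducing logical addresses separates the process's address space from the actual physical storage, which makes memory management more flexible.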

    The two basic concepts, address space and storage space, are defined as follows:

     

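    Address space: the object program produced by compiling a source program resides within the range of addresses it is confined to, and this range is called the address space. An address space is a set of logical addresses.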

     

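    Storage space: the set of physical units in main memory that store information. These units are numbered, and the numbers are called physical addresses. The storage space is the set of physical addresses.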

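    These concepts give rise to three memory-management schemes:
    paged, segmented, and segmented-paged memory management. This article focuses on paging.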

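    In a paging system, the operating system allocates page frames for all of a process's pages when the process is created, and reclaims all of those frames when the process terminates. If a process is allowed to request memory dynamically while it runs, the operating system must also allocate physical frames for that memory. To do all this, the operating system has to track which frames of physical memory are in use. On a context switch it must also correctly switch from one process's mapping of its address space onto physical memory to the other process's mapping. To understand how the operating system meets these requirements, we first need to understand page tables. Let's first look at a figure (reproduced from 51CTO):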

    The entries in a page table are called page table entries (PTEs). Each PTE records the mapping from one virtual page to a physical page frame.

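    Because the page table is stored in memory, every memory access a program makes actually costs at least two memory accesses: one to read the page table and one for the data itself. Without an MMU, only one access is needed. (MMU stands for Memory Management Unit, a hardware logic unit integrated into the CPU. Its main job is to translate virtual addresses into physical addresses for the CPU, which gives software a hardware-level memory protection mechanism.) Efficiency therefore drops considerably, and the slowdown is even more noticeable if the memory itself is slow. Keeping the benefits of the MMU while minimizing its overhead thus became a pressing problem.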

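    This is where the TLB comes in. TLB stands for Translation Lookaside Buffer. It is essentially a set of relocation registers inside the MMU that temporarily hold translation data. Since the TLB is a set of registers, accessing it is much faster than accessing the page table in memory. If the entire page table could be kept in the TLB, the efficiency problem would be solved.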

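    However, manufacturing cost and other constraints make it practically impossible to hold the entire page table in the TLB. The best we can do is keep a subset of the most frequently used page table entries in the limited-capacity TLB, which improves the efficiency of the MMU to a certain extent.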

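    This approach works because of the principle of locality of memory reference: during execution, a program is more likely to access code and data close to the location it is currently accessing. In principle, then, the TLB holds most of the page table entries needed during the current period of time, which greatly improves the MMU's efficiency.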

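    The technique used here is a two-level page table. With a two-level page table, a lookup takes two steps. First, the virtual address indexes an entry in the first table. That entry is then used to look up the second table, and only after that is the physical address known. The first table is called the first-level page table and the second is called the second-level page table. The main purpose of the two-level scheme is to reduce the memory occupied by the page table itself. The drawback is an extra lookup per translation, which further lowers address-translation efficiency. (OS161 targets MIPS, whose TLB is refilled by software. In the code below, this two-level walk is therefore performed by the kernel's vm_fault handler rather than by MMU hardware.)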

    Enough background. Now for the substance: we implement VM in OS161, a teaching operating system developed at Harvard University. OS161 targets MIPS-I hardware.

    The code is on GitHub: https://github.com/tian-jiang/OS161-VirtualMemory

    First, look at kern/arch/mips/include/vm.h, which defines the MIPS memory layout:

    /*
     * MIPS-I hardwired memory layout:
     *    0xc0000000 - 0xffffffff   kseg2 (kernel, tlb-mapped)
     *    0xa0000000 - 0xbfffffff   kseg1 (kernel, unmapped, uncached)
     *    0x80000000 - 0x9fffffff   kseg0 (kernel, unmapped, cached)
     *    0x00000000 - 0x7fffffff   kuseg (user, tlb-mapped)
     *
     * (mips32 is a little different)
     */
    
    #define MIPS_KUSEG  0x00000000
    #define MIPS_KSEG0  0x80000000
    #define MIPS_KSEG1  0xa0000000
    #define MIPS_KSEG2  0xc0000000
    
    

    The figure below shows how memory is laid out and allocated in OS161.

    Let's start from the beginning: kern/startup/main.c

    /* Early initialization. */
    ram_bootstrap();
    /* ... */

    /* Late phase of initialization. */
    vm_bootstrap();
    /* ... */

    During boot, the kernel calls ram_bootstrap() and vm_bootstrap() to bring up the VM subsystem. Where are these two functions declared and defined? Let's look at the following code.

    kern/include/vm.h and kern/arch/mips/include/vm.h

    /* Initialization function */
    void vm_bootstrap(void);
    /* ... */

    /* Allocate/free kernel heap pages (called by kmalloc/kfree) */
    void frametable_bootstrap(void);

    /*
     * Interface to the low-level module that looks after the amount of
     * physical memory we have.
     *
     * ram_getsize returns the lowest valid physical address, and one past
     * the highest valid physical address. (Both are page-aligned.) This
     * is the memory that is available for use during operation, and
     * excludes the memory the kernel is loaded into and memory that is
     * grabbed in the very early stages of bootup.
     *
     * ram_stealmem can be used before ram_getsize is called to allocate
     * memory that cannot be freed later. This is intended for use early
     * in bootup before VM initialization is complete.
     */
    
    void ram_bootstrap(void);
    paddr_t ram_stealmem(unsigned long npages);
    void ram_getsize(paddr_t *lo, paddr_t *hi);

    ram_bootstrap() and vm_bootstrap() are declared here. So what do they actually do?

    kern/arch/mips/vm/ram.c, kern/arch/mips/vm/vm.c, kern/vm/frametable.c

    vaddr_t firstfree;   /* first free virtual address; set by start.S */
    
    static paddr_t firstpaddr;  /* address of first free physical page */
    static paddr_t lastpaddr;   /* one past end of last free physical page */
    
    /*
     * Called very early in system boot to figure out how much physical
     * RAM is available.
     */
    void
    ram_bootstrap(void)
    {
        size_t ramsize;
        
        /* Get size of RAM. */
        ramsize = mainbus_ramsize();
    
        /*
         * This is the same as the last physical address, as long as
         * we have less than 508 megabytes of memory. If we had more,
         * various annoying properties of the MIPS architecture would
         * force the RAM to be discontiguous. This is not a case we 
         * are going to worry about.
         */
        if (ramsize > 508*1024*1024) {
            ramsize = 508*1024*1024;
        }
    
        lastpaddr = ramsize;
    
        /* 
         * Get first free virtual address from where start.S saved it.
         * Convert to physical address.
         */
        firstpaddr = firstfree - MIPS_KSEG0;
    
        kprintf("%uk physical memory available\n", 
            (lastpaddr-firstpaddr)/1024);
    }
    /*
     * Initialise the frame table
     */
    void
    vm_bootstrap(void)
    {
        frametable_bootstrap();
    }
    /*
     * Make these variables static so that other files cannot access them
     */
    static struct frame_table_entry *frame_table;
    static paddr_t frametop, freeframe;
    
    /*
     * initialise frame table
     */
    void
    frametable_bootstrap(void)
    {
        struct frame_table_entry *p;
        paddr_t firsta, lasta, paddr;
        unsigned long framenum, entry_num, frame_table_size, i;
        
        // get the usable range of physical memory
        ram_getsize(&firsta, &lasta);
        KASSERT((firsta & PAGE_FRAME) == firsta);
        KASSERT((lasta & PAGE_FRAME) == lasta);
        
        framenum = (lasta - firsta) / PAGE_SIZE;
        
        // calculate the size of the whole framemap
        frame_table_size = framenum * sizeof(struct frame_table_entry);
        frame_table_size = ROUNDUP(frame_table_size, PAGE_SIZE);
        entry_num = frame_table_size / PAGE_SIZE;
        KASSERT((frame_table_size & PAGE_FRAME) == frame_table_size);
        
        frametop = firsta;
        freeframe = firsta + frame_table_size;
        
        if (freeframe >= lasta) {
            // This should practically never happen
            panic("vm: framemap consume physical memory?\n");
        }
        
        // Keep the frame table at the start of the usable physical memory range;
        // free frames start right after the end of the frame table.
        frame_table = (struct frame_table_entry *) PADDR_TO_KVADDR(firsta);
        
        // Initialise the free list: each entry corresponds to one frame
        // and stores the physical address of the next free frame.
        // An entry of zero means the frame is allocated (or ends the list).
        p = frame_table;
        for (i = 0; i < framenum-1; i++) {
            if (i < entry_num) {
                // Frames holding the frame table itself are never free.
                p->next_freeframe = 0;
                p += 1;
                continue;
            }
            paddr = frametop + (i+1) * PAGE_SIZE;
            p->next_freeframe = paddr;
            p += 1;
        }
        // Terminate the list: the last frame has no successor. RAM is not
        // guaranteed to be zeroed, so this entry must be set explicitly.
        p->next_freeframe = 0;
    }

    kern/include/vm.h

    struct frame_table_entry {
        // address of next free frame
        size_t next_freeframe;
    };

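    ram_bootstrap() runs during system initialization to determine how much physical memory is available. vm_bootstrap() simply calls frametable_bootstrap(). That function divides the usable physical memory into 4 KB pages and keeps a linked list of free pages (the frame table) in memory, stored at the start (lowest address) of the free memory range. Before storing the table, it has to work out how much space the table needs. The first half of the code therefore computes the frame table size, and the second half initializes the linked list. The frames occupied by the table itself are marked as allocated. Every other frame is free at startup, so each remaining entry simply points to the address of the next page.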

    That completes VM initialization. So how is the VM system used? Read on.

    kern/include/vm.h

    /* Fault handling function called by trap code */
    int vm_fault(int faulttype, vaddr_t faultaddress);
    
    vaddr_t alloc_kpages(int npages);
    void free_kpages(vaddr_t addr);

    kern/include/addrspace.h, implemented in kern/vm/addrspace.c

    /* 
     * Address space - data structure associated with the virtual memory
     * space of a process.
     *
     * You write this.
     */
    
    /*
     * A linked list that stores information about each region (text, data, bss, ...)
     */
    struct as_region {
        vaddr_t as_vbase;    /* the started virtual address for one region */
        size_t as_npages;    /* how many pages this region occupied from the vbase */
        unsigned int as_permissions;    /* is this region readable? writable? executable? */
        struct as_region *as_next_region;    /* address of the following region */
    };
    
    struct addrspace {
    #if OPT_DUMBVM
            vaddr_t as_vbase1;
            paddr_t as_pbase1;
            size_t as_npages1;
            vaddr_t as_vbase2;
            paddr_t as_pbase2;
            size_t as_npages2;
            paddr_t as_stackpbase;
    #else
            /* Put stuff here for your VM system */
        struct as_region *as_regions_start;    /* header of the regions linked list */
        vaddr_t as_pagetable;               /* address of the first-level page table */
    #endif
    };
    
    /*
     * The structure of a PTE in the page table:
     * |        address              |  PTE_VALID      |    PF_W        |    PF_R       |    PF_X
     *  the virtual address of frame | valid indicator | writable flag  | readable flag | executable flag
     * I don't use a structure to represent a PTE, just the type vaddr_t. Because the low 12 bits
     * of a frame's (page-aligned) virtual address are always zero, some of them can hold the flags.
     */
    
    /*
     * Functions in addrspace.c:
     *
     *    as_create - create a new empty address space. You need to make 
     *                sure this gets called in all the right places. You
     *                may find you want to change the argument list. May
     *                return NULL on out-of-memory error.
     *
     *    as_copy   - create a new address space that is an exact copy of
     *                an old one. Probably calls as_create to get a new
     *                empty address space and fill it in, but that's up to
     *                you.
     *
     *    as_activate - make the specified address space the one currently
     *                "seen" by the processor. Argument might be NULL, 
     *                meaning "no particular address space".
     *
     *    as_destroy - dispose of an address space. You may need to change
     *                the way this works if implementing user-level threads.
     *
     *    as_define_region - set up a region of memory within the address
     *                space.
     *
     *    as_prepare_load - this is called before actually loading from an
     *                executable into the address space.
     *
     *    as_complete_load - this is called when loading from an executable
     *                is complete.
     *
     *    as_define_stack - set up the stack region in the address space.
     *                (Normally called *after* as_complete_load().) Hands
     *                back the initial stack pointer for the new process.
     *
     *    as_zero_region - zero out a new allocated page.
     *
     *    as_destroy_regions - free all the space allocated for region storage.
     */
    
    struct addrspace *as_create(void);
    int               as_copy(struct addrspace *src, struct addrspace **ret);
    void              as_activate(struct addrspace *);
    void              as_destroy(struct addrspace *);
    
    int               as_define_region(struct addrspace *as, 
                                       vaddr_t vaddr, size_t sz,
                                       int readable, 
                                       int writeable,
                                       int executable);
    int               as_prepare_load(struct addrspace *as);
    int               as_complete_load(struct addrspace *as);
    int               as_define_stack(struct addrspace *as, vaddr_t *initstackptr);
    void          as_zero_region(vaddr_t vaddr, unsigned npages);
    void          as_destroy_regions(struct as_region *ar);

    kern/vm/frametable.c

    /*
     * Allocate npages pages.
     * Before the frame table is initialised, fall back to ram_stealmem;
     * afterwards only single-page allocations are supported.
     */
    static
    paddr_t
    getppages(int npages)
    {
        paddr_t paddr;
        struct frame_table_entry *p;
        int i;
        
        spinlock_acquire(&frametable_lock);
        if (frame_table == 0)
            paddr = ram_stealmem(npages);
        else
        {
            if (npages > 1){
                spinlock_release(&frametable_lock);
                return 0;
            }
            
            // Freeframe equals zero means all the frames have been allocated
            // and there is no frame to use.
            if (freeframe == 0){
                spinlock_release(&frametable_lock);
                return 0;
            }
            
            // Get the current free frame's entry id 
            // and retrieve the next free frame 
            paddr = freeframe;
            i = (freeframe - frametop) / PAGE_SIZE;
            p = frame_table + i;
            
            freeframe = p->next_freeframe;
            p->next_freeframe = 0;
        }
        spinlock_release(&frametable_lock);
        
        return paddr;
    }
    
    /*
     * Allocation function for public accessing
     * Returning virtual address of frame
     */
    vaddr_t
    alloc_kpages(int npages)
    {
        paddr_t paddr = getppages(npages);
        
        if(paddr == 0)
            return 0;
        
        return PADDR_TO_KVADDR(paddr);
    }
    
    /*
     * Free page
     * Stores the address of the current freeframe into the entry of the frame to be freed
     * and update the address of the freeframe.
     */
    static
    void
    freeppages(paddr_t paddr)
    {
        struct frame_table_entry *p;
        int i;
        spinlock_acquire(&frametable_lock);
        i = (paddr - frametop) / PAGE_SIZE;
        p = frame_table + i;
        p->next_freeframe = freeframe;
        freeframe = paddr;
        spinlock_release(&frametable_lock);
    }
    
    /*
     * Free page function for public accessing
     */
    void
    free_kpages(vaddr_t addr)
    {
        KASSERT(addr >= MIPS_KSEG0);
        
        paddr_t paddr = KVADDR_TO_PADDR(addr);
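        if (paddr <= frametop) {
            // Pages stolen via ram_stealmem before the frame table existed
            // cannot be returned to the free list, so they are leaked.
        }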
        else {
            freeppages(paddr);
        }
    }

    kern/arch/mips/vm

    This is the most important function. It handles the case where the virtual page a user program needs is not in the TLB.

    /*
     * A TLB miss triggers a page fault, which is handled as follows:
     * 1. Check the fault type. For a READONLY fault, do nothing except
     *    return an error so that the process is killed.
     * 2. For a read or write fault:
     *    1. Check whether the virtual address lies within any region or
     *       the stack of the current address space. If not, return an
     *       error and kill the process; otherwise continue.
     *    2. Look up the mapping in the page table. If a valid page table
     *       entry exists for this virtual address, insert it into the TLB.
     *    3. If the address is not mapped yet, map it, update the page
     *       table, then insert the mapping into the TLB.
     */
    int
    vm_fault(int faulttype, vaddr_t faultaddress)
    {
        vaddr_t *vaddr1, *vaddr2, vaddr, vbase, vtop, faultadd = 0;
        paddr_t paddr;
        struct addrspace *as;
        struct as_region *s;
        uint32_t ehi, elo;
        int i, index1, index2, spl;
        unsigned int permis = 0;
        
        switch (faulttype) {
            case VM_FAULT_READONLY:
                return EFAULT;
            case VM_FAULT_READ:
            case VM_FAULT_WRITE:
                break;
            default:
                return EINVAL;
        }
        
        as = curthread -> t_addrspace;
        if (as == NULL) {
            return EFAULT;
        }
        
        // Align faultaddress
        faultaddress &= PAGE_FRAME;
        
        // Walk the linked list of regions
        // to check that faultaddress is valid
        KASSERT(as->as_regions_start != 0);
        s = as->as_regions_start;
        while (s != 0) {
            KASSERT(s->as_vbase != 0);
            KASSERT(s->as_npages != 0);
            KASSERT((s->as_vbase & PAGE_FRAME) == s->as_vbase);
            vbase = s->as_vbase;
            vtop = vbase + s->as_npages * PAGE_SIZE;
            if (faultaddress >= vbase && faultaddress < vtop) {
                faultadd = faultaddress;
                permis = s->as_permissions;
                break;
            }
            s = s->as_next_region;
        }
        
        if (faultadd == 0) {
            vtop = USERSTACK;
            vbase = vtop - VM_STACKPAGES * PAGE_SIZE;
            if (faultaddress >= vbase && faultaddress < vtop) {
                faultadd = faultaddress;
                // Stack is readable, writable but not executable
                permis |= (PF_W | PF_R);
            }
            
            // faultaddress is not within any range of the regions and stack
            if (faultadd == 0) {
                return EFAULT;
            }
        }
        
        index1 = (faultaddress & TOP_TEN) >> 22;
        index2 = (faultaddress & MID_TEN) >> 12;
    
        vaddr1 = (vaddr_t *)(as->as_pagetable + index1 * 4);
        if (*vaddr1) {
            vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
            // If the mapping exists in the page table,
            // get the address stores in PTE, 
            // translate it into physical address, 
            // check writeable flag,
            // and prepare the physical address for TLBLO
            if (*vaddr2 & PTE_VALID) {
                vaddr = *vaddr2 & PAGE_FRAME;
                paddr = KVADDR_TO_PADDR(vaddr);
                if (permis & PF_W) {
                    paddr |= TLBLO_DIRTY;
                }
            }
            // If not exists, do the mapping, 
            // update the PTE of the second page table,
            // check writeable flag,
            // and prepare the physical address for TLBLO
            else {
                vaddr = alloc_kpages(1);
                if (vaddr == 0) {
                    return ENOMEM;
                }
                
                as_zero_region(vaddr, 1);
                *vaddr2 |= (vaddr | PTE_VALID);
                
                paddr = KVADDR_TO_PADDR(vaddr);
                if (permis & PF_W) {
                    paddr |= TLBLO_DIRTY;
                }
            }
        }
        // If even the second-level page table does not exist yet,
        // create it,
        // do the mapping,
        // update the PTE,
        // and prepare the physical address.
        else {
            *vaddr1 = alloc_kpages(1);
            if (*vaddr1 == 0) {
                return ENOMEM;
            }
            as_zero_region(*vaddr1, 1);
            
            vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
            vaddr = alloc_kpages(1);
            if (vaddr == 0) {
                return ENOMEM;
            }
            as_zero_region(vaddr, 1);
            *vaddr2 |= (vaddr | PTE_VALID);
    
            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
            
        spl = splhigh();
        
        // Update the TLB:
        // if there is still an invalid (empty) TLB entry, insert the new mapping there;
        // otherwise overwrite a randomly chosen entry with tlb_random().
        for (i=0; i<NUM_TLB; i++) {
            tlb_read(&ehi, &elo, i);
            if (elo & TLBLO_VALID) {
                continue;
            }
            ehi = faultaddress;
            elo = paddr | TLBLO_VALID;
            tlb_write(ehi, elo, i);
            splx(spl);
            return 0;
        }
        
        // FIXME, TLB replacement algo.
        ehi = faultaddress;
        elo = paddr | TLBLO_VALID;
        tlb_random(ehi, elo);
        splx(spl);
        return 0;
    }

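    Page faults occur constantly while the system runs. A page may already be mapped in the process's page table (the frame allocation functions are in kern/vm/frametable.c), or it may not be mapped yet at all. Either way, the TLB holds only NUM_TLB translations, and whenever the translation for the accessed page is missing from it, a TLB-miss fault is raised. vm_fault then looks up the mapping in the page table, creating it on first access, and loads it into the TLB. When the program uses the page again, it obtains the physical address from the TLB.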

           

  • Original article: https://www.cnblogs.com/ingenuity/p/4543454.html