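Virtual memory is a technique used by virtually every modern operating system.
The basic idea is that each process has its own independent logical address space, which is divided into equal-sized blocks called pages. Each page covers a contiguous range of addresses. From the process's point of view there appears to be a large amount of memory: some of its pages are backed by a block of physical memory (called a page frame; pages and page frames are normally the same size), while pages that are not currently loaded live on disk. Introducing per-process logical addresses separates the process address space from the actual storage space, which makes memory management more flexible.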
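To make the page idea concrete, here is a minimal sketch of how an address splits into a page number and an offset within the page. It assumes 32-bit addresses and 4 KiB pages; the SKETCH_ constants and function names are made up for illustration and are not taken from OS161.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages, so the low 12 bits are the offset. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_PAGE_SHIFT 12u

/* Page number: which page the address falls in. */
static uint32_t page_number(uint32_t vaddr)
{
    return vaddr >> SKETCH_PAGE_SHIFT;
}

/* Offset: position of the byte inside that page. */
static uint32_t page_offset(uint32_t vaddr)
{
    return vaddr & (SKETCH_PAGE_SIZE - 1u);
}

/* Rebuild an address from a frame number and an offset. */
static uint32_t make_address(uint32_t frame_number, uint32_t offset)
{
    return (frame_number << SKETCH_PAGE_SHIFT) | offset;
}
```

Translation only ever replaces the page number with a frame number; the offset is carried over unchanged, which is why pages and frames need to be the same size.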
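The two basic concepts, address space and storage space, are defined as follows:
Address space: the object program produced by compiling a source program lives within a bounded range of addresses, and that range is called the address space. An address space is a set of logical addresses.
Storage space: the collection of physical storage cells in main memory. The numbers that identify these cells are called physical addresses, so the storage space is a set of physical addresses.
Three management schemes derive from this:
paged, segmented, and segmented-paged memory management. This post focuses on paging.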
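In a paged system, when a process is created the operating system allocates page frames for all of its pages, and when the process is destroyed it reclaims every frame that was allocated to it. If processes are allowed to request memory dynamically while they run, the operating system must also allocate physical frames for those requests. To do all this, the operating system has to track which page frames in physical memory are actually in use. It also has to switch the mapping from process address space to physical memory correctly whenever it switches between two processes. To see how the operating system meets these requirements, we first need to understand page tables. Start with this figure, reproduced from 51CTO: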
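An entry in a page table is called a page table entry (PTE). Each page table entry records the mapping from one range of virtual addresses (a page) to physical addresses.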
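As a rough sketch of what a page table entry does, the following toy translator looks up a flat, single-level table. The struct layout, the 16-page address space and all names are assumptions chosen for illustration; real PTE formats are hardware specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_MASK  0xfffu
#define SKETCH_NPAGES     16u   /* toy address space: 16 pages */

/* Hypothetical page table entry: a frame number plus a valid bit. */
struct sketch_pte {
    uint32_t frame;
    bool     valid;
};

/*
 * Translate vaddr through a flat page table.
 * Returns true and stores the physical address in *paddr on success;
 * returns false if the page is out of range or not mapped (a page fault).
 */
static bool sketch_translate(const struct sketch_pte *table,
                             uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> SKETCH_PAGE_SHIFT;

    if (vpn >= SKETCH_NPAGES || !table[vpn].valid) {
        return false;
    }
    *paddr = (table[vpn].frame << SKETCH_PAGE_SHIFT)
           | (vaddr & SKETCH_PAGE_MASK);
    return true;
}
```

With page 2 mapped to frame 7, virtual address 0x2abc translates to 0x7abc, while an access to an unmapped page reports a fault instead of returning an address.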
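Because the page table itself is stored in memory, every memory read a program performs now costs at least two memory accesses: one to fetch the page table entry and one to fetch the data. (The MMU, short for Memory Management Unit, is a hardware logic unit inside the CPU. Its main job is to translate virtual addresses to physical addresses, which also gives software a hardware-enforced memory protection mechanism.) Compared with the single access needed without an MMU, this is far less efficient, and the slower the memory, the more noticeable the loss. So the question is how to keep the benefits of the MMU while keeping its overhead as small as possible.
This is where the TLB comes in. The TLB, or translation lookaside buffer, is in effect a set of relocation registers inside the MMU that temporarily hold translation data. Because the TLB is essentially a set of registers, it is much faster to access than a page table in memory. If the entire page table could be kept in the TLB, the efficiency problem would be solved.
However, manufacturing cost and other constraints make it practically impossible to keep every page table in the TLB. Instead, the limited-capacity TLB holds only the most frequently used page table entries, which still improves MMU efficiency to a useful degree.
The reason this works is the principle of locality of memory references: while a program runs, it is more likely to access code and data close to what it accessed most recently. So, in theory, the TLB holds most of the page table entries needed during the current period of time, which greatly improves the efficiency of the MMU.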
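The effect of locality can be seen in a small simulation. The sketch below models a four-entry, fully associative TLB with round-robin replacement and counts hits and misses. The size, the replacement policy, the faked page-table walk (frame = vpn + 100) and all names are assumptions for illustration, not how any particular MMU works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TLB_SIZE 4u

struct sketch_tlb_entry {
    uint32_t vpn;
    uint32_t frame;
    bool     valid;
};

struct sketch_tlb {
    struct sketch_tlb_entry entries[SKETCH_TLB_SIZE];
    unsigned next_victim;   /* round-robin replacement cursor */
    unsigned hits;
    unsigned misses;
};

/* Start with an empty TLB and zeroed counters. */
static void sketch_tlb_init(struct sketch_tlb *tlb)
{
    memset(tlb, 0, sizeof(*tlb));
}

/* Look up vpn; on a hit store the frame in *frame and return true. */
static bool sketch_tlb_lookup(struct sketch_tlb *tlb, uint32_t vpn,
                              uint32_t *frame)
{
    for (unsigned i = 0; i < SKETCH_TLB_SIZE; i++) {
        if (tlb->entries[i].valid && tlb->entries[i].vpn == vpn) {
            *frame = tlb->entries[i].frame;
            tlb->hits++;
            return true;
        }
    }
    tlb->misses++;
    return false;
}

/* Install a translation, overwriting the next round-robin slot. */
static void sketch_tlb_insert(struct sketch_tlb *tlb, uint32_t vpn,
                              uint32_t frame)
{
    struct sketch_tlb_entry *e = &tlb->entries[tlb->next_victim];

    e->vpn = vpn;
    e->frame = frame;
    e->valid = true;
    tlb->next_victim = (tlb->next_victim + 1u) % SKETCH_TLB_SIZE;
}

/*
 * Access a page: try the TLB first and fall back to the "page table"
 * on a miss. The page-table walk is faked as frame = vpn + 100.
 */
static uint32_t sketch_access(struct sketch_tlb *tlb, uint32_t vpn)
{
    uint32_t frame;

    if (!sketch_tlb_lookup(tlb, vpn, &frame)) {
        frame = vpn + 100u;
        sketch_tlb_insert(tlb, vpn, frame);
    }
    return frame;
}
```

Ten accesses to the same page cost one miss and nine hits. By contrast, cycling through five different pages with this four-entry round-robin TLB misses on every access, which is the kind of access pattern locality says real programs mostly avoid.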
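Here we use a two-level page table, which means translation takes two lookups. The high bits of the virtual address index an entry in the first table; that entry locates a second table, which is indexed by the next bits of the address; only then is the physical address known. The first table is called the first-level page table and the second is called the second-level page table. (On MIPS-I the TLB is refilled by software, so this walk is performed by the kernel's fault handler rather than by the MMU hardware.) The main purpose of the two-level scheme is to reduce the memory occupied by the page tables themselves; the drawback is that a lookup which misses the TLB costs one more memory access.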
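A quick calculation shows why the second level saves memory. With 32-bit addresses and 4 KiB pages there are 2^20 pages, so a flat table of 4-byte entries takes 4 MiB per process. With a 10/10/12 split, the first-level table is 1024 entries (4 KiB), and a 4 KiB second-level table is only needed for each 4 MiB stretch of address space that is actually used. The sketch below extracts the three fields; the mask values and names are assumptions consistent with a 10/10/12 split, not copied from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed masks for a 10/10/12 split of a 32-bit virtual address. */
#define SKETCH_TOP_TEN 0xffc00000u  /* bits 31..22: first-level index  */
#define SKETCH_MID_TEN 0x003ff000u  /* bits 21..12: second-level index */
#define SKETCH_LOW_12  0x00000fffu  /* bits 11..0:  offset in the page */

/* Index into the first-level page table. */
static uint32_t first_level_index(uint32_t vaddr)
{
    return (vaddr & SKETCH_TOP_TEN) >> 22;
}

/* Index into the second-level page table chosen by the first lookup. */
static uint32_t second_level_index(uint32_t vaddr)
{
    return (vaddr & SKETCH_MID_TEN) >> 12;
}

/* Byte offset inside the page; translation leaves it unchanged. */
static uint32_t offset_in_page(uint32_t vaddr)
{
    return vaddr & SKETCH_LOW_12;
}
```

For example, address 0x00403abc uses first-level entry 1, second-level entry 3 and offset 0xabc.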
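OK, that's the background. Now for the real content: implementing VM in OS161, a teaching operating system developed at Harvard University. OS161 runs on MIPS-I hardware.
The code is on GitHub: https://github.com/tian-jiang/OS161-VirtualMemory
First, a piece of code from kern/arch/mips/include/vm.h, which defines the memory layout (the MIPS-I address segments):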
/*
 * MIPS-I hardwired memory layout:
 *    0xc0000000 - 0xffffffff   kseg2 (kernel, tlb-mapped)
 *    0xa0000000 - 0xbfffffff   kseg1 (kernel, unmapped, uncached)
 *    0x80000000 - 0x9fffffff   kseg0 (kernel, unmapped, cached)
 *    0x00000000 - 0x7fffffff   kuseg (user, tlb-mapped)
 *
 * (mips32 is a little different)
 */

#define MIPS_KUSEG  0x00000000
#define MIPS_KSEG0  0x80000000
#define MIPS_KSEG1  0xa0000000
#define MIPS_KSEG2  0xc0000000
The layout can be drawn as a figure:
This figure shows how physical memory is allocated in OS161.
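To get a feel for the segments, here is a small sketch that classifies a 32-bit address by segment and converts between kseg0 addresses and physical addresses. kseg0 is a fixed-offset window onto low physical memory, so the conversion is just adding or subtracting 0x80000000. The SKETCH_ names and helper functions are made up for this example; the kernel has its own macros for the conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Segment base addresses, mirroring the MIPS-I layout listed above. */
#define SKETCH_KUSEG 0x00000000u
#define SKETCH_KSEG0 0x80000000u
#define SKETCH_KSEG1 0xa0000000u
#define SKETCH_KSEG2 0xc0000000u

enum sketch_segment { SEG_KUSEG, SEG_KSEG0, SEG_KSEG1, SEG_KSEG2 };

/* Decide which segment a virtual address belongs to. */
static enum sketch_segment segment_of(uint32_t vaddr)
{
    if (vaddr >= SKETCH_KSEG2) return SEG_KSEG2;
    if (vaddr >= SKETCH_KSEG1) return SEG_KSEG1;
    if (vaddr >= SKETCH_KSEG0) return SEG_KSEG0;
    return SEG_KUSEG;
}

/* kseg0 is direct-mapped: strip the segment base to get the physical address. */
static uint32_t kseg0_to_paddr(uint32_t kvaddr)
{
    return kvaddr - SKETCH_KSEG0;
}

/* The reverse: the kernel's cached, unmapped view of a physical address. */
static uint32_t paddr_to_kseg0(uint32_t paddr)
{
    return paddr + SKETCH_KSEG0;
}
```

Only kuseg and kseg2 go through the TLB. Addresses in kseg0 and kseg1 are translated by fixed offset, which is why the kernel can touch physical frames before any page table exists.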
Let's start from the beginning: kern/startup/main.c
/* Early initialization. */
ram_bootstrap();
.......

/* Late phase of initialization. */
vm_bootstrap();
........
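When the operating system boots, it calls ram_bootstrap() and vm_bootstrap() to start the VM management module. Where are these two functions declared and defined? Let's look at the following code.
kern/include/vm.h and kern/arch/mips/include/vm.h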
/* Initialization function */
void vm_bootstrap(void);
......
/* Allocate/free kernel heap pages (called by kmalloc/kfree) */
void frametable_bootstrap(void);

/*
 * Interface to the low-level module that looks after the amount of
 * physical memory we have.
 *
 * ram_getsize returns the lowest valid physical address, and one past
 * the highest valid physical address. (Both are page-aligned.) This
 * is the memory that is available for use during operation, and
 * excludes the memory the kernel is loaded into and memory that is
 * grabbed in the very early stages of bootup.
 *
 * ram_stealmem can be used before ram_getsize is called to allocate
 * memory that cannot be freed later. This is intended for use early
 * in bootup before VM initialization is complete.
 */
void ram_bootstrap(void);
paddr_t ram_stealmem(unsigned long npages);
void ram_getsize(paddr_t *lo, paddr_t *hi);
These two functions are declared here. So what do they actually do?
kern/arch/mips/vm/ram.c, kern/arch/mips/vm/vm.c, kern/vm/frametable.c
vaddr_t firstfree;   /* first free virtual address; set by start.S */

static paddr_t firstpaddr;  /* address of first free physical page */
static paddr_t lastpaddr;   /* one past end of last free physical page */

/*
 * Called very early in system boot to figure out how much physical
 * RAM is available.
 */
void
ram_bootstrap(void)
{
    size_t ramsize;

    /* Get size of RAM. */
    ramsize = mainbus_ramsize();

    /*
     * This is the same as the last physical address, as long as
     * we have less than 508 megabytes of memory. If we had more,
     * various annoying properties of the MIPS architecture would
     * force the RAM to be discontiguous. This is not a case we
     * are going to worry about.
     */
    if (ramsize > 508*1024*1024) {
        ramsize = 508*1024*1024;
    }

    lastpaddr = ramsize;

    /*
     * Get first free virtual address from where start.S saved it.
     * Convert to physical address.
     */
    firstpaddr = firstfree - MIPS_KSEG0;

    kprintf("%uk physical memory available\n",
        (lastpaddr-firstpaddr)/1024);
}

/*
 * Initialise the frame table
 */
void vm_bootstrap(void)
{
    frametable_bootstrap();
}
/*
 * Make variables static to prevent access from other files
 */
static struct frame_table_entry *frame_table;
static paddr_t frametop, freeframe;

/*
 * initialise frame table
 */
void frametable_bootstrap(void)
{
    struct frame_table_entry *p;
    paddr_t firsta, lasta, paddr;
    unsigned long framenum, entry_num, frame_table_size, i;

    // get the useable range of physical memory
    ram_getsize(&firsta, &lasta);
    KASSERT((firsta & PAGE_FRAME) == firsta);
    KASSERT((lasta & PAGE_FRAME) == lasta);

    framenum = (lasta - firsta) / PAGE_SIZE;

    // calculate the size of the whole framemap
    frame_table_size = framenum * sizeof(struct frame_table_entry);
    frame_table_size = ROUNDUP(frame_table_size, PAGE_SIZE);
    entry_num = frame_table_size / PAGE_SIZE;
    KASSERT((frame_table_size & PAGE_FRAME) == frame_table_size);

    frametop = firsta;
    freeframe = firsta + frame_table_size;

    if (freeframe >= lasta) {
        // This is impossible for most of the time
        panic("vm: framemap consume physical memory?\n");
    }

    // keep the frame state in the top of the useable range of physical memory
    // the free frame page address started from the end of the frame map
    frame_table = (struct frame_table_entry *) PADDR_TO_KVADDR(firsta);

    // Initialise the frame list, each entry corresponding to a frame,
    // and each entry stores the address of the next free frame.
    // If the next frame address of this entry equals zero,
    // it means this frame is allocated
    p = frame_table;
    for (i = 0; i < framenum-1; i++) {
        if (i < entry_num) {
            // frames occupied by the frame table itself are never free
            p->next_freeframe = 0;
            p += 1;
            continue;
        }
        paddr = frametop + (i+1) * PAGE_SIZE;
        p->next_freeframe = paddr;
        p += 1;
    }
    // The loop stops one entry short: the last frame has no successor,
    // so terminate the free list explicitly instead of leaving the
    // entry uninitialised.
    p->next_freeframe = 0;
}
kern/include/vm.h
struct frame_table_entry {
    // address of next free frame
    size_t next_freeframe;
};
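ram_bootstrap runs during system initialization to find out how much physical memory is available. vm_bootstrap simply calls frametable_bootstrap(), which divides the usable physical memory into pages of 4K each and keeps a linked list of free pages in memory. The list is stored at the start (the lowest address) of the free memory, so before storing it the code has to work out how much space the frame table itself needs. That is why the first half of the function computes the size of the frame table and the second half initializes the frame table as a linked list. The frames that hold the table itself are marked as allocated; every other frame is free at initialization time, so each entry simply points to the address of the next page.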
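The free-list idea can be tried out in user space. The following is a minimal sketch, assuming eight frames starting at a made-up base address: each entry stores the address of the next free frame, and 0 means "allocated or end of list". All names and constants are hypothetical, and there is no locking, unlike the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_NFRAMES   8u
#define SKETCH_BASE      0x10000u  /* hypothetical first managed address */

/* One entry per frame: address of the next free frame, 0 = none. */
static uint32_t sketch_next[SKETCH_NFRAMES];
static uint32_t sketch_freeframe;  /* head of the free list, 0 if empty */

/* Mark the first `reserved` frames as used and chain the rest together. */
static void sketch_frametable_init(unsigned reserved)
{
    for (unsigned i = 0; i < SKETCH_NFRAMES; i++) {
        if (i < reserved || i == SKETCH_NFRAMES - 1u) {
            sketch_next[i] = 0;  /* in use, or last frame: end of list */
        } else {
            sketch_next[i] = SKETCH_BASE + (i + 1u) * SKETCH_PAGE_SIZE;
        }
    }
    sketch_freeframe = (reserved < SKETCH_NFRAMES)
        ? SKETCH_BASE + reserved * SKETCH_PAGE_SIZE
        : 0;
}

/* Pop one frame off the free list; returns 0 when memory is exhausted. */
static uint32_t sketch_alloc_frame(void)
{
    uint32_t paddr = sketch_freeframe;
    unsigned i;

    if (paddr == 0) {
        return 0;
    }
    i = (paddr - SKETCH_BASE) / SKETCH_PAGE_SIZE;
    sketch_freeframe = sketch_next[i];
    sketch_next[i] = 0;
    return paddr;
}

/* Push a frame back onto the head of the free list. */
static void sketch_free_frame(uint32_t paddr)
{
    unsigned i = (paddr - SKETCH_BASE) / SKETCH_PAGE_SIZE;

    sketch_next[i] = sketch_freeframe;
    sketch_freeframe = paddr;
}
```

Both allocation and freeing are constant-time, and a freed frame is the next one handed out. The price of this design is that it can only hand out one frame at a time, since neighbouring list entries are not necessarily neighbouring frames.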
That completes the VM initialization. So how is the VM actually used? Read on.
kern/include/vm.h
/* Fault handling function called by trap code */
int vm_fault(int faulttype, vaddr_t faultaddress);

vaddr_t alloc_kpages(int npages);
void free_kpages(vaddr_t addr);
kern/include/addrspace.h, implemented in kern/vm/addrspace.c
/*
 * Address space - data structure associated with the virtual memory
 * space of a process.
 *
 * You write this.
 */

/*
 * A linked list that stores the information for regions (code, text, bss...)
 */
struct as_region {
    vaddr_t as_vbase;                 /* the starting virtual address of the region */
    size_t as_npages;                 /* how many pages this region occupies from the vbase */
    unsigned int as_permissions;      /* is this region readable? writable? executable? */
    struct as_region *as_next_region; /* address of the following region */
};

struct addrspace {
#if OPT_DUMBVM
    vaddr_t as_vbase1;
    paddr_t as_pbase1;
    size_t as_npages1;
    vaddr_t as_vbase2;
    paddr_t as_pbase2;
    size_t as_npages2;
    paddr_t as_stackpbase;
#else
    /* Put stuff here for your VM system */
    struct as_region *as_regions_start; /* header of the regions linked list */
    vaddr_t as_pagetable;               /* address of the first-level page table */
#endif
};

/*
 * The structure of PTE in page table:
 * |           address            |    PTE_VALID    |      PF_W      |     PF_R      |      PF_X
 *   the virtual address of frame | valid indicator | writeable flag | readable flag | executable flag
 * I don't use a structure to represent a PTE, just the type vaddr_t, and because
 * the last 12 bits of a frame's virtual address are free, some of them can be
 * used for the flags
 */

/*
 * Functions in addrspace.c:
 *
 *    as_create - create a new empty address space. You need to make
 *                sure this gets called in all the right places. You
 *                may find you want to change the argument list. May
 *                return NULL on out-of-memory error.
 *
 *    as_copy   - create a new address space that is an exact copy of
 *                an old one. Probably calls as_create to get a new
 *                empty address space and fill it in, but that's up to
 *                you.
 *
 *    as_activate - make the specified address space the one currently
 *                "seen" by the processor. Argument might be NULL,
 *                meaning "no particular address space".
 *
 *    as_destroy - dispose of an address space. You may need to change
 *                the way this works if implementing user-level threads.
 *
 *    as_define_region - set up a region of memory within the address
 *                space.
 *
 *    as_prepare_load - this is called before actually loading from an
 *                executable into the address space.
 *
 *    as_complete_load - this is called when loading from an executable
 *                is complete.
 *
 *    as_define_stack - set up the stack region in the address space.
 *                (Normally called *after* as_complete_load().) Hands
 *                back the initial stack pointer for the new process.
 *
 *    as_zero_region - zero out a newly allocated page.
 *
 *    as_destroy_regions - free all the space allocated for regions storage.
 */

struct addrspace *as_create(void);
int               as_copy(struct addrspace *src, struct addrspace **ret);
void              as_activate(struct addrspace *);
void              as_destroy(struct addrspace *);

int               as_define_region(struct addrspace *as,
                                   vaddr_t vaddr, size_t sz,
                                   int readable,
                                   int writeable,
                                   int executable);
int               as_prepare_load(struct addrspace *as);
int               as_complete_load(struct addrspace *as);
int               as_define_stack(struct addrspace *as, vaddr_t *initstackptr);
void              as_zero_region(vaddr_t vaddr, unsigned npages);
void              as_destroy_regions(struct as_region *ar);
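The comment about the PTE layout describes packing flags into the low 12 bits of a page-aligned address. Here is a minimal sketch of that packing. The bit positions are assumptions picked for the example (the SKETCH_ values are not the repository's definitions), and uint32_t stands in for vaddr_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments; a page-aligned address leaves bits 11..0 free. */
#define SKETCH_PAGE_FRAME 0xfffff000u
#define SKETCH_PF_X       0x1u
#define SKETCH_PF_W       0x2u
#define SKETCH_PF_R       0x4u
#define SKETCH_PTE_VALID  0x8u

/* Pack a page-aligned frame address and permission bits into one word. */
static uint32_t pte_make(uint32_t frame_addr, uint32_t perms)
{
    return (frame_addr & SKETCH_PAGE_FRAME)
         | (perms & (SKETCH_PF_R | SKETCH_PF_W | SKETCH_PF_X))
         | SKETCH_PTE_VALID;
}

/* Recover the frame address by masking off the flag bits. */
static uint32_t pte_frame(uint32_t pte)
{
    return pte & SKETCH_PAGE_FRAME;
}

static bool pte_is_valid(uint32_t pte)
{
    return (pte & SKETCH_PTE_VALID) != 0;
}

static bool pte_is_writable(uint32_t pte)
{
    return (pte & SKETCH_PF_W) != 0;
}
```

Because a zeroed page table contains only zero words, a freshly allocated table automatically reads as "no valid mappings", which is what makes zeroing new tables sufficient initialization.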
kern/vm/frametable.c
/*
 * Allocate n pages.
 * Before frame table initialisation, using ram_stealmem
 */
static paddr_t getppages(int npages)
{
    paddr_t paddr;
    struct frame_table_entry *p;
    int i;

    spinlock_acquire(&frametable_lock);
    if (frame_table == 0)
        paddr = ram_stealmem(npages);
    else {
        if (npages > 1) {
            spinlock_release(&frametable_lock);
            return 0;
        }

        // Freeframe equals zero means all the frames have been allocated
        // and there is no frame to use.
        if (freeframe == 0) {
            spinlock_release(&frametable_lock);
            return 0;
        }

        // Get the current free frame's entry id
        // and retrieve the next free frame
        paddr = freeframe;
        i = (freeframe - frametop) / PAGE_SIZE;
        p = frame_table + i;
        freeframe = p->next_freeframe;
        p->next_freeframe = 0;
    }
    spinlock_release(&frametable_lock);

    return paddr;
}

/*
 * Allocation function for public accessing
 * Returning virtual address of frame
 */
vaddr_t alloc_kpages(int npages)
{
    paddr_t paddr = getppages(npages);

    if (paddr == 0)
        return 0;

    return PADDR_TO_KVADDR(paddr);
}

/*
 * Free page
 * Stores the address of the current freeframe into the entry of the frame to be freed
 * and update the address of the freeframe.
 */
static void freeppages(paddr_t paddr)
{
    struct frame_table_entry *p;
    int i;

    spinlock_acquire(&frametable_lock);
    i = (paddr - frametop) / PAGE_SIZE;
    p = frame_table + i;
    p->next_freeframe = freeframe;
    freeframe = paddr;
    spinlock_release(&frametable_lock);
}

/*
 * Free page function for public accessing
 */
void free_kpages(vaddr_t addr)
{
    KASSERT(addr >= MIPS_KSEG0);

    paddr_t paddr = KVADDR_TO_PADDR(addr);
    if (paddr <= frametop) {
        // memory leakage
    }
    else {
        freeppages(paddr);
    }
}
kern/arch/mips/vm
This is the most important function: what to do when the TLB has no entry for the virtual page a user program needs.
/*
 * When a TLB miss happens, a page fault is triggered.
 * The way to handle it is as follows:
 * 1. check what kind of page fault it is; if it is a READONLY fault,
 *    do nothing, just raise an exception and kill the process
 * 2. if it is a read fault or write fault
 *    1. first check whether this virtual address is within any of the regions
 *       or the stack of the current addrspace. If it is not, raise an exception
 *       and kill the process; if it is, go on.
 *    2. then try to find the mapping in the page table;
 *       if a page table entry exists for this virtual address, insert it into the TLB
 *    3. if this virtual address is not mapped yet, map this address,
 *       update the page table, then insert it into the TLB
 */
int vm_fault(int faulttype, vaddr_t faultaddress)
{
    vaddr_t *vaddr1, *vaddr2, vaddr, vbase, vtop, faultadd = 0;
    paddr_t paddr;
    struct addrspace *as;
    struct as_region *s;
    uint32_t ehi, elo;
    int i, index1, index2, spl;
    unsigned int permis = 0;

    switch (faulttype) {
        case VM_FAULT_READONLY:
            return EFAULT;
        case VM_FAULT_READ:
        case VM_FAULT_WRITE:
            break;
        default:
            return EINVAL;
    }

    as = curthread->t_addrspace;
    if (as == NULL) {
        return EFAULT;
    }

    // Align faultaddress
    faultaddress &= PAGE_FRAME;

    // Go through the linked list of regions
    // Check the validity of the faultaddress
    KASSERT(as->as_regions_start != 0);
    s = as->as_regions_start;
    while (s != 0) {
        KASSERT(s->as_vbase != 0);
        KASSERT(s->as_npages != 0);
        KASSERT((s->as_vbase & PAGE_FRAME) == s->as_vbase);
        vbase = s->as_vbase;
        vtop = vbase + s->as_npages * PAGE_SIZE;
        if (faultaddress >= vbase && faultaddress < vtop) {
            faultadd = faultaddress;
            permis = s->as_permissions;
            break;
        }
        s = s->as_next_region;
    }

    if (faultadd == 0) {
        vtop = USERSTACK;
        vbase = vtop - VM_STACKPAGES * PAGE_SIZE;
        if (faultaddress >= vbase && faultaddress < vtop) {
            faultadd = faultaddress;
            // Stack is readable, writable but not executable
            permis |= (PF_W | PF_R);
        }

        // faultaddress is not within any range of the regions and stack
        if (faultadd == 0) {
            return EFAULT;
        }
    }

    index1 = (faultaddress & TOP_TEN) >> 22;
    index2 = (faultaddress & MID_TEN) >> 12;

    vaddr1 = (vaddr_t *)(as->as_pagetable + index1 * 4);
    if (*vaddr1) {
        vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
        // If the mapping exists in the page table,
        // get the address stored in the PTE,
        // translate it into a physical address,
        // check the writeable flag,
        // and prepare the physical address for TLBLO
        if (*vaddr2 & PTE_VALID) {
            vaddr = *vaddr2 & PAGE_FRAME;
            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
        // If it does not exist, do the mapping,
        // update the PTE of the second page table,
        // check the writeable flag,
        // and prepare the physical address for TLBLO
        else {
            vaddr = alloc_kpages(1);
            KASSERT(vaddr != 0);

            as_zero_region(vaddr, 1);
            *vaddr2 |= (vaddr | PTE_VALID);

            paddr = KVADDR_TO_PADDR(vaddr);
            if (permis & PF_W) {
                paddr |= TLBLO_DIRTY;
            }
        }
    }
    // If the second page table does not even exist,
    // create the second page table,
    // do the mapping,
    // update the PTE,
    // and prepare the physical address.
    else {
        *vaddr1 = alloc_kpages(1);
        KASSERT(*vaddr1 != 0);
        as_zero_region(*vaddr1, 1);

        vaddr2 = (vaddr_t *)(*vaddr1 + index2 * 4);
        vaddr = alloc_kpages(1);
        KASSERT(vaddr != 0);
        as_zero_region(vaddr, 1);
        *vaddr2 |= (vaddr | PTE_VALID);

        paddr = KVADDR_TO_PADDR(vaddr);
        if (permis & PF_W) {
            paddr |= TLBLO_DIRTY;
        }
    }

    spl = splhigh();

    // update TLB entry
    // if there is still an empty TLB entry, insert the new one there
    // if not, randomly select one, throw it out, insert the new one
    for (i = 0; i < NUM_TLB; i++) {
        tlb_read(&ehi, &elo, i);
        if (elo & TLBLO_VALID) {
            continue;
        }
        ehi = faultaddress;
        elo = paddr | TLBLO_VALID;
        tlb_write(ehi, elo, i);
        splx(spl);
        return 0;
    }

    // FIXME, TLB replacement algo.
    ehi = faultaddress;
    elo = paddr | TLBLO_VALID;
    tlb_random(ehi, elo);
    splx(spl);
    return 0;
}
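While the system runs, page faults occur constantly. Although the system has set aside pages for the running program (the allocation functions are in kern/vm/frametable.c), the TLB initially holds no mapping from a page's virtual address to a physical address, so the page cannot be used yet. The first time the program actually touches the page, the access misses in the TLB and traps to vm_fault, which finds or creates the page table entry and writes the translation into the TLB so that the access can be retried.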
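The on-demand behaviour can be imitated in user space. The sketch below is a simplified model of the walk, not the kernel code: second-level tables come from calloc instead of a frame allocator, "frames" are just a counter, and every name is hypothetical. It shows that tables and mappings are created only on the first touch, and that later faults on the same page find the existing entry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ENTRIES 1024u  /* 2^10 entries per table */
#define SKETCH_VALID   0x1u   /* assumed valid bit, kept in the low 12 bits */

struct sketch_pagetable {
    uint32_t *second[SKETCH_ENTRIES]; /* NULL until the table is first needed */
    uint32_t  frames_handed_out;      /* stand-in for a real frame allocator */
};

/* Create an empty first-level table; returns NULL if out of memory. */
static struct sketch_pagetable *sketch_pagetable_create(void)
{
    return calloc(1, sizeof(struct sketch_pagetable));
}

/* Release every second-level table, then the first-level table. */
static void sketch_pagetable_destroy(struct sketch_pagetable *pt)
{
    for (unsigned i = 0; i < SKETCH_ENTRIES; i++) {
        free(pt->second[i]);
    }
    free(pt);
}

/*
 * Return the PTE for vaddr, creating the second-level table and the
 * mapping on first touch. Returns 0 if a table cannot be allocated
 * (a valid PTE is never 0 because it carries SKETCH_VALID).
 */
static uint32_t sketch_handle_fault(struct sketch_pagetable *pt,
                                    uint32_t vaddr)
{
    uint32_t i1 = vaddr >> 22;
    uint32_t i2 = (vaddr >> 12) & 0x3ffu;

    if (pt->second[i1] == NULL) {
        pt->second[i1] = calloc(SKETCH_ENTRIES, sizeof(uint32_t));
        if (pt->second[i1] == NULL) {
            return 0;
        }
    }
    if ((pt->second[i1][i2] & SKETCH_VALID) == 0) {
        /* First touch of this page: hand out the next fake frame. */
        pt->frames_handed_out++;
        pt->second[i1][i2] = (pt->frames_handed_out << 12) | SKETCH_VALID;
    }
    return pt->second[i1][i2];
}
```

Two faults inside the same page return the same entry, and only the second-level table covering the touched addresses is ever allocated. In the kernel, the step after this would be writing the translation into the TLB, which this model leaves out.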