Memory Allocation in C: The Ins and Outs

Memory layout of a C program

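C is complicated, and its memory model deserves much of the credit. In some higher-level languages you simply create an object with new when you need it and never worry about what happens afterwards. In C you have to understand everything that touches memory, or sooner or later you will step on a landmine. The figure below shows the typical memory layout of a C program. One important premise: this layout lives in virtual memory.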

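For virtual memory, the kernel maintains a page table that maps virtual addresses either to physical memory addresses or to locations on disk (the swap area). Not every virtual address needs to be backed by physical memory; otherwise, no matter how much RAM the machine had, a handful of processes would use it all up. A process asks the operating system for memory when it needs it. If the request is valid, the kernel adds entries to the page table, establishing a mapping from virtual addresses to physical addresses. Likewise, when memory is released, the mapping is removed and the resources are handed back.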

The Linux Programming Interface gives a fairly complete list of the advantages of virtual memory:

Processes are isolated from one another and from the kernel, so that one process can’t read or modify the memory of another process or the kernel. This is accomplished by having the page-table entries for each process point to distinct sets of physical pages in RAM (or in the swap area).

Where appropriate, two or more processes can share memory. The kernel makes this possible by having page-table entries in different processes refer to the same pages of RAM. Memory sharing occurs in two common circumstances:
– Multiple processes executing the same program can share a single (read-only) copy of the program code. This type of sharing is performed implicitly when multiple programs execute the same program file (or load the same shared library).
– Processes can use the shmget() and mmap() system calls to explicitly request sharing of memory regions with other processes. This is done for the purpose of interprocess communication.

The implementation of memory protection schemes is facilitated; that is, page-table entries can be marked to indicate that the contents of the corresponding page are readable, writable, executable, or some combination of these protections. Where multiple processes share pages of RAM, it is possible to specify that each process has different protections on the memory; for example, one process might have read-only access to a page, while another has read-write access.

Programmers, and tools such as the compiler and linker, don’t need to be concerned with the physical layout of the program in RAM.

Because only a part of a program needs to reside in memory, the program loads and runs faster. Furthermore, the memory footprint (i.e., virtual size) of a process can exceed the capacity of RAM.

One final advantage of virtual memory management is that since each process uses less RAM, more processes can simultaneously be held in RAM. This typically leads to better CPU utilization, since it increases the likelihood that, at any moment in time, there is at least one process that the CPU can execute.

This article focuses on the details of memory allocation in the heap.

Using system calls

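The current boundary of the heap is usually called the "program break". Allocating heap memory means moving the program break toward higher addresses. On UNIX systems, two system calls are most closely tied to the program break: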

int brk(void *addr);

void *sbrk(intptr_t increment);

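sbrk() grows or shrinks the heap by moving the program break by increment bytes. Because virtual memory is mapped in whole pages, the amount of memory actually mapped is not necessarily equal to increment: whenever the new break does not fall on a page boundary, the rest of that page is mapped as well. The amount is never rounded to the nearest page, only up, unless memory runs out. sbrk(0) returns the current program break. Accessing memory beyond the program break and outside the last mapped page raises SIGSEGV, a segmentation fault. brk() works much like sbrk(), except that it sets the program break to the absolute address addr.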
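#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int *p = sbrk(100);   /* ask the kernel for 100 bytes */
    *(p + 1023) = 4;      /* last 4 bytes of the first page */
    printf("**\n");
    *(p + 1024) = 4;      /* first 4 bytes past that page */
    return 0;
}

This code asks the kernel for 100 bytes, but what actually gets mapped is a whole page (assumed here to be 4096 bytes, with the break starting on a page boundary). The first assignment writes to the last 4 bytes of that page and succeeds. The second assignment touches memory outside the mapping, which is clearly illegal. Running the program gives: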

$ ./a.out
**
Segmentation fault

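Releasing memory with brk()/sbrk() does not necessarily remove the mapping right away either. Only when the program break drops by more than a page can the physical memory be returned to the kernel. After the release, any access to that memory is undefined behavior, no different from playing with fire. One more thing to watch when moving the program break: it must never be moved outside the heap, for example into the bss or data segment. That is equally a way of asking for trouble.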

Using the C standard library functions

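malloc()/free() are without doubt among the most widely used functions in C. Compared with brk()/sbrk(), their interface is simpler and they allow memory to be released in any order. (brk()/sbrk() cannot release memory in arbitrary order because lowering the program break also releases the "innocent" blocks at the top of the heap.) For example, in a situation like this one (where the mapping has been removed):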

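free() has no such pitfall, because freeing memory does not necessarily move the program break. If there is still unreleased memory above the block being freed (at higher addresses), the program break does not move, so no mapping is removed; in other words, the block is not returned to the kernel. Instead it is kept as free memory managed by the allocator and handed out again by a later malloc() call, provided it is large enough. How, then, does free() know the size of the block it is releasing? The memory returned by malloc() has a somewhat special structure: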

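The size of the block is recorded just in front of it. When the block is freed, its length and address are recorded in a list of free blocks. On the next malloc() call, the allocator checks whether the free list contains a suitable block and hands it to the program for a second (or Nth) use. Whether a block from the free list is actually used depends on the situation: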

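1. If a free block is larger than the malloc() request, part of it is carved off and returned, and the remainder is treated as a new free block for later malloc() calls.

2. If no suitable free block exists, malloc() moves the program break as usual and may request new memory from the kernel (the previous mapping may have left some slack, in which case no new mapping is needed).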

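Knowing these implementation basics, we can see that malloc() and free() are fairly dangerous functions. Memory obtained from them must be used with care, especially at the boundaries, or the consequences can be disastrous. Suppose a program writes just one byte past the end of an allocated block, and that byte happens to be part of the recorded length of the next block. When that next block is freed, the allocator records the wrong length, and when a later malloc() hands the block out again, "disaster" strikes.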

Other allocation functions

void *calloc(size_t nmemb, size_t size);

void *realloc(void *ptr, size_t size);

calloc() is similar to malloc(): it allocates space for nmemb objects of size bytes each. Unlike malloc(), calloc() initializes the allocated memory to 0.

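realloc(), as the name suggests, "re-allocates": it resizes the previously allocated block ptr. If there is not enough room after ptr, it allocates a new region and copies the existing contents over unchanged; the newly added memory is not initialized. The returned pointer may therefore differ from ptr, and in practice it often does. That copying can make realloc() expensive, so avoid it unless you really need it.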

void *alloca(size_t size);

It allocates memory on the stack. The manual page describes it as follows:

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.

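Situations that call for stack allocation are rare. One example is code that performs non-local jumps with setjmp() and longjmp() and needs temporary memory: alloca() is worth considering there, because the memory it allocates is released automatically, so a longjmp() "jumping back" cannot leak it. Using a function like this once in a while is good for your peace of mind.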

Original article: https://www.cnblogs.com/ittinybird/p/4657245.html