  • Computer architecture - memory optimization: vm + OOM

    http://www.cnblogs.com/dkblog/archive/2011/09/06/2168721.html
    https://www.kernel.org/doc/Documentation/vm/

    Location of the memory tuning parameters:
    [root@server1 vm]# pwd
    /proc/sys/vm

    [root@server1 vm]# ls
    block_dump                 extfrag_threshold           memory_failure_recovery  numa_zonelist_order       stat_interval
    compact_memory             extra_free_kbytes           min_free_kbytes          oom_dump_tasks            swappiness
    dirty_background_bytes     hugepages_treat_as_movable  min_slab_ratio           oom_kill_allocating_task  unmap_area_factor
    dirty_background_ratio     hugetlb_shm_group           min_unmapped_ratio       overcommit_memory         vfs_cache_pressure
    dirty_bytes                laptop_mode                 mmap_min_addr            overcommit_ratio          would_have_oomkilled
    dirty_expire_centisecs     legacy_va_layout            nr_hugepages             page-cluster              zone_reclaim_mode
    dirty_ratio                lowmem_reserve_ratio        nr_hugepages_mempolicy   panic_on_oom
    dirty_writeback_centisecs  max_map_count               nr_overcommit_hugepages  percpu_pagelist_fraction
    drop_caches                memory_failure_early_kill   nr_pdflush_threads       scan_unevictable_pages

    Per-process OOM settings:

    [root@server1 1]# pwd
    /proc/1
    [root@server1 1]# ls | grep oom
    oom_adj  oom_score  oom_score_adj
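    These files control how likely each process is to be picked by the OOM killer. A minimal sketch of using them (PID 1234 is hypothetical): oom_score_adj ranges from -1000 to 1000, higher values make the process a more likely victim, and -1000 exempts it entirely.

    [root@server1 1]# echo 500 > /proc/1234/oom_score_adj      # prefer killing this process under OOM
    [root@server1 1]# echo -1000 > /proc/1234/oom_score_adj    # exempt a critical daemon from the OOM killer
    [root@server1 1]# cat /proc/1234/oom_score                 # the badness score the kernel currently assigns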




    /proc/slabinfo
    /proc/buddyinfo
    /proc/zoneinfo
    /proc/meminfo


    [root@monitor /]# slabtop

     Active / Total Objects (% used)    : 347039 / 361203 (96.1%)
     Active / Total Slabs (% used)      : 24490 / 24490 (100.0%)
     Active / Total Caches (% used)     : 88 / 170 (51.8%)
     Active / Total Size (% used)       : 98059.38K / 99927.38K (98.1%)
     Minimum / Average / Maximum Object : 0.02K / 0.28K / 4096.00K

      OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
    115625 115344  99%    0.10K   3125       37     12500K buffer_head
     73880  73437  99%    0.19K   3694       20     14776K dentry
     42184  42180  99%    0.99K  10546        4     42184K ext4_inode_cache
     20827  20384  97%    0.06K    353       59      1412K size-64
     16709  13418  80%    0.05K    217       77       868K anon_vma_chain
     15792  15708  99%    0.03K    141      112       564K size-32
     11267  10323  91%    0.20K    593       19      2372K vm_area_struct
     10806  10689  98%    0.64K   1801        6      7204K proc_inode_cache
      9384   5232  55%    0.04K    102       92       408K anon_vma
      7155   7146  99%    0.07K    135       53       540K selinux_inode_security
      7070   7070 100%    0.55K   1010        7      4040K radix_tree_node
      6444   6443  99%    0.58K   1074        6      4296K inode_cache
      5778   5773  99%    0.14K    214       27       856K sysfs_dir_cache
      3816   3765  98%    0.07K     72       53       288K Acpi-Operand
      2208   2199  99%    0.04K     24       92        96K Acpi-Namespace
      1860   1830  98%    0.12K     62       30       248K size-128
      1440   1177  81%    0.19K     72       20       288K size-192
      1220    699  57%    0.19K     61       20       244K filp
       660    599  90%    1.00K    165        4       660K size-1024



    [root@monitor xx]# cat /proc/meminfo | grep HugePage
    AnonHugePages:      2048 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0

    1. vi /etc/sysctl.conf and add:
    vm.nr_hugepages = 10

    2.sysctl -p
    [root@monitor /]#  cat /proc/meminfo |grep Huge
    AnonHugePages:      2048 kB
    HugePages_Total:      10
    HugePages_Free:       10
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

    3. Make the huge pages available to applications
    [root@monitor /]# mkdir /hugepages
    [root@monitor /]# mount -t  hugetlbfs  none  /hugepages

    [root@monitor /]# dd if=/dev/zero of=/hugepages/a.out bs=1M count=5
    Hugetlb pages:

    Hugetlbfs support is built on top of the multiple-page-size support provided by most modern architectures. Applications can use huge pages in the Linux kernel either through the mmap system call (against a mounted hugetlbfs) or through the standard SysV shared memory system calls (shmget, shmat). Check availability with:
    cat /proc/meminfo | grep HugePage
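    A quick way to confirm that an application actually consumes the reserved huge pages (a sketch; "myapp" is a placeholder for any program that maps files under /hugepages or passes SHM_HUGETLB to shmget):

    [root@monitor /]# grep HugePages_Free /proc/meminfo                # before starting the application
    [root@monitor /]# ./myapp &                                        # hypothetical huge-page-backed workload
    [root@monitor /]# grep -E 'HugePages_(Free|Rsvd)' /proc/meminfo    # Free drops / Rsvd rises while it runs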
    Improving TLB performance:

    The kernel must usually flush TLB entries on a context switch. Use free, contiguous physical pages: automatically via the buddy allocator (/proc/buddyinfo), or manually via huge pages (which are not pageable).
    Linux supports large pages through the hugepages mechanism, sometimes known as bigpages, largepages or the hugetlbfs filesystem.
    Consequences: TLB cache hits become more likely, and the number of PTE visits is reduced.
    Tuning TLB performance

    Check the huge page size:
    x86info -a | grep "Data TLB"
    dmesg
    cat /proc/meminfo
    Enable huge pages:
    1. In /etc/sysctl.conf: vm.nr_hugepages = n
    2. As a kernel boot parameter (passed when the operating system boots): hugepages=n
    Configure hugetlbfs if the application needs it:
    the mmap system call requires that hugetlbfs is mounted
    mkdir /hugepages
    mount -t hugetlbfs none /hugepages
    The shmat and shmget system calls do not require hugetlbfs.
    Trace every system call made by a program:
    strace -o /tmp/strace.out -p PID
    grep mmap /tmp/strace.out

    Summarize system calls:
    strace -c -p PID
    strace -c COMMAND

    Other uses of strace: investigating lock contention, identifying problems caused by improper file permissions, pinpointing I/O problems.
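    As an illustration (PID 1234 is hypothetical), tracing only the memory-related calls is a cheap way to confirm whether a process really obtains huge-page mappings:

    strace -f -e trace=mmap,shmget -o /tmp/strace.out -p 1234
    grep -E 'MAP_HUGETLB|SHM_HUGETLB' /tmp/strace.out    # flags of huge-page-backed mappings show up here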
    Strategies for using memory (memory-usage optimization)

    1. Reduce overhead for tiny memory objects: the slab cache
    cat /proc/slabinfo
    echo 'ext4_inode_cache 108 54 8' > /proc/slabinfo
    2. Reduce or defer service time for slower subsystems:
    Filesystem metadata: buffer cache, slab cache      //caches file metadata
    Disk I/O: page cache                               //caches file data
    Interprocess communication: shared memory
    Network I/O: buffer cache, ARP cache, connection tracking
    3. Considerations when tuning memory: how should pages be reclaimed to avoid memory pressure? Larger writes are usually more efficient because they can be re-sorted.


    Memory parameter settings:
    vm.min_free_kbytes:
    1. If free memory is completely exhausted, the system can crash.
    2. The kernel therefore keeps a reserve of free memory; when a process requests an allocation and free memory runs short, other pages are swapped out to SWAP to free enough space for the request.
    Typical use case: tuning vm.min_free_kbytes is usually only necessary when an application regularly needs to allocate a large block of memory and then frees that same memory.
    It may well be the case that the system simply has too little disk bandwidth, too little CPU power, or too little memory to handle its load.

    Linux provides the min_free_kbytes parameter to set the threshold at which the system starts reclaiming memory, i.e. to control how much free memory is kept. The higher the value, the earlier the kernel starts reclaiming and the more free memory remains.
    http://www.cnblogs.com/itfriend/archive/2011/12/14/2287160.html
    Consequences: reduces service time for demand paging; the reserved memory is not available for other usage; can cause pressure on ZONE_NORMAL.
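    A minimal sketch of inspecting and raising this reserve (65536 KB is an arbitrary example, not a recommendation):

    cat /proc/sys/vm/min_free_kbytes                        # current reserve in KB
    sysctl -w vm.min_free_kbytes=65536                      # raise it at runtime
    echo 'vm.min_free_kbytes = 65536' >> /etc/sysctl.conf   # persist across reboots
    sysctl -p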
    A Linux server's memory usage exceeded its threshold and triggered an alert.
    
    Troubleshooting
    
    First, look at the system's memory usage with the free command:
    
    total       used       free     shared    buffers     cached 
    Mem:      24675796   24587144      88652          0     357012    1612488 
    -/+ buffers/cache:   22617644    2058152 
    Swap:      2096472     108224    1988248 
    Total memory is 24675796 KB; excluding buffers/cache, 22617644 KB is used and only 2058152 KB remains free.
    
    Next, top (sorted by memory with Shift+M) shows that the largest process uses only 18 GB; all other processes are tiny and can be ignored.
    
    So where did the remaining memory, nearly 4 GB (22617644 KB - 18 GB ≈ 4 GB), go?
    
    Going one step further, cat /proc/meminfo shows nearly 4 GB (3688732 kB) of Slab memory:
    
    ...... 
    Mapped:          25212 kB 
    Slab:          3688732 kB 
    PageTables:      43524 kB 
    ...... 
    Slab memory holds caches of kernel data structures; slabtop shows how this memory is being used:
    
    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME 
    13926348 13926348 100%    0.21K 773686       18   3494744K dentry_cache 
    334040 262056  78%    0.09K   8351       40     33404K buffer_head 
    151040 150537  99%    0.74K  30208        5    120832K ext3_inode_cache 
    Most of it (about 3.5 GB) is used by dentry_cache.
    
    Resolution
    drop_caches
    
    
    
    To free pagecache:
        echo 1 > /proc/sys/vm/drop_caches      [covers both the buffer cache and the page cache]
    To free reclaimable slab objects (includes dentries and inodes):
        echo 2 > /proc/sys/vm/drop_caches      [note that dentries and inodes live in the slab, not in the buffer cache or page cache]
    To free slab objects and pagecache:        [free everything]
        echo 3 > /proc/sys/vm/drop_caches
    
    http://www.kernel.org/doc/Documentation/sysctl/vm.txt
    
    

    Note: run sync to flush dirty data to disk before dropping caches.

    2. Method 1 requires root privileges. If you are not root but have sudo rights, use the sysctl command instead:
    $ sync
    $ sudo sysctl -w vm.drop_caches=3

    $ sudo sysctl -w vm.drop_caches=0      # reset drop_caches back to 0 after the operation

    Check whether the setting took effect with:
    $ sudo sysctl -a | grep drop_caches
    Overcommitting physical memory relies on swap:    //avoid on database servers, since being pushed out to SWAP is slow
    vm.overcommit_memory
        0 = heuristic overcommit      //the kernel decides how much overcommit to allow
        1 = always overcommit         //allocation requests are never refused; swap absorbs the excess
        2 = never overcommit beyond swap plus a percentage of RAM (the percentage may be > 100)
    commit limit = SWAP + RAM * overcommit_ratio/100
    vm.overcommit_ratio: //the percentage of physical RAM counted toward the commit limit when vm.overcommit_memory is set to 2; usually kept at no more than 50 (the default)
    View Committed_AS in /proc/meminfo: an estimate of how much RAM is required to avoid an out-of-memory (OOM) condition for the current workload on the system.
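    A minimal sketch of switching to strict accounting (the values are illustrative, not recommendations):

    sysctl -w vm.overcommit_memory=2                     # never overcommit
    sysctl -w vm.overcommit_ratio=80                     # commit limit = swap + 80% of RAM
    grep -E 'CommitLimit|Committed_AS' /proc/meminfo     # verify the limit and the current commitment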


    Slab cache

    Tiny kernel objects are stored in the slab; the extra tracking overhead is better than spending one page per object. Example: filesystem metadata (the dentry and inode caches).
    Monitoring:
    cat /proc/slabinfo
    slabtop
    vmstat -m
    Tuning a particular slab cache:
    echo "cache_name limit batchcount shared" > /proc/slabinfo
    limit       the maximum number of objects that will be cached for each CPU
    batchcount  the maximum number of global cache objects that will be transferred to the per-CPU cache when it becomes empty
    shared      the sharing behavior for Symmetric MultiProcessing (SMP) systems
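    A sketch of tuning one cache through this interface (the numbers are purely illustrative; the cache is named "dentry" or "dentry_cache" depending on the kernel, and the running kernel's slab allocator must support these tunables):

    grep dentry /proc/slabinfo                  # current statistics and tunables for the dentry cache
    echo 'dentry 256 32 8' > /proc/slabinfo     # limit 256, batchcount 32, shared factor 8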
    arp cache

    ARP entries map hardware addresses to protocol addresses.
    1. They are cached in /proc/net/arp. By default the cache is limited to 512 entries as a soft limit and 1024 entries as a hard limit; above 512 entries the cache is automatically trimmed.
    2. Garbage collection removes stale or older entries.

    [root@server1 proc]# cat /proc/net/arp
    IP address       HW type     Flags       HW address            Mask     Device
    112.74.75.247    0x1         0x2         70:f9:6d:ee:67:af     *        eth1
    10.24.223.247    0x1         0x2         70:f9:6d:ee:67:af     *        eth0

    An ARP cache that is too small leads to intermittent timeouts between hosts and ARP thrashing; too large an ARP cache puts pressure on ZONE_NORMAL.
    List entries:      ip neighbor list             //show cached entries
    Flush the cache:   ip neighbor flush dev ethX   //clear cached entries
    Threshold below which the garbage collector leaves the ARP table alone:
      net.ipv4.neigh.default.gc_thresh1
      default 128      //with fewer than 128 entries nothing is removed by the GC, whether or not the entries have expired

    Soft upper limit:  //above this, entries older than 5 seconds can be removed
      net.ipv4.neigh.default.gc_thresh2
      default 512; becomes a hard limit after 5 seconds
    Hard upper limit:
      net.ipv4.neigh.default.gc_thresh3
    Garbage collection interval in seconds:   //how often the GC runs; above gc_thresh1 it removes expired entries (entries expire after about 5 minutes)
      net.ipv4.neigh.default.gc_interval
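    A minimal sketch of raising the thresholds on a host that legitimately talks to many neighbors (the numbers are illustrative only):

    sysctl -w net.ipv4.neigh.default.gc_thresh1=512
    sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
    sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
    ip neighbor list      # inspect the current neighbor table afterwards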


    vm.lowmem_reserve_ratio

    For some specialised workloads on highmem machines it is dangerous for the kernel to allow process memory to be allocated from the "lowmem" zone.

    The Linux page allocator has a mechanism which prevents allocations that could use highmem from using too much lowmem.

    The 'lowmem_reserve_ratio' tunable determines how aggressively the kernel defends these lower zones.

    If you have a machine that uses highmem or ISA DMA and your applications are using mlock(), or if you are running with no swap, then you probably should change the lowmem_reserve_ratio setting.
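    A quick sketch of inspecting and adjusting it (one value per lower zone; a smaller ratio means a larger reserve, and "256 256 16" is only an example):

    cat /proc/sys/vm/lowmem_reserve_ratio
    echo "256 256 16" > /proc/sys/vm/lowmem_reserve_ratio    # reserve more lowmem against highmem-capable allocations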
    page cache:

    A large percentage of paging activity is due to I/O. On file reads, each page of the file is read from disk into memory; these pages form the page cache. The page cache is always checked for I/O requests:
    directory reads, reading and writing regular files, reading and writing via block device files (disk I/O), accessing memory-mapped files (mmap), accessing swapped-out pages.
    Pages in the page cache are associated with file data.
    Tuning the page cache:

    View page cache allocation in /proc/meminfo.
    Tune the amount of memory kept: vm.lowmem_reserve_ratio, vm.vfs_cache_pressure.
    Tune the arrival/completion rate: vm.page-cluster, vm.zone_reclaim_mode.
    vfs_cache_pressure
    Controls the tendency of the kernel to reclaim the memory used for caching directory and inode objects.
    1. At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim.
    2. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches.
    3. When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure, which can easily lead to out-of-memory conditions.
    4. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

    0: never reclaim dentries and inodes;
    1-99: prefer to retain dentries and inodes;
    100: reclaim dentries and inodes at a fair rate relative to page cache and swap cache reclaim;
    100+: prefer to reclaim dentries and inodes.
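    A minimal sketch of favoring dentry/inode retention on a metadata-heavy server (the value 50 is only an example):

    sysctl -w vm.vfs_cache_pressure=50
    echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf    # persist across reboots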
    page-cluster
    1. page-cluster controls the number of pages written to swap in a single attempt.
    2. It is a logarithmic value: setting it to 0 means "1 page", setting it to 1 means "2 pages", setting it to 2 means "4 pages", and so on.
    3. The default value is three (eight pages at a time).
    4. There may be a small benefit in tuning this to a different value if your workload is swap-intensive.

    In other words, 2^n pages are moved to SWAP per attempt; this mainly matters when the system makes heavy use of swap (virtualization and cloud-computing environments).
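    A quick sketch of the arithmetic and of changing it (4 is just an example value):

    cat /proc/sys/vm/page-cluster      # default 3 -> 2^3 = 8 pages per swap-out attempt
    sysctl -w vm.page-cluster=4        # 2^4 = 16 pages per attempt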
    zone_reclaim_mode:
    zone_reclaim_mode allows a more or less aggressive approach to reclaiming memory when a zone runs out of memory. If it is set to zero, no zone reclaim occurs and allocations are satisfied from other zones/nodes in the system.
    The value is a bitmask ORed together from:
      1 = zone reclaim on                        //enable reclaim within the local zone
      2 = zone reclaim writes dirty pages out    //write out dirty pages during reclaim
      4 = zone reclaim swaps pages
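    A minimal sketch of enabling local reclaim on a NUMA machine (the bitmask 1+2=3 is shown purely as an illustration):

    sysctl -w vm.zone_reclaim_mode=3    # reclaim locally and allow writing dirty pages out
    sysctl -w vm.zone_reclaim_mode=0    # back to satisfying allocations from other zones/nodes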
    Anonymous pages:
    Anonymous pages can be another large consumer of memory. They are not associated with a file, but instead contain:
    program data (arrays, heap allocations, etc.)      //file-backed data lives in the page cache instead
    anonymous memory regions
    dirty memory-mapped process-private pages
    IPC shared memory region pages
    View summary usage:
    grep Anon /proc/meminfo
    cat /proc/PID/statm        (anonymous pages = RSS - Shared)
    Anonymous pages are eligible for swap; since they have no backing file, swap is the only place they can be reclaimed to.
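    A small sketch of the RSS - Shared calculation for a single process (PID 1 is used only as an example; /proc/PID/statm reports counts in pages):

    # statm fields: size resident shared text lib data dt
    awk '{ print "anonymous pages:", $2 - $3 }' /proc/1/statm
    grep -i anon /proc/meminfo        # system-wide anonymous memory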
  • Original article: https://www.cnblogs.com/zengkefu/p/5576159.html