  • Analyzing PostgreSQL OOM, Part 2: cgroup

    OS: CentOS 7.4
    PostgreSQL: 10.4

    This post continues the previous one: Analyzing PostgreSQL OOM, Part 1.

    This post uses cgroups to limit memory at the OS level. First, a brief introduction to cgroups:

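    CGroups is a unified framework for managing and controlling the resources of processes. It provides only the mechanism; the concrete policy is implemented by subsystems.

    Separating mechanism from policy is a classic design idea in Linux: the mechanism answers "which capability do I provide", while the policy answers "how is that capability actually carried out".

    A subsystem is the concrete behavior through which CGroups controls the resources of a group of processes.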

    Installation

    # yum install libcgroup libcgroup-devel libcgroup-tools
    
    # systemctl status cgconfig.service 
    # cat /usr/lib/systemd/system/cgconfig.service
    
    [Unit]
    Description=Control Group configuration service
    
    # The service should be able to start as soon as possible,
    # before any 'normal' services:
    DefaultDependencies=no
    Conflicts=shutdown.target
    Before=basic.target shutdown.target
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    Delegate=yes
    ExecStart=/usr/sbin/cgconfigparser -l /etc/cgconfig.conf -L /etc/cgconfig.d -s 1664
    ExecStop=/usr/sbin/cgclear -l /etc/cgconfig.conf -L /etc/cgconfig.d -e
    
    [Install]
    WantedBy=sysinit.target
    
    # systemctl start cgconfig.service
    
    # ls -l /etc |grep -i cg
    -rw-r--r--   1 root    root       676 Apr 11 10:33 cgconfig.conf
    drwxr-xr-x   2 root    root         6 Apr 11 10:33 cgconfig.d
    -rw-r--r--   1 root    root       234 Apr 11 10:33 cgrules.conf
    -rw-r--r--   1 root    root       131 Apr 11 10:33 cgsnapshot_blacklist.conf
    
    # df -hT
    Filesystem              Type      Size  Used Avail Use% Mounted on
    /dev/mapper/centos-root xfs        47G   23G   25G  48% /
    devtmpfs                devtmpfs  2.0G     0  2.0G   0% /dev
    tmpfs                   tmpfs     2.0G  8.0K  2.0G   1% /dev/shm
    tmpfs                   tmpfs     2.0G  8.9M  2.0G   1% /run
    tmpfs                   tmpfs     2.0G     0  2.0G   0% /sys/fs/cgroup
    /dev/sda1               xfs      1014M  160M  855M  16% /boot
    tmpfs                   tmpfs     396M  4.0K  396M   1% /run/user/42
    tmpfs                   tmpfs     396M   28K  396M   1% /run/user/0

    Note the cgroup mount point:
    tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup

    Configuration

    Resources that cgroups can control

    # cat /proc/cgroups 
    
    #subsys_name    hierarchy   num_cgroups enabled
    cpuset  4   1   1
    cpu 3   84  1
    cpuacct 3   84  1
    memory  6   85  1
    devices 7   84  1
    freezer 2   1   1
    net_cls 10  1   1
    blkio   11  84  1
    perf_event  5   1   1
    hugetlb 8   1   1
    pids    9   1   1
    net_prio    10  1   1
    
    # lssubsys -am
    cpuset /sys/fs/cgroup/cpuset
    cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
    memory /sys/fs/cgroup/memory
    devices /sys/fs/cgroup/devices
    freezer /sys/fs/cgroup/freezer
    net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
    blkio /sys/fs/cgroup/blkio
    perf_event /sys/fs/cgroup/perf_event
    hugetlb /sys/fs/cgroup/hugetlb
    pids /sys/fs/cgroup/pids
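
As a small aid for reading the `/proc/cgroups` table above, here is a minimal sketch that prints only the controllers whose `enabled` column is 1. The function name `enabled_controllers` is made up for this example, and the sample input is an abbreviated copy of the output above.

```shell
# List the names of enabled cgroup controllers.
# Expects /proc/cgroups-formatted text on stdin:
#   subsys_name  hierarchy  num_cgroups  enabled
enabled_controllers() {
    # Skip the header line (starts with '#') and keep rows with enabled == 1.
    awk '!/^#/ && $4 == 1 { print $1 }'
}

# Abbreviated sample taken from the output above.
sample='#subsys_name    hierarchy   num_cgroups enabled
cpuset  4   1   1
memory  6   85  1
pids    9   1   1'

printf '%s\n' "$sample" | enabled_controllers
```

On a live system the same function can be fed directly: `enabled_controllers < /proc/cgroups`.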

    This setup limits memory

    # cd /etc/cgconfig.d
    # vi postgresql.conf
    
    group postgresql {
        perm {
            task {
                uid=postgres;
                gid=postgres;
            }
            admin {
                uid=root;
                gid=root;
            }
        }
        memory {
            memory.limit_in_bytes=2000M;
        }
    }
    
    # vi /etc/cgrules.conf
    
    postgres      memory           postgresql/
    
    # systemctl stop cgconfig.service 
    # systemctl start cgconfig.service 
    # free -m
                  total        used        free      shared  buff/cache   available
    Mem:           3951         656        2454           9         841        3203
    Swap:          2047           0        2047
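
The `memory.limit_in_bytes=2000M` setting above uses a size suffix, which the kernel reads as a power of 1024. The sketch below works out what that limit is in bytes and in kB; the helper name `to_bytes` is invented for this example and only handles the K, M and G suffixes.

```shell
# Convert a size with an optional K/M/G suffix (powers of 1024) to bytes.
to_bytes() {
    num=${1%[KkMmGg]}          # strip a trailing suffix, if any
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

limit_bytes=$(to_bytes 2000M)
echo "limit in bytes: $limit_bytes"              # 2097152000
echo "limit in kB:    $((limit_bytes / 1024))"   # 2048000
```

So 2000M is 2048000 kB, a little less than 2 GiB (2097152 kB). Once the service is running, the effective value can be read back from `memory.limit_in_bytes` in the group's directory under the memory hierarchy.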
    

    Start PostgreSQL

    # su - postgres
    $ /usr/pgsql-10/bin/pg_ctl start -D /var/lib/pgsql/10/data
    
    # free -m
                  total        used        free      shared  buff/cache   available
    Mem:           3951         658        2433          24         860        3185
    Swap:          2047           0        2047
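
Before loading data it is worth confirming that the server processes really landed in the `postgresql` group. A minimal sketch, assuming cgroup v1 where each line of `/proc/<pid>/cgroup` has the form `hierarchy-id:controller-list:path`; the function name `memory_cgroup_of` and the sample lines are illustrative and were not captured from the system in this post.

```shell
# Print the memory-controller cgroup path from /proc/<pid>/cgroup-formatted
# text on stdin (cgroup v1: "hierarchy-id:controller-list:path").
memory_cgroup_of() {
    awk -F: '{
        n = split($2, c, ",")            # controllers may be co-mounted, e.g. cpu,cpuacct
        for (i = 1; i <= n; i++)
            if (c[i] == "memory") print $3
    }'
}

# Illustrative sample, not real output from this system.
sample='11:blkio:/
6:memory:/postgresql
3:cpu,cpuacct:/'

printf '%s\n' "$sample" | memory_cgroup_of   # /postgresql
```

On the live system the postmaster PID is the first line of `postmaster.pid` in the data directory, so something like `memory_cgroup_of < /proc/$(head -1 /var/lib/pgsql/10/data/postmaster.pid)/cgroup` should print `/postgresql`. If it prints `/`, the rule in `/etc/cgrules.conf` was not applied; as far as I know those rules are applied by the `cgrulesengd` daemon (the `cgred` service on CentOS 7), so that service would need to be running.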
    

    Insert a large amount of data

    postgres=# insert into test01 values(generate_series(1,5000000), repeat(chr(floor(random()*26)::int + 65), 1000));
    
    
    

    At the OS level, the memory usage of this process stays at around 50% the whole time

    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    postgres 18612  4.0 49.1 2292316 935160 ?      Ds   06:10   0:14 postgres: peiyb: postgres postgres [local] INSERT
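
`ps` shows a single process; the cgroup limit applies to the whole group. The sketch below reads the group's own counters instead. It assumes the cgroup v1 memory controller files `memory.usage_in_bytes`, `memory.limit_in_bytes` and `memory.failcnt`; the function name `cgroup_mem_report` is invented for this example.

```shell
# Print usage, limit and failcnt for a cgroup v1 memory group directory.
cgroup_mem_report() {
    dir=$1
    usage=$(cat "$dir/memory.usage_in_bytes") || return 1
    limit=$(cat "$dir/memory.limit_in_bytes") || return 1
    failcnt=$(cat "$dir/memory.failcnt")      || return 1
    echo "usage=$((usage / 1024))kB limit=$((limit / 1024))kB pct=$((usage * 100 / limit))% failcnt=$failcnt"
}

# Usage on the system described here (path follows from the lssubsys output
# and the group name "postgresql"):
#   cgroup_mem_report /sys/fs/cgroup/memory/postgresql
```

Running it in a loop during the insert (for example once a second) shows usage climbing toward the limit and `failcnt` increasing each time the group hits it.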
    
    

    It did not last long before the process was killed. Check the OS log:

    # dmesg
    [26191.627741] Task in /postgresql killed as a result of limit of /postgresql
    [26191.627743] memory: usage 2048000kB, limit 2048000kB, failcnt 2126672
    [26191.627744] memory+swap: usage 4136808kB, limit 9007199254740988kB, failcnt 0
    [26191.627745] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
    [26191.627746] Memory cgroup stats for /postgresql: cache:258152KB rss:1789848KB rss_huge:0KB mapped_file:247288KB swap:2088808KB inactive_anon:561100KB active_anon:1486740KB inactive_file:0KB active_file:0KB unevictable:0KB
    [26191.627753] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    [26191.627806] [18551]  1002 18551   113848     3607      50      217             0 postgres
    [26191.627808] [18559]  1002 18559    42656      141      32      243             0 postgres
    [26191.627811] [24714]  1002 24714    29155      463      14      346             0 bash
    [26191.627812] [24956]  1002 24956   113951    44739     169      220             0 postgres
    [26191.627814] [24957]  1002 24957   113887    36498     165      211             0 postgres
    [26191.627815] [24958]  1002 24958   113848     2228      38      218             0 postgres
    [26191.627817] [24959]  1002 24959   113983      395      39      242             0 postgres
    [26191.627818] [24960]  1002 24960    43221      199      33      202             0 postgres
    [26191.627820] [24961]  1002 24961   113955      216      36      288             0 postgres
    [26191.627821] [24973]  1002 24973    39182      692      31       40             0 psql
    [26191.627823] [24974]  1002 24974   532267   272088     989   193541             0 postgres
    [26191.627824] [24997]  1002 24997    39182      543      31      198             0 psql
    [26191.627826] [24998]  1002 24998   661215   275338    1240   324846             0 postgres
    [26191.627827] Memory cgroup out of memory: Kill process 24998 (postgres) score 580 or sacrifice child
    [26191.627830] Killed process 24998 (postgres) total-vm:2644860kB, anon-rss:885472kB, file-rss:2572kB, shmem-rss:213308kB
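
The numbers in this log can be cross-checked with a little arithmetic. All figures below are copied from the dmesg output above; the only assumption is a 4 kB page size (the usual x86_64 default, which `getconf PAGESIZE` can confirm).

```shell
# Assumed page size in kB (x86_64 default; verify with `getconf PAGESIZE`).
page_kb=4

# Group memory usage equals page cache plus RSS in this log.
echo "cache + rss       = $((258152 + 1789848)) kB"    # usage:       2048000 kB
# memory+swap usage equals memory usage plus swapped-out pages.
echo "usage + swap      = $((2048000 + 2088808)) kB"   # memory+swap: 4136808 kB

# Killed process 24998: the rss column of the process table is in pages.
echo "rss pages * 4 kB  = $((275338 * page_kb)) kB"            # 1101352 kB
echo "anon + file + shm = $((885472 + 2572 + 213308)) kB"      # 1101352 kB
```

Two things follow from this. The group was sitting exactly at its 2048000 kB limit, and because only `memory.limit_in_bytes` was set (the memory+swap limit is effectively unlimited), it had also pushed about 2 GB out to swap, which is nearly all of the 2047 MB of swap shown by `free -m` earlier.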
    
    

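    The log shows "Memory cgroup out of memory".

    A cgroup can be used to kill a process, or a class of processes, once its resource usage reaches a configured limit; at that moment the OS may still have plenty of free memory.
    When the OS-level OOM killer fires, by contrast, the OS has almost no available memory left.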
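
To tell the two cases apart in a kernel log, one rough approach is to match on the message prefix. This is a sketch only: the function name `classify_oom` is made up, and the system-wide pattern assumes the "Out of memory: Kill process" wording used by kernels of this generation (newer kernels phrase it differently).

```shell
# Classify kernel log lines on stdin as cgroup-limit kills or system-wide kills.
classify_oom() {
    awk '
        /Memory cgroup out of memory/ { print "cgroup-oom"; next }
        /Out of memory: Kill process/ { print "system-oom" }
    '
}

# The line from the dmesg output above:
printf '%s\n' \
    '[26191.627827] Memory cgroup out of memory: Kill process 24998 (postgres) score 580 or sacrifice child' \
    | classify_oom   # cgroup-oom
```

On a live system this can be run as `dmesg | classify_oom`.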

    References:
    https://www.cnblogs.com/easton-wang/p/7656205.html
    https://www.cnblogs.com/doscho/p/6041036.html
    http://www.oracle.com/technetwork/articles/servers-storage-admin/resource-controllers-linux-1506602.html

  • Original article: https://www.cnblogs.com/ctypyb2002/p/9792897.html