  • Analyzing PostgreSQL OOM, part 2: cgroup

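    OS: CentOS 7.4
    PostgreSQL: 10.4

    This continues the previous post: Analyzing PostgreSQL OOM, part 1

    This post uses cgroup to limit memory at the OS level. First, a brief introduction to cgroups:

    CGroups is a unified framework for managing and controlling the resources of processes. It provides only the mechanism; the concrete policy is carried out by subsystems.

    Separating mechanism from policy is a classic design idea in Linux: the mechanism answers "what capabilities do I provide", while the policy answers "how are those capabilities used".

    A subsystem is the concrete means by which CGroups applies resource control to a group of processes.

    Installation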

    # yum install libcgroup libcgroup-devel libcgroup-tools
    
    # systemctl status cgconfig.service 
    # cat /usr/lib/systemd/system/cgconfig.service
    
    [Unit]
    Description=Control Group configuration service
    
    # The service should be able to start as soon as possible,
    # before any 'normal' services:
    DefaultDependencies=no
    Conflicts=shutdown.target
    Before=basic.target shutdown.target
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    Delegate=yes
    ExecStart=/usr/sbin/cgconfigparser -l /etc/cgconfig.conf -L /etc/cgconfig.d -s 1664
    ExecStop=/usr/sbin/cgclear -l /etc/cgconfig.conf -L /etc/cgconfig.d -e
    
    [Install]
    WantedBy=sysinit.target
    # systemctl start cgconfig.service 
    
    # ls -l /etc |grep -i cg
    -rw-r--r--   1 root    root       676 Apr 11 10:33 cgconfig.conf
    drwxr-xr-x   2 root    root         6 Apr 11 10:33 cgconfig.d
    -rw-r--r--   1 root    root       234 Apr 11 10:33 cgrules.conf
    -rw-r--r--   1 root    root       131 Apr 11 10:33 cgsnapshot_blacklist.conf
    
    # df -hT
    Filesystem              Type      Size  Used Avail Use% Mounted on
    /dev/mapper/centos-root xfs        47G   23G   25G  48% /
    devtmpfs                devtmpfs  2.0G     0  2.0G   0% /dev
    tmpfs                   tmpfs     2.0G  8.0K  2.0G   1% /dev/shm
    tmpfs                   tmpfs     2.0G  8.9M  2.0G   1% /run
    tmpfs                   tmpfs     2.0G     0  2.0G   0% /sys/fs/cgroup
    /dev/sda1               xfs      1014M  160M  855M  16% /boot
    tmpfs                   tmpfs     396M  4.0K  396M   1% /run/user/42
    tmpfs                   tmpfs     396M   28K  396M   1% /run/user/0

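    Note the tmpfs mounted at /sys/fs/cgroup; the controller hierarchies are mounted beneath it:
    tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup

    Configuration

    Resources that cgroups can control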

    # cat /proc/cgroups 
    
    #subsys_name    hierarchy   num_cgroups enabled
    cpuset  4   1   1
    cpu 3   84  1
    cpuacct 3   84  1
    memory  6   85  1
    devices 7   84  1
    freezer 2   1   1
    net_cls 10  1   1
    blkio   11  84  1
    perf_event  5   1   1
    hugetlb 8   1   1
    pids    9   1   1
    net_prio    10  1   1
    
    # lssubsys -am
    cpuset /sys/fs/cgroup/cpuset
    cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
    memory /sys/fs/cgroup/memory
    devices /sys/fs/cgroup/devices
    freezer /sys/fs/cgroup/freezer
    net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
    blkio /sys/fs/cgroup/blkio
    perf_event /sys/fs/cgroup/perf_event
    hugetlb /sys/fs/cgroup/hugetlb
    pids /sys/fs/cgroup/pids
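
The /proc/cgroups listing above can also be filtered from a script. This is a minimal sketch, assuming the four-column layout shown above (name, hierarchy, num_cgroups, enabled); `list_enabled_controllers` is a hypothetical helper name, not part of libcgroup.

```shell
# list_enabled_controllers [FILE]
# Print the name of every controller whose "enabled" column is 1.
# FILE defaults to /proc/cgroups; a path can be passed for testing.
list_enabled_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```

Running `list_enabled_controllers | grep -x memory` is one way to confirm that the memory controller is available before writing any configuration for it.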

    This time we configure a memory limit

    # cd /etc/cgconfig.d
    # vi postgresql.conf
    
    group postgresql {
        perm {
            task {
                uid=postgres;
                gid=postgres;
            }
            admin {
                uid=root;
                gid=root;
            }
        }
        memory {
            memory.limit_in_bytes=2000M;
        }
    }
    
    # vi /etc/cgrules.conf
    
    postgres      memory           postgresql/
    # systemctl stop cgconfig.service 
    # systemctl start cgconfig.service 
    # free -m
                  total        used        free      shared  buff/cache   available
    Mem:           3951         656        2454           9         841        3203
    Swap:          2047           0        2047
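
After restarting cgconfig, it is worth checking that the group really exists with the expected limit. The sketch below assumes the memory controller is mounted at /sys/fs/cgroup/memory (as in the lssubsys output), so the group directory would be /sys/fs/cgroup/memory/postgresql. `show_memcg_limits` is a hypothetical helper name.

```shell
# show_memcg_limits [DIR]
# Print the limit, current usage and failure counter of a cgroup v1
# memory group. DIR defaults to the group configured in this post.
show_memcg_limits() {
    dir="${1:-/sys/fs/cgroup/memory/postgresql}"
    for f in memory.limit_in_bytes memory.usage_in_bytes memory.failcnt; do
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s=<not readable>\n' "$f"
        fi
    done
}
```

With the 2000M setting above, memory.limit_in_bytes should read 2097152000 (2000 × 1024 × 1024).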
    

    Start PostgreSQL

    # su - postgres
    $ /usr/pgsql-10/bin/pg_ctl start -D /var/lib/pgsql/10/data
    
    # free -m
                  total        used        free      shared  buff/cache   available
    Mem:           3951         658        2433          24         860        3185
    Swap:          2047           0        2047
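
Before loading data, one can confirm that the postmaster actually landed in the postgresql group. The sketch below parses the cgroup v1 format of /proc/<pid>/cgroup (lines such as `6:memory:/postgresql`); `memcg_of` is a hypothetical helper name. Automatic placement based on /etc/cgrules.conf is normally done by the cgrulesengd daemon, which this post does not show being started, so treat that part as an assumption about the setup.

```shell
# memcg_of FILE
# Print the memory-controller path from a /proc/<pid>/cgroup style file.
memcg_of() {
    awk -F: '$2 ~ /(^|,)memory(,|$)/ { print $3 }' "$1"
}
```

For example, `memcg_of /proc/$(head -n 1 /var/lib/pgsql/10/data/postmaster.pid)/cgroup` should print /postgresql if the rule took effect; a plain / would mean the process is still in the root group and the limit does not apply to it.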
    

    Insert a large amount of data

    postgres=# insert into test01 values(generate_series(1,5000000),repeat( chr(int4(random()*26)+65),1000));
    
    
    

    At the OS level, the process's memory usage (%MEM) stays at around 50%

    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    postgres 18612  4.0 49.1 2292316 935160 ?      Ds   06:10   0:14 postgres: peiyb: postgres postgres [local] INSERT
    
    

    It did not last long: this process was killed as well. Check the OS log

    # dmesg
    [26191.627741] Task in /postgresql killed as a result of limit of /postgresql
    [26191.627743] memory: usage 2048000kB, limit 2048000kB, failcnt 2126672
    [26191.627744] memory+swap: usage 4136808kB, limit 9007199254740988kB, failcnt 0
    [26191.627745] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
    [26191.627746] Memory cgroup stats for /postgresql: cache:258152KB rss:1789848KB rss_huge:0KB mapped_file:247288KB swap:2088808KB inactive_anon:561100KB active_anon:1486740KB inactive_file:0KB active_file:0KB unevictable:0KB
    [26191.627753] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    [26191.627806] [18551]  1002 18551   113848     3607      50      217             0 postgres
    [26191.627808] [18559]  1002 18559    42656      141      32      243             0 postgres
    [26191.627811] [24714]  1002 24714    29155      463      14      346             0 bash
    [26191.627812] [24956]  1002 24956   113951    44739     169      220             0 postgres
    [26191.627814] [24957]  1002 24957   113887    36498     165      211             0 postgres
    [26191.627815] [24958]  1002 24958   113848     2228      38      218             0 postgres
    [26191.627817] [24959]  1002 24959   113983      395      39      242             0 postgres
    [26191.627818] [24960]  1002 24960    43221      199      33      202             0 postgres
    [26191.627820] [24961]  1002 24961   113955      216      36      288             0 postgres
    [26191.627821] [24973]  1002 24973    39182      692      31       40             0 psql
    [26191.627823] [24974]  1002 24974   532267   272088     989   193541             0 postgres
    [26191.627824] [24997]  1002 24997    39182      543      31      198             0 psql
    [26191.627826] [24998]  1002 24998   661215   275338    1240   324846             0 postgres
    [26191.627827] Memory cgroup out of memory: Kill process 24998 (postgres) score 580 or sacrifice child
    [26191.627830] Killed process 24998 (postgres) total-vm:2644860kB, anon-rss:885472kB, file-rss:2572kB, shmem-rss:213308kB
    
    

    The log shows "Memory cgroup out of memory".
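
The figures in the log can be cross-checked against the configuration. This is a small sketch, assuming 4 kB pages (the log does not state the page size) and assuming that the total_vm and rss columns of the process table are counted in pages. `size_to_kb` and `pages_to_kb` are hypothetical helper names.

```shell
PAGE_KB=4   # assumption: 4 kB pages

# size_to_kb SIZE: convert a size such as 2000M (or plain bytes) to kB.
size_to_kb() {
    n=${1%[KkMmGg]}
    case $1 in
        *[Kk]) echo "$n" ;;
        *[Mm]) echo $((n * 1024)) ;;
        *[Gg]) echo $((n * 1024 * 1024)) ;;
        *)     echo $((n / 1024)) ;;   # no suffix: treat as bytes
    esac
}

# pages_to_kb PAGES: convert a page count from the process table to kB.
pages_to_kb() { echo $(($1 * PAGE_KB)); }

size_to_kb 2000M                   # 2048000: "limit 2048000kB"
pages_to_kb 661215                 # 2644860: total-vm of pid 24998
pages_to_kb 275338                 # 1101352: rss of pid 24998
echo $((885472 + 2572 + 213308))   # 1101352: anon-rss + file-rss + shmem-rss
echo $((258152 + 1789848))         # 2048000: cache + rss = memory usage
echo $((2048000 + 2088808))        # 4136808: usage + swap = memory+swap usage
```

The group was sitting exactly at its 2000M limit, and because the memory+swap limit was left at its default (effectively unlimited), about 2 GB of the group's pages had already gone to swap by the time of the kill.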

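    A cgroup can be used to define how much of a resource a single process, or a class of processes, may use before it is killed; at that moment the OS as a whole may still have plenty of free memory.
    When the OS OOM killer fires, on the other hand, it means the OS itself has almost no available memory left.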
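
The two cases can be told apart in the kernel log by the message prefix. The sketch below assumes the message wording of the kernel used in this post: "Memory cgroup out of memory: Kill process ..." for a cgroup limit, and "Out of memory: Kill process ..." for the system-wide OOM killer. The second form is not shown in this post and other kernel versions word these lines differently. `classify_oom_kills` is a hypothetical helper name.

```shell
# classify_oom_kills
# Read kernel log text on stdin and print one line per OOM kill decision:
#   <memcg|global> <pid> <(command)>
classify_oom_kills() {
    awk '/[Oo]ut of memory: Kill process/ {
        kind = ($0 ~ /Memory cgroup out of memory/) ? "memcg" : "global"
        for (i = 1; i <= NF; i++) {
            if ($i == "process") { print kind, $(i + 1), $(i + 2); break }
        }
    }'
}
```

Piping the log through it, as in `dmesg | classify_oom_kills`, would print `memcg 24998 (postgres)` for the log above.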

    References:
    https://www.cnblogs.com/easton-wang/p/7656205.html
    https://www.cnblogs.com/doscho/p/6041036.html
    http://www.oracle.com/technetwork/articles/servers-storage-admin/resource-controllers-linux-1506602.html

  • Original article: https://www.cnblogs.com/ctypyb2002/p/9792897.html