1 Transparent Huge Pages Overview
Two articles on the official support sites describe THP:
https://access.redhat.com/solutions/46111
Starting with RedHat6, RedHat7, OL6, OL7 SLES11 and UEK2 kernels, Transparent HugePages are implemented and enabled (default) in an attempt to improve the memory management. Transparent HugePages are similar to the HugePages that have been available in previous Linux releases. The main difference is that the Transparent HugePages are set up dynamically at run time by the khugepaged thread in kernel while the regular HugePages had to be preallocated at the boot up time.
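In other words, starting with the RedHat 6, OEL 6, SLES 11, and UEK2 kernels, Transparent HugePages (THP) are enabled by default to improve memory-management performance. THP is similar to HugePages; the main difference is that THP is set up dynamically at run time, so no reboot is needed for it to take effect.
For more on HugePages, see:
Linux HugePages Configuration and Its Relationship to Oracle Performance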
http://www.cndba.cn/dave/article/310
Transparent Huge Pages (THP) are enabled by default in RHEL 6 for all applications. The kernel attempts to allocate hugepages whenever possible and any Linux process will receive 2MB pages if the mmap region is 2MB naturally aligned. The main kernel address space itself is mapped with hugepages, reducing TLB pressure from kernel code. For general information on Hugepages.
The kernel will always attempt to satisfy a memory allocation using hugepages. If no hugepages are available (due to non availability of physically continuous memory for example) the kernel will fall back to the regular 4KB pages. THP are also swappable (unlike hugetlbfs). This is achieved by breaking the huge page to smaller 4KB pages, which are then swapped out normally.
But to use hugepages effectively, the kernel must find physically continuous areas of memory big enough to satisfy the request, and also properly aligned. For this, a khugepaged kernel thread has been added. This thread will occasionally attempt to substitute smaller pages being used currently with a hugepage allocation, thus maximizing THP usage.
In userland, no modifications to the applications are necessary (hence transparent). But there are ways to optimize its use. For applications that want to use hugepages, use of posix_memalign() can also help ensure that large allocations are aligned to huge page (2MB) boundaries.
Also, THP is only enabled for anonymous memory regions. There are plans to add support for tmpfs and page cache. THP tunables are found in the /sys tree under /sys/kernel/mm/redhat_transparent_hugepage.
The values for /sys/kernel/mm/redhat_transparent_hugepage/enabled can be one of the following:
always - always use THP
madvise - use THP only in memory regions that request it with madvise(MADV_HUGEPAGE)
never - disable THP
khugepaged will be automatically started when transparent_hugepage/enabled is set to "always" or "madvise", and it will be automatically shut down if it is set to "never". The redhat_transparent_hugepage/defrag parameter takes the same values, and it controls whether the kernel should make aggressive use of memory compaction to make more hugepages available.
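The active setting is the word shown in square brackets in the enabled file. The following is a minimal sketch for reading it from a script. The helper names thp_active_mode and thp_enabled_file are hypothetical, and the second path assumes a kernel that exposes the unprefixed /sys/kernel/mm/transparent_hugepage directory instead of the redhat_ one.

```shell
#!/usr/bin/env bash
# Print the active THP mode, i.e. the bracketed word in an "enabled" file.
# thp_active_mode is a hypothetical helper name, not a system command.
thp_active_mode() {
    local file="$1"
    # The file holds one line such as "[always] madvise never";
    # keep only the text between the square brackets.
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
}

# Print the path of the first readable "enabled" file.
# RHEL 6 kernels use the redhat_ prefix; other kernels are assumed
# to use /sys/kernel/mm/transparent_hugepage.
thp_enabled_file() {
    local f
    for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
             /sys/kernel/mm/transparent_hugepage/enabled; do
        if [ -r "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}
```

On a host with THP support, `thp_active_mode "$(thp_enabled_file)"` prints always, madvise, or never.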
2 Checking and Disabling THP
THP is enabled when the active value in the following file, shown in square brackets, is always:
[root@www.cndba.cn ~]# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
[always] madvise never
[root@www.cndba.cn ~]#
[root@www.cndba.cn ~]# grep AnonHugePages /proc/meminfo
AnonHugePages: 348160 kB
A value greater than 0 here means THP is enabled and in use.
On Enterprise Linux 6.2 and later, THP can be monitored with the following command:
[root@www.cndba.cn ~]# egrep 'trans|thp' /proc/vmstat
nr_anon_transparent_hugepages 170
thp_fault_alloc 18566
thp_fault_fallback 110
thp_collapse_alloc 185
thp_collapse_alloc_failed 32
thp_split 221
[root@www.cndba.cn ~]#
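One way to read these counters is to compare thp_fault_fallback (page faults that fell back to 4KB pages) with thp_fault_alloc (page faults satisfied with a huge page). The sketch below computes the fallback percentage; thp_fallback_pct is a hypothetical helper name, and the choice of this ratio as a health indicator is an assumption rather than something the quoted articles prescribe.

```shell
#!/usr/bin/env bash
# thp_fallback_pct is a hypothetical helper. It reads "name value" lines
# in /proc/vmstat format from the file given as $1 and prints the share
# of THP page faults that fell back to regular pages, as a percentage.
thp_fallback_pct() {
    awk '
        $1 == "thp_fault_alloc"    { alloc = $2 }
        $1 == "thp_fault_fallback" { fallback = $2 }
        END {
            total = alloc + fallback
            # Avoid dividing by zero when no THP faults were recorded.
            if (total == 0) { print "n/a"; exit }
            printf "%.2f\n", 100 * fallback / total
        }
    ' "$1"
}
```

For example, `thp_fallback_pct /proc/vmstat` on the host above would report about 0.59, since 110 of 18676 THP faults fell back.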
To see which processes are using THP:
[root@www.cndba.cn ~]# grep -e AnonHugePages /proc/*/smaps | awk '{ if($2>4) print $0} ' | awk -F "/" '{print $0; system("ps -fp " $3)} '
/proc/2645/smaps:AnonHugePages: 2048 kB
UID PID PPID C STIME TTY TIME CMD
grid 2645 2644 0 Oct25 ? 00:00:09 /u01/gridsoft/12.1.0/opmn/bin/ons -d
/proc/2780/smaps:AnonHugePages: 14336 kB
UID PID PPID C STIME TTY TIME CMD
root 2780 1 0 Oct25 ? 00:01:53 /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -Djava.util.logging.config.file=/www/t
/proc/2780/smaps:AnonHugePages: 14336 kB
UID PID PPID C STIME TTY TIME CMD
root 2780 1 0 Oct25 ? 00:01:53 /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -Djava.util.logging.config.file=/www/t
/proc/2780/smaps:AnonHugePages: 38912 kB
UID PID PPID C STIME TTY TIME CMD
root 2780 1 0 Oct25 ? 00:01:53 /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -Djava.util.logging.config.file=/www/t
/proc/2780/smaps:AnonHugePages: 6144 kB
UID PID PPID C STIME TTY TIME CMD
Disabling THP at boot:
Add transparent_hugepage=never to the kernel line in grub.conf. This change takes effect only after the OS is rebooted.
[root@www.cndba.cn ~]# cat /etc/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,1)
# kernel /vmlinuz-version ro root=/dev/sda4
# initrd /initrd-[generic-]version.img
#boot=/dev/sda1
device (hd0) HD(1,800,3e8000,ad383463-7239-443a-83c6-7b8c6539a458)
default=0
timeout=5
splashimage=(hd0,1)/grub/splash.xpm.gz
hiddenmenu
title CentOS 6 (2.6.32-573.el6.x86_64)
root (hd0,1)
kernel /vmlinuz-2.6.32-573.el6.x86_64 ro root=UUID=65b6fe1a-6897-4a16-9cf6-e8dfcc89b7ce rd_NO_LUKS rd_NO_LVM.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet transparent_hugepage=never
initrd /initramfs-2.6.32-573.el6.x86_64.img
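After the reboot, it can be useful to confirm that the running kernel actually received the parameter. A minimal sketch, assuming the check is done against /proc/cmdline (the boot parameters of the running kernel); cmdline_disables_thp is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cmdline_disables_thp is a hypothetical helper. It succeeds (exit 0)
# when the kernel command line in $1 (default: /proc/cmdline) contains
# the exact parameter transparent_hugepage=never.
cmdline_disables_thp() {
    local cmdline_file="${1:-/proc/cmdline}"
    # Put each parameter on its own line, then match the whole line.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'transparent_hugepage=never'
}
```

Usage: `cmdline_disables_thp && echo "THP disabled at boot"`.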
Disabling THP at run time:
Run the following commands to disable THP immediately; no reboot is needed.
[root@www.cndba.cn ~]# echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
[root@www.cndba.cn ~]# echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
[root@www.cndba.cn ~]# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always madvise [never]
However, this setting is lost when the OS reboots.
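If editing grub.conf is not an option, the runtime commands can be reapplied at every boot. The following is only a sketch: disable_thp is a hypothetical helper, and calling it from a boot script such as /etc/rc.local is an assumption about the host's setup, not a step taken from the articles quoted above. It writes to whichever of the two sysfs directories exists.

```shell
#!/usr/bin/env bash
# disable_thp is a hypothetical helper. $1 is the sysfs base directory
# and defaults to the real /sys/kernel/mm. It must run as root.
disable_thp() {
    local base="${1:-/sys/kernel/mm}" dir knob
    for dir in "$base/redhat_transparent_hugepage" \
               "$base/transparent_hugepage"; do
        for knob in enabled defrag; do
            # Write only to knobs that exist and are writable here.
            if [ -w "$dir/$knob" ]; then
                echo never > "$dir/$knob"
            fi
        done
    done
}
```

Calling `disable_thp` with no argument from the boot script applies the same two writes shown in the transcript above.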
3 Oracle and THP
According to MOS ID 1557478.1:
Transparent HugePages are known to cause unexpected node reboots and performance problems with RAC, Oracle strongly advises to disable the use of Transparent HugePages. In addition, Transparent Hugepages may cause problems even in a single-instance database environment with unexpected performance problems or delays. As such, Oracle recommends disabling Transparent HugePages on all Database servers running Oracle.
The ocssd.log may show some of the threads are blocked (but this does not show all the time):
2013-05-01 14:30:45.255: [ CSSD][224204544]clssscMonitorThreads clssnmvKillBlockThread not scheduled for 7500 msecs
2013-05-01 14:30:46.945: [ CSSD][224204544]clssscMonitorThreads clssnmvWorkerThread not scheduled for 8030 msecs
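Lines like the two above can be pulled out of a log with a short filter. This is a sketch that assumes the log follows exactly the line format quoted here; css_stalls is a hypothetical helper name, and the threshold argument is an illustrative parameter, not an Oracle-defined limit.

```shell
#!/usr/bin/env bash
# css_stalls is a hypothetical helper. $1 is the log file and $2 an
# optional minimum stall in milliseconds (default 0). For each matching
# line it prints: date, time, thread name, stall in msecs.
css_stalls() {
    awk -v min="${2:-0}" '
        # The stall value is the second-to-last field and the thread
        # name sits four fields before it.
        /not scheduled for [0-9]+ msecs/ && $(NF-1) + 0 >= min + 0 {
            print $1, $2, $(NF-5), $(NF-1)
        }
    ' "$1"
}
```

For example, `css_stalls ocssd.log 8000` would list only the threads that went unscheduled for at least 8 seconds.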
Because THP can cause node reboots, Oracle strongly recommends disabling it. See the previous section for how to do so.