1 Background
1.1 Several times in production, every operation failed with:
[java.lang.RuntimeException: java.lang.OutOfMemoryError: Failed creating thread: pthread_create() failed, maxproc limit reached] with root cause
java.lang.OutOfMemoryError: Failed creating thread: pthread_create() failed, maxproc limit reached
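Thread monitoring showed the thread count reaching 100,000 and exhausting resources. From a certain point in time, threads grew at roughly 10 per second until the system went down.
1.2 The thread dump showed only anonymous thread pools and anonymous threads, so it could not directly locate the code where the thread pool/thread was defined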
$ jstack 18652
2020-12-16 12:01:32
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.221-b27 mixed mode):
"DestroyJavaVM" #12 prio=5 os_prio=0 tid=0x00000000031e2000 nid=0x47cc waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"pool-1-thread-1" #11 prio=5 os_prio=0 tid=0x000000001b11e800 nid=0x44f8 waiting on condition [0x000000001bbfe000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000078aeaed10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)   <-- the thread is still alive, it is just blocked in LinkedBlockingQueue.take, so the thread is not reclaimed, and therefore neither is the thread pool
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
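1.3 The thread dump was taken after the server had gone down, which means the threads had already finished their work and no stack remained to show where they were created
2 Investigation
2.1 Initial hypothesis
We inferred that someone was creating thread pools inside a for or while loop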
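The suspected pattern can be reproduced in a few lines. The following is a minimal sketch, not the production code: the class name `PoolInLoopDemo` and its helper methods are hypothetical, and the pools are kept in a list only so the demo can clean up after itself. It relies on the default thread names (`pool-N-thread-M`) that also appear in the dump above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolInLoopDemo {

    /** Counts live threads that carry the default "pool-N-thread-M" name. */
    static int countDefaultPoolThreads() {
        int n = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith("pool-")) {
                n++;
            }
        }
        return n;
    }

    /**
     * Creates one pool per loop iteration and returns how many extra
     * worker threads are still alive after every task has finished.
     */
    static int leakedWorkers(int iterations) {
        int before = countDefaultPoolThreads();
        List<ExecutorService> created = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(iterations);
        for (int i = 0; i < iterations; i++) {
            // The suspected bug: a new pool per iteration, never shut down.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            pool.execute(done::countDown);
            created.add(pool); // kept only so this demo can clean up afterwards
        }
        try {
            done.await(); // every task has run once this returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int leaked = countDefaultPoolThreads() - before;
        for (ExecutorService pool : created) {
            pool.shutdown(); // the cleanup that the real code was missing
        }
        return leaked;
    }

    public static void main(String[] args) {
        System.out.println("idle workers left behind: " + leakedWorkers(50));
    }
}
```

Each iteration leaves at least one idle worker parked in `LinkedBlockingQueue.take`, so the thread count grows with the number of iterations even though every task has completed.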
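2.2 Approach
2.2.1 Use a script to monitor the process ID and automatically take a thread dump as soon as the thread count surges, in order to capture the stack. As long as the thread is running, there is no reason we cannot catch it.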
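The original monitoring script is not shown. As a rough sketch of the idea, assuming a Linux host where `/proc/<pid>/status` exposes a `Threads:` line and `jstack` is on the `PATH`, a watcher could look like this. The class name `ThreadSurgeWatcher`, the growth threshold, and the poll interval are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

final class ThreadSurgeWatcher {

    /** Extracts the value of the "Threads:" line from /proc/<pid>/status content. */
    static int parseThreadCount(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("Threads:")) {
                return Integer.parseInt(line.substring("Threads:".length()).trim());
            }
        }
        return -1; // line not found
    }

    /** True when the count grew by more than maxGrowth since the previous poll. */
    static boolean isSurge(int previous, int current, int maxGrowth) {
        return previous >= 0 && current - previous > maxGrowth;
    }

    /** Runs "jstack <pid>" and writes its output to a timestamped file. */
    static File dump(long pid) throws IOException, InterruptedException {
        File out = new File("jstack-" + pid + "-" + System.currentTimeMillis() + ".txt");
        new ProcessBuilder("jstack", Long.toString(pid))
                .redirectErrorStream(true)
                .redirectOutput(out)
                .start()
                .waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        long pid = Long.parseLong(args[0]);
        int maxGrowth = 20;        // hypothetical threshold per poll
        long pollMillis = 5_000L;  // hypothetical poll interval
        int previous = -1;
        while (true) {
            int current = parseThreadCount(
                    Files.readAllLines(Paths.get("/proc/" + pid + "/status")));
            if (isSurge(previous, current, maxGrowth)) {
                System.out.println("surge " + previous + " -> " + current
                        + ", dump written to " + dump(pid));
            }
            previous = current;
            Thread.sleep(pollMillis);
        }
    }
}
```

Taking the dump while the count is still climbing is the point: the threads that create the pools are then still on the stack, unlike in a dump taken after the crash.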
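2.2.2 Write a program that scans all source code and looks for thread pool keywords within 200 characters after each for and while. If the loop body calls a function that creates the pool, this approach misses it, and that case is quite likely, so this is the fallback plan.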
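The scanner itself is not included in this write-up. A minimal sketch of the described heuristic might look as follows; the class name `LoopPoolScanner` and the keyword list are assumptions, and only the 200-character window comes from the text above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LoopPoolScanner {

    // Matches the opening of a for/while statement.
    private static final Pattern LOOP = Pattern.compile("\\b(for|while)\\s*\\(");

    // Assumed keyword list; extend it for the code base at hand.
    private static final String[] KEYWORDS = {
        "Executors.new", "new ThreadPoolExecutor", "new Thread("
    };

    /** Returns the offsets of loops followed by a keyword within `window` characters. */
    static List<Integer> findSuspects(String source, int window) {
        List<Integer> hits = new ArrayList<>();
        Matcher m = LOOP.matcher(source);
        while (m.find()) {
            int end = Math.min(source.length(), m.end() + window);
            String tail = source.substring(m.end(), end);
            for (String keyword : KEYWORDS) {
                if (tail.contains(keyword)) {
                    hits.add(m.start());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(Paths.get(args[0]))) {
            files = walk.filter(p -> p.toString().endsWith(".java"))
                        .collect(Collectors.toList());
        }
        for (Path file : files) {
            String source = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            for (int offset : findSuspects(source, 200)) {
                System.out.println(file + " @ char " + offset);
            }
        }
    }
}
```

A loop such as `for (Job j : jobs) { process(j); }` produces no hit even if `process` creates a pool, which is exactly the weakness noted above.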
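3 Result
jstack captured the stack. Sure enough, someone really was creating a thread pool inside a for loop.
4 Further notes
4.1 Is a local thread pool never reclaimed?
According to
these two articles, we believe that even a method-local thread pool is not reclaimed, because its worker threads keep running; see the while loop in the article "线程池的原理" (How Thread Pools Work)
We used the following code (not committed to git) to investigate: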
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class TestMain {
    public static void main(String[] f) {
        // Method-local pool that is never shut down
        Executor executor = Executors.newFixedThreadPool(4);
        InnerThread innerThread = new InnerThread();
        executor.execute(innerThread);
    }

    // Trivial task that returns immediately
    private static class InnerThread implements Runnable {
        @Override
        public void run() {
        }
    }
}
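The process does not exit after main returns, because the pool's non-daemon worker thread is still alive. jstack output: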
$ jstack 18652
2020-12-16 12:01:32
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.221-b27 mixed mode):
"DestroyJavaVM" #12 prio=5 os_prio=0 tid=0x00000000031e2000 nid=0x47cc waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"pool-1-thread-1" #11 prio=5 os_prio=0 tid=0x000000001b11e800 nid=0x44f8 waiting on condition [0x000000001bbfe000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000078aeaed10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)   <-- this dump matches the one produced by the hand-written thread pool in "线程池的原理" (How Thread Pools Work)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
4.2 Threads and thread pools must always be given custom names
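One common way to do this is to pass a `ThreadFactory` to the pool. The sketch below is only an illustration: the class name `NamedThreadFactory`, the helper `newFixedPool`, and the example prefix are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadFactory implements ThreadFactory {

    private final String prefix;
    private final AtomicInteger sequence = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // Produces names such as "order-sync-1", "order-sync-2", ...
        return new Thread(task, prefix + "-" + sequence.getAndIncrement());
    }

    /** Convenience wrapper: a fixed pool whose workers carry the given prefix. */
    static ExecutorService newFixedPool(String prefix, int threads) {
        return Executors.newFixedThreadPool(threads, new NamedThreadFactory(prefix));
    }
}
```

With a pool created as `NamedThreadFactory.newFixedPool("order-sync", 4)`, a thread dump shows the prefix instead of `pool-1-thread-1`, which points directly at the owning code.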
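4.3 Shutting down a thread pool
If a thread pool created locally inside a method is shut down, it does not leak, and the OOM described above does not occur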
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TestMain {
    public static void main(String[] f) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        InnerThread innerThread = new InnerThread();
        executor.execute(innerThread);
        // Queued tasks still run; afterwards the workers exit and the JVM can terminate
        executor.shutdown();
    }

    // Trivial task that returns immediately
    private static class InnerThread implements Runnable {
        @Override
        public void run() {
        }
    }
}
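This raises the question of how to shut down a thread pool. For details see "线程池的原理" (How Thread Pools Work): shutdown ends the while loop that each worker thread runs, so the threads finish and are reclaimed, which in turn allows the method-local thread pool to be reclaimed.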
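For a pool that must stay method-local, one possible shape is to shut it down in a `finally` block and then wait for the workers to exit. This is a sketch under assumptions, not the fix that was applied in production: the class name `ScopedPoolDemo`, the pool size, and the 10-second timeout are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ScopedPoolDemo {

    /** Runs taskCount trivial tasks on a method-local pool; returns how many completed. */
    static int runBatch(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(completed::incrementAndGet);
            }
        } finally {
            // Stops accepting new tasks; workers exit once the queue is drained.
            pool.shutdown();
        }
        try {
            // Hypothetical timeout: give queued tasks 10 seconds to finish.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt whatever is still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed tasks: " + runBatch(8));
    }
}
```

Where the work is recurring, a single shared pool created once and reused is usually preferable to creating and shutting down a pool on every call.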