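Problem description:
Today a colleague reported that a program was misbehaving and asked me to check the backend log. The log contained the following error: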
java.net.SocketException: Too many open files
    at java.net.Socket.createImpl(Socket.java:460)
    at java.net.Socket.connect(Socket.java:587)
    at org.apache.commons.net.SocketClient.connect(SocketClient.java:163)
    at org.apache.commons.net.SocketClient.connect(SocketClient.java:184)
    at com.asiainfo.goods.wo.store.scheduler.util.FtpUtil.downloadFileByFileName(FtpUtil.java:270)
    at com.asiainfo.goods.wo.store.scheduler.job.TargetUserJob.dealWithFtpByRequestId(TargetUserJob.java:186)
    at com.asiainfo.goods.wo.store.scheduler.job.TargetUserJob.execute(TargetUserJob.java:80)
    at com.asiainfo.goods.presale.scheduler.job.QuartzJobFactory.execute(QuartzJobFactory.java:68)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
2018-04-08 18:45:00 [com.asiainfo.goods.wo.store.scheduler.job.TargetUserJob]-[ERROR]:216 - file doesnot exist===TARGETCUSTU00011201 804081800575321.zip
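Problem analysis:
The error message shows that the process has too many files open: it has reached its per-process file descriptor limit. Sockets count as file descriptors too, which is why the failure surfaces in Socket.createImpl.
Resolution:
1. Check the open-file limit configured for the current system user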
[aiprd@host-10-191-5-227 log]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256705
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 20000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Note: for the current user, each process can open at most 65536 file descriptors.
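The value printed by ulimit belongs to the current shell. A process that is already running keeps the limits it was started with, so it can be worth reading the limit of the process itself. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper name max_open_files is made up for this example.

```shell
# Print the soft "Max open files" limit of a running process.
# Assumes Linux with /proc mounted; max_open_files is a made-up helper name.
max_open_files() {
    # The matching line looks like:
    #   Max open files            65536                65536                files
    # Field 4 is the soft limit, field 5 the hard limit.
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Example: the limit of the current shell, which should match `ulimit -n`.
max_open_files "$$"
```

For the application in this note, the call would be max_open_files 2526.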
2. Check how many files the application process currently has open
[aiprd@host-10-191-5-227 log]$ lsof -p 2526 | wc -l
65785
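Note: the application shows 65785 open entries, which is above the limit of 65536, so the process cannot open any more files. The lsof count can be somewhat higher than the descriptor limit because lsof also prints a header line and entries such as cwd, txt and mem, which do not occupy a descriptor.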
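To count only the entries that really occupy a descriptor, the /proc/<pid>/fd directory can be listed instead of piping lsof into wc. This is a small sketch, assuming Linux with /proc mounted and permission to read the target process (same user or root); count_fds is a made-up helper name.

```shell
# Count the descriptors a process actually holds by listing /proc/<pid>/fd.
# Assumes Linux with /proc mounted; count_fds is a made-up helper name.
count_fds() {
    # Each entry in the directory is one open descriptor of the process.
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: descriptors held by the current shell.
count_fds "$$"
```

For the process in this note, the call would be count_fds 2526, and the result can be compared directly against the open files limit.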
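3. Inspecting this process with lsof shows a large number of deleted files
Note: many of these files no longer exist on disk, but their file descriptors are still open.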
[aiprd@host-10-191-5-227 log]$ lsof -p 2526 | grep deleted | wc -l
65274
Note: 65274 of the entries are deleted files, so deleted files account for the vast majority of the descriptors in use.
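The same check can be done without lsof. On Linux, each entry in /proc/<pid>/fd is a symbolic link, and the kernel appends " (deleted)" to the link target once the file has been unlinked. The sketch below lists those descriptors; it assumes Linux with /proc mounted, and deleted_fds is a made-up helper name.

```shell
# List descriptors of a process whose target file has already been deleted.
# Assumes Linux with /proc mounted; deleted_fds is a made-up helper name.
deleted_fds() {
    for fd in "/proc/$1/fd"/*; do
        # A descriptor may be closed while we iterate; skip it in that case.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") printf '%s -> %s\n' "${fd##*/}" "$target" ;;
        esac
    done
}
```

Running deleted_fds 2526 | wc -l would give a count comparable to the lsof pipeline above, and the printed paths show which files the program forgot to close.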
4. Stop the application process to release the open files
[aiprd@host-10-191-5-227 log]$ kill -9 2526
[aiprd@host-10-191-5-227 log]$ lsof -p 2526 | wc -l
0
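kill -9 sends SIGKILL, which the process cannot handle, so a JVM gets no chance to run its shutdown hooks. The kernel releases the descriptors either way once the process exits, so a gentler stop is possible: send SIGTERM first and fall back to SIGKILL only if the process is still alive after a timeout. This is a sketch only; stop_process and the 10-second default are assumptions, not part of the original procedure.

```shell
# Ask a process to exit with SIGTERM and fall back to SIGKILL after a timeout.
# stop_process and the 10-second default are assumptions for this sketch.
stop_process() {
    pid=$1
    timeout=${2:-10}
    # Nothing to do if the signal cannot be delivered (already gone or not ours).
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 only tests whether the process can still be signalled.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # Still alive after the timeout: force termination.
    kill -KILL "$pid" 2>/dev/null || true
}
```

For the process in this note, the call would be stop_process 2526.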
5. Restart the application and check its open files
[aiprd@host-10-191-5-227 log]$ ps -ef | grep scheduler_hdfs | grep -v grep | awk '{print $2}'
29639
[aiprd@host-10-191-5-227 log]$ lsof -p 29639 | wc -l
485
[aiprd@host-10-191-5-227 log]$ lsof -p 29639 | grep deleted | wc -l
0
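Note: after the restart, all of the files that had been open before were released, and the backend program processes its jobs correctly again.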
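A restart clears the symptom, but if the program keeps leaking descriptors the count will climb again. One way to notice this early is to compare the descriptor count of the process with its limit at regular intervals. The sketch below is a rough illustration, assuming Linux with /proc mounted; check_fd_usage and the 80 percent default threshold are assumptions made for this example.

```shell
# Warn when a process uses more than a given percentage of its descriptor limit.
# check_fd_usage and the 80 percent default are assumptions for this sketch.
check_fd_usage() {
    pid=$1
    threshold=${2:-80}
    used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
    # No numeric limit to compare against: nothing to report.
    case "$limit" in
        ''|unlimited) return 0 ;;
    esac
    percent=$((used * 100 / limit))
    if [ "$percent" -ge "$threshold" ]; then
        echo "WARN pid=$pid fds=$used limit=$limit (${percent}%)"
        return 1
    fi
    echo "OK pid=$pid fds=$used limit=$limit (${percent}%)"
}
```

For the restarted process, check_fd_usage 29639 could be run periodically, for example from cron, with the non-zero exit status used to trigger an alert.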
Document created: 2018-04-08 21:25:33