Incident
All MongoDB data nodes on the LXC host 10.11.164.28 reported the same error at the same moment: failed to create thread after accepting new connection, closing connection
Host OS: Oracle Linux 6.5; LXC version: 1.0.6
Database version: MongoDB 3.2.11
Error log:
2017-07-26T10:23:18.734+0800 I NETWORK [initandlisten] failed to create thread after accepting new connection, closing connection
2017-07-26T10:23:23.781+0800 I NETWORK [initandlisten] connection accepted from 127.0.0.1:44334 #5874 (14 connections now open)
2017-07-26T10:23:23.781+0800 I NETWORK [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
2017-07-26T10:23:23.781+0800 I NETWORK [initandlisten] failed to create thread after accepting new connection, closing connection
2017-07-26T10:23:26.670+0800 I NETWORK [initandlisten] connection accepted from 192.168.4.206:54601 #5875 (14 connections now open)
2017-07-26T10:23:26.670+0800 I NETWORK [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
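The log prints a bare errno value (11) next to its message. As a small sketch, the snippet below maps a numeric errno to its symbolic name with Python's standard errno module; the helper name describe_errno is made up for this illustration. On Linux, 11 corresponds to EAGAIN, which is the error pthread_create returns when it lacks the resources, or hits a limit, needed to create another thread.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value.

    describe_errno is a hypothetical helper, not part of MongoDB.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 11 is the value printed in the mongod log above.
    name, message = describe_errno(11)
    print(f"errno 11 -> {name}: {message}")
```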
Locating the problem
File that emits the error messages (found in the 3.2.16 source tree):
./mongodb-src-r3.2.16/src/mongo/util/net//message_server_port.cpp: log() << "failed to create thread after accepting new connection, closing connection";
./mongodb-src-r3.2.16/src/mongo/util/net//message_server_port.cpp: log() << "pthread_create failed: " << errnoWithDescription(failed) << endl;
virtual void accepted(std::shared_ptr<Socket> psocket, long long connectionId) {
    ScopeGuard sleepAfterClosingPort = MakeGuard(sleepmillis, 2);
    std::unique_ptr<MessagingPortWithHandler> portWithHandler(
        new MessagingPortWithHandler(psocket, _handler, connectionId));

    if (!Listener::globalTicketHolder.tryAcquire()) {
        log() << "connection refused because too many open connections: "
              << Listener::globalTicketHolder.used() << endl;
        return;
    }

    try {
#ifndef __linux__  // TODO: consider making this ifdef _WIN32
        {
            stdx::thread thr(stdx::bind(&handleIncomingMsg, portWithHandler.get()));
            thr.detach();
        }
#else
        pthread_attr_t attrs;       // declare the thread attribute object
        pthread_attr_init(&attrs);  // initialize attrs
        // Detached state: the thread's resources are reclaimed automatically when it exits.
        pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);

        // Static constant of type size_t (unsigned): the 1 MB stack size MongoDB wants per thread.
        static const size_t STACK_SIZE =
            1024 * 1024;  // if we change this we need to update the warning

        struct rlimit limits;  // struct rlimit is explained below
        // Read the process's current RLIMIT_STACK into limits and verify the call succeeded.
        verify(getrlimit(RLIMIT_STACK, &limits) == 0);
        if (limits.rlim_cur > STACK_SIZE) {
            // The soft stack limit is larger than 1 MB, so cap the thread stack at 1 MB.
            size_t stackSizeToSet = STACK_SIZE;
#if !__has_feature(address_sanitizer)
            if (kDebugBuild)
                stackSizeToSet /= 2;
#endif
            // Store the chosen stack size in the thread attributes.
            pthread_attr_setstacksize(&attrs, stackSizeToSet);
        } else if (limits.rlim_cur < 1024 * 1024) {
            // The soft stack limit is below 1 MB: only warn.
            warning() << "Stack size set to " << (limits.rlim_cur / 1024) << "KB. We suggest 1MB"
                      << endl;
        }

        pthread_t thread;  // thread handle (a thread, not a process)
        // Create the thread (handle, attributes, start routine, argument).
        // pthread_create returns 0 on success and an error number (not -1) on failure.
        int failed = pthread_create(&thread, &attrs, &handleIncomingMsg, portWithHandler.get());

        pthread_attr_destroy(&attrs);  // release the resources held by attrs

        if (failed) {
            // Creation failed: log it. Here errnoWithDescription(failed) yields
            // "errno:11 Resource temporarily unavailable".
            log() << "pthread_create failed: " << errnoWithDescription(failed) << endl;
            throw std::system_error(
                std::make_error_code(std::errc::resource_unavailable_try_again));
        }
#endif  // __linux__

        // The new thread now owns the port, so the unique_ptr gives up ownership.
        portWithHandler.release();
        sleepAfterClosingPort.Dismiss();
    } catch (...) {
        // Any exception: return the connection ticket and log the failure.
        Listener::globalTicketHolder.release();
        log() << "failed to create thread after accepting new connection, closing connection";
    }
}
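To make the stack-size branches in the listener code easier to follow, here is a minimal Python sketch of the same decision. The function names are invented for this illustration, and the sketch ignores the address_sanitizer build condition around the debug-build halving. Under these assumptions, a soft stack limit of exactly 1024 KB (the container's setting shown later) takes neither branch: no explicit stack size is set and no warning is printed.

```python
import resource

STACK_SIZE = 1024 * 1024  # 1 MB, the same constant as in the listener code


def thread_stack_size(soft_limit, debug_build=False):
    """Return the size passed to pthread_attr_setstacksize, or None.

    None means the attribute is left untouched and the thread gets the
    platform default. Hypothetical helper mirroring accepted().
    """
    # Python reports "unlimited" as RLIM_INFINITY (-1); in C it is a huge
    # unsigned value, so it also satisfies the "> STACK_SIZE" comparison.
    if soft_limit == resource.RLIM_INFINITY or soft_limit > STACK_SIZE:
        return STACK_SIZE // 2 if debug_build else STACK_SIZE
    return None


def warns_about_small_stack(soft_limit):
    """True when the 'Stack size set to ...KB' warning would be logged."""
    return soft_limit != resource.RLIM_INFINITY and soft_limit < STACK_SIZE


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("soft stack limit:", soft)
    print("explicit thread stack size:", thread_stack_size(soft))
    print("would warn:", warns_about_small_stack(soft))
```

Since the stack limit in the container triggers neither the cap nor the warning, the stack setting alone does not explain the EAGAIN from pthread_create.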
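Here limits is a struct of type rlimit. rlimit is a Linux system structure that describes the maximum amount of a given resource a process may use, expressed as a soft limit and a hard limit. It is defined as follows: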
struct rlimit {
    rlim_t rlim_cur;  // soft limit
    rlim_t rlim_max;  // hard limit
};
Two functions read and change these limits:
int getrlimit(int resource, struct rlimit *rlim); // reads the calling process's current soft and hard limits for the resource
int setrlimit(int resource, const struct rlimit *rlim);
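As an illustration of getrlimit, the sketch below reads the limits relevant to this incident through Python's standard resource module, which wraps the same system calls. The helper names fmt and show_limits are made up for this example. Note that the stack limit is reported in bytes, whereas limits.conf and ulimit -s use KB.

```python
import resource


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def show_limits():
    """Return (label, soft, hard) rows for the current process."""
    rows = []
    for label, res in (
        ("stack (bytes)", resource.RLIMIT_STACK),
        ("nproc", resource.RLIMIT_NPROC),
        ("nofile", resource.RLIMIT_NOFILE),
    ):
        soft, hard = resource.getrlimit(res)  # wraps getrlimit(2)
        rows.append((label, fmt(soft), fmt(hard)))
    return rows


if __name__ == "__main__":
    for label, soft, hard in show_limits():
        print(f"{label:15} soft={soft:12} hard={hard}")
```

An unprivileged process can move its soft limit anywhere up to the hard limit with resource.setrlimit, but it cannot raise the hard limit.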
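The error therefore means: mongod could not create a new thread to serve the newly accepted connection.
Confirming the cause
The stack-related and process-related limits in the LXC container are: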
ulimit -s    stack size    The maximum stack size
cat /etc/security/limits.conf
mongodb soft nproc 4096        max number of processes
mongodb hard nproc 16384       max number of processes
mongodb soft nofile 131072     max number of open file descriptors
mongodb hard nofile 131072     max number of open file descriptors
mongodb soft stack 1024        max stack size (KB)
mongodb hard stack 1024        max stack size (KB)
The corresponding limits on the host:
stack size (kbytes, -s) 8192
# End of file
* soft nproc 65536
* hard nproc 65536
* soft nofile 131072
* hard nofile 131072
Changing the nproc limit in the container's /etc/security/limits.conf had no effect.
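Limits files are applied when a new login session is set up, and a process that is already running keeps the limits it started with, so it helps to check what the kernel actually enforces for a given process. A minimal sketch, assuming a Linux /proc filesystem: the functions below (hypothetical names) parse /proc/<pid>/limits, which lists the soft and hard value of every limit for that process.

```python
import os
import re


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are padded with runs of spaces; names contain single spaces.
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) >= 3:
            limits[columns[0]] = (columns[1], columns[2])
    return limits


def read_limits(pid="self"):
    """Read the limits currently enforced for a process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_limits(fh.read())


if __name__ == "__main__":
    if os.path.exists("/proc/self/limits"):
        current = read_limits()
        for name in ("Max processes", "Max stack size", "Max open files"):
            soft, hard = current.get(name, ("?", "?"))
            print(f"{name}: soft={soft} hard={hard}")
```

Passing the PID of a mongod process instead of "self" shows whether a configuration change has reached that process; the "Max processes" row is the nproc limit.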
In addition, the 6.x releases of the distribution (Oracle Linux 6.5 here) ship one more configuration file: /etc/security/limits.d/90-nproc.conf
On the host it contains:
# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 65536
root soft nproc unlimited
In the container it contains:
#/etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 1024
root soft nproc unlimited
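Raising the container's soft nproc limit for the * entry to 10240 restored normal connection handling in MongoDB.
Conclusion: thread creation is constrained by two configuration files, /etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf, and also by the host's configuration. On Linux the nproc limit (RLIMIT_NPROC) counts a user's threads as well as processes, which is why it caps the number of connection threads mongod can create.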