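After a data center relocation, the cluster has recently kept losing datanodes.
Tracking down the cause:
(1) Limit on the number of files a user process can open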
http://www.54chen.com/java-ee/hive-hadoop-blockalreadyexistsexception.html
Add the following to /etc/security/limits.conf:
hadoop - nofile 65535
hadoop - nproc 65535
Then, as root, run: ulimit -SHn 65535
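To confirm that the new limit is actually in effect, the soft open-file limit of the current shell can be compared against the target. The following is only a minimal sketch: the helper name `check_nofile` is hypothetical, and the default target of 65535 simply mirrors the value used above.

```shell
#!/bin/sh
# check_nofile TARGET: report whether this shell's soft open-file limit
# is at least TARGET. Hypothetical helper, not part of the original notes.
check_nofile() {
    target="${1:-65535}"
    soft="$(ulimit -S -n)"
    # A limit of "unlimited" always satisfies the target.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
        echo "OK: soft nofile limit is $soft (target $target)"
        return 0
    fi
    echo "LOW: soft nofile limit is $soft (target $target)"
    return 1
}

# Run the check as the hadoop user; "|| true" keeps the script going on LOW.
check_nofile 65535 || true
```

Two things are worth keeping in mind. Entries in limits.conf are applied by pam_limits at login, so they only affect sessions started after the change, and a ulimit call in a root shell only affects that shell and its children. For a datanode that is already running, the "Max open files" row in /proc/<pid>/limits shows the limit the process really has.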
(2) Raising the network connection limits
http://zbszone.iteye.com/blog/826199
To raise the network connection limits, add the following to /etc/sysctl.conf:
#net.ipv4.tcp_fin_timeout = 30
#net.ipv4.tcp_keepalive_time = 120
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.ip_conntrack_max = 655360
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 180
Apply the settings immediately:
/sbin/sysctl -p
If this reports:
error: "net.ipv4.ip_conntrack_max" is an unknown key
error: "net.ipv4.netfilter.ip_conntrack_tcp_timeout_established" is an unknown key
Fix: load the conntrack module, and add it to /etc/rc.local so that it is loaded again on every boot:
modprobe ip_conntrack
echo "modprobe ip_conntrack" >> /etc/rc.local
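Before running /sbin/sysctl -p, it can help to check whether the kernel currently exposes the two conntrack keys at all. The sketch below is an assumption-laden illustration: the helper names `sysctl_key_path` and `sysctl_key_present` are hypothetical, and the check relies on sysctl keys appearing as files under /proc/sys with the dots replaced by slashes. Note also that on newer kernels the module is called nf_conntrack and the keys live under net.netfilter (for example net.netfilter.nf_conntrack_max), so the older names used here may be missing even after a modprobe.

```shell
#!/bin/sh
# sysctl_key_path KEY: print the /proc/sys path for a sysctl key
# (dots in the key become slashes). Hypothetical helper.
sysctl_key_path() {
    echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

# sysctl_key_present KEY: succeed if the kernel currently exposes KEY.
sysctl_key_present() {
    [ -e "$(sysctl_key_path "$1")" ]
}

# Report the state of the two keys that triggered the "unknown key" errors.
for key in net.ipv4.ip_conntrack_max \
           net.ipv4.netfilter.ip_conntrack_tcp_timeout_established; do
    if sysctl_key_present "$key"; then
        echo "present: $key"
    else
        echo "missing: $key (load the conntrack module as root, then rerun /sbin/sysctl -p)"
    fi
done
```

The script only reads from /proc/sys, so it can be run as an ordinary user; loading the module and applying the settings still requires root.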