对于这个问题,大部分人出现在这个地方:
Client client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress("172.16.2.13", 9300));
问题在于前面初始化settings时给cluster设置了个新的名字,如:Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "tonsonmiao").build();
因为如果设置clustername后,容器会在添加transportaddress时,从集群名为tonsonmiao里查找是否有要设置的这个IP和端口,此时肯定找不到,所以会报这个错。
但是我今天又遇到这个问题,而在此之前一切正常,也就是说并不会因为代码的错误导致这个问题出现,这几天修改了下网关,可以访问这台服务器,于是我觉得可能是网络的变动,导致es出了问题,进入服务器查看:
[root@es elasticsearch-1.4.2]# ps -eaf | grep java
root 8782 8752 0 05:53 pts/1 00:00:00 grep java
root 27251 1 1 2015 ? 1-10:15:49 /usr/bin/java -Xms256m -Xmx1g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Delasticsearch -Des.path.home=/opt/elasticsearch-1.4.2 -cp :/opt/elasticsearch-1.4.2/lib/elasticsearch-1.4.2.jar:/opt/elasticsearch-1.4.2/lib/*:/opt/elasticsearch-1.4.2/lib/sigar/* org.elasticsearch.bootstrap.Elasticsearch
[root@es elasticsearch-1.4.2]# netstat -anp | grep 9300
tcp 0 0 10.18.7.97:9300 10.17.3.96:51633 SYN_RECV -
tcp 0 0 10.18.7.97:9300 10.17.3.96:51635 SYN_RECV -
tcp 0 0 :::9300 :::* LISTEN 27251/java
tcp 0 0 ::ffff:10.18.7.97:41035 ::ffff:10.18.7.97:9300 ESTABLISHED 27251/java
tcp 0 0 ::ffff:10.18.7.97:9300 ::ffff:10.18.7.97:41030 ESTABLISHED 27251/java
tcp 0 0 ::ffff:10.18.7.97:41033 ::ffff:10.18.7.97:9300 ESTABLISHED 27251/java
tcp 0 0 ::ffff:10.18.7.97:41037 ::ffff:10.18.7.97:9300 ESTABLISHED 27251/java
tcp 0 0 ::ffff:10.18.7.97:41036 ::ffff:10.18.7.97:9300 ESTABLISHED 27251/java
tcp 0 0 ::ffff:10.18.7.97:41026 ::ffff:10.18.7.97:9300 ESTABLISHED 27251/java
tcp 0 0 ::ffff:10.18.7.97:9300 ::ffff:10.18.7.97:41025 ESTABLISHED 27251/java
可以看到es监听的ip地址发生了变化,对于多网卡的情况下,es默认会绑定其中任意一张网卡,如果es中间出现问题自动修复,那么会随机修改绑定网卡,导致节点被t出集群,曾经我在某银行系统,就遇到这个问题,某几个节点不定时的被t出集群。
对于这个问题的解决,需要修改es配置:
# Set the bind address specifically (IPv4 or IPv6):
#
network.bind_host: 10.18.7.97
# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
network.publish_host: 10.18.7.97
# Set both 'bind_host' and 'publish_host':
#
network.host: 10.18.7.97
强制指明ip地址即可