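Server plan: the whole system runs on RHEL 5.1 (rhel5u1) Server 64-bit, built from Xen-based virtual machines. It consists of 2 cluster management nodes, 2 SQL nodes, 4 data nodes, and 2 load balancer (LVS) nodes. The data nodes are organized into 2 node groups of two machines each:
- VM mysql_mgm-1, 192.168.20.5: cluster management node, id=1
- VM mysql_mgm-2, 192.168.20.6: cluster management node, id=2
- VM mysql_sql-1, 192.168.20.7: SQL node (MySQL server node), id=3
- VM mysql_sql-2, 192.168.20.8: SQL node (MySQL server node), id=4
- VM mysql_ndb-1, 192.168.20.9: ndb data node, id=5
- VM mysql_ndb-2, 192.168.20.10: ndb data node, id=6
- VM mysql_ndb-3, 192.168.20.11: ndb data node, id=7
- VM mysql_ndb-4, 192.168.20.12: ndb data node, id=8
- VM mysql_lb-1, 192.168.20.15: LVS load balancer node 1
- VM mysql_lb-2, 192.168.20.16: LVS load balancer node 2
- The cluster's virtual IP is 192.168.20.17
- Load balancing uses the software bundled with RHEL.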
-----------------------------------------------------------------------------------------------------------
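Installation:
1. Install MySQL on all nodes: I did not download the MySQL source and compile it, because compiling on this many machines is too much trouble. Instead I downloaded the community edition from mysql.com, built for RHEL 5.1 Server 64-bit. The download address is: http://dev.mysql.com/downloads/m ... hel5-x86-64bit-rpms Download and install all the RPM packages listed there; note that only one of "Shared libraries" and "Shared compatibility libraries" can be installed. Once they are installed, the MySQL server and the cluster tools are all in place, which is very convenient.
2. (1) On all management nodes, create the configuration file /etc/config.ini (note: only the management nodes need this file; the other nodes do not):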
- [NDBD DEFAULT]
- NoOfReplicas=2
- DataMemory=600M
- IndexMemory=100M
- [NDB_MGMD]
- id=1
- hostname=192.168.20.5
- DataDir=/var/lib/mysql-cluster
- [NDB_MGMD]
- id=2
- hostname=192.168.20.6
- DataDir=/var/lib/mysql-cluster
- [MYSQLD]
- id=3
- HostName=192.168.20.7
- [MYSQLD]
- id=4
- HostName=192.168.20.8
- [NDBD]
- id=5
- HostName=192.168.20.9
- [NDBD]
- id=6
- HostName=192.168.20.10
- [NDBD]
- id=7
- HostName=192.168.20.11
- [NDBD]
- id=8
- HostName=192.168.20.12
-----------------------------------------------------------------------------------------------------------
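As a rough sanity check on the layout above (4 data nodes with NoOfReplicas=2, hence 2 node groups), the sketch below counts the [NDBD] sections in a config.ini-style file and divides by NoOfReplicas. The function name count_node_groups is made up for illustration; it is not a MySQL Cluster tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL Cluster): print the number of node
# groups implied by a config.ini, i.e. data nodes / NoOfReplicas.
count_node_groups() {
    awk '
        {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)   # trim surrounding whitespace
        }
        toupper(line) == "[NDBD]" { nodes++ }   # one section per data node
        toupper(line) ~ /^NOOFREPLICAS[ \t]*=/ {
            split(line, kv, "=")
            replicas = kv[2] + 0
        }
        END {
            if (replicas <= 0 || nodes % replicas != 0) {
                print "invalid"
                exit 1
            }
            print nodes / replicas
        }
    ' "$1"
}
```

Usage: count_node_groups /etc/config.ini. It prints "invalid" when the number of data nodes is not a multiple of NoOfReplicas.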
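(2) Start the management daemon ndb_mgmd (it listens locally on port 1186) on each management node, first on the node with id=1 and then on the node with id=2. The cluster then uses the id=2 node as its primary management node (it seems that whichever starts last becomes the primary; I still need to verify this):
- ndb_mgmd -f /etc/config.ini
Note: the MySQL service does not need to run on these servers (so /etc/my.cnf need not be configured). ndb_mgmd must be started before the services on the ndb nodes and SQL nodes.
3. (1) On all ndb nodes, append the following to the existing /etc/my.cnf: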
- [mysqld]
- ndbcluster
- # IP addresses of the cluster management nodes (comma-separated, on one line)
- ndb-connectstring=192.168.20.5,192.168.20.6
- [mysql_cluster]
- # IP addresses of the cluster management nodes (comma-separated, on one line)
- ndb-connectstring=192.168.20.5,192.168.20.6
-----------------------------------------------------------------------------------------------------------
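MySQL Cluster connect strings list several management servers as comma-separated host[:port] entries in a single value. As a small illustration, the sketch below joins a list of management hosts into that form, using port 1186 as mentioned earlier. The function name build_connectstring is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join management hosts into one connect string,
# appending the management port 1186 to each host.
build_connectstring() {
    result=""
    for host in "$@"; do
        # ${result:+...} adds the comma only when result is non-empty
        result="${result:+$result,}$host:1186"
    done
    printf '%s\n' "$result"
}
```

Usage: build_connectstring 192.168.20.5 192.168.20.6 prints a value that could be placed after ndb-connectstring=.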
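Note: ndb nodes do not need an /etc/config.ini file.
(2) On every ndb node, run the following commands the first time:
- mkdir /var/lib/mysql-cluster
- cd /var/lib/mysql-cluster
- ndbd --initial
Note: the MySQL service is not started on ndb nodes. Normally an ndb node is started with plain "ndbd". The --initial option is needed only when the node configuration changes or in other special cases, because it wipes the node's existing NDB file system.
4. (1) On the SQL nodes, create a new my.cnf file: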
- [mysqld]
- port=3306
- ndbcluster
- ndb-connectstring=192.168.20.5,192.168.20.6
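- # Note: mysqld does not read the sections from here down to [mysql_cluster]; [ndbd_mgm] and [ndbd_mgmd] do not match any program name (the tools are ndb_mgm and ndb_mgmd).
- [ndbd]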
- connect-string=192.168.20.9
- [ndbd]
- connect-string=192.168.20.10
- [ndbd]
- connect-string=192.168.20.11
- [ndbd]
- connect-string=192.168.20.12
- [ndbd_mgm]
- connect-string=192.168.20.5
- connect-string=192.168.20.6
- [ndbd_mgmd]
- config-file=/etc/config.ini
- [mysql_cluster]
- ndb-connectstring=192.168.20.5,192.168.20.6
-----------------------------------------------------------------------------------------------------------
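Note: on the SQL nodes only the MySQL service needs to be started: /etc/init.d/mysql start. No /etc/config.ini file is needed there.
5. (1) On each management node, run the cluster management client: ndb_mgm
At its prompt, enter the command "show". The output shows all four ndb nodes connected to the management node, and the output is identical on both management nodes: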
- Connected to Management Server at: 192.168.20.5:1186
- Cluster Configuration
- ---------------------
- [ndbd(NDB)] 4 node(s)
- id=5 @192.168.20.9 (Version: 5.1.22, Nodegroup: 0, Master)
- id=6 @192.168.20.10 (Version: 5.1.22, Nodegroup: 0)
- id=7 @192.168.20.11 (Version: 5.1.22, Nodegroup: 1)
- id=8 @192.168.20.12 (Version: 5.1.22, Nodegroup: 1)
- [ndb_mgmd(MGM)] 2 node(s)
- id=1 @192.168.20.5 (Version: 5.1.22)
- id=2 @192.168.20.6 (Version: 5.1.22)
- [mysqld(API)] 2 node(s)
- id=3 @192.168.20.7 (Version: 5.1.22)
- id=4 @192.168.20.8 (Version: 5.1.22)
As shown, every node is connected to the management node. If a line reads "(not connected, accepting connect from any host)", that node has not yet connected to the management node.
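The same check can be scripted. The sketch below counts how many lines of "show" output still say "not connected"; it assumes the output is fed on stdin, for example from ndb_mgm -e show. The function name count_disconnected is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "show" output on stdin and print how many node
# slots are still reported as "not connected".
count_disconnected() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c 'not connected' || true
}
```

Usage: ndb_mgm -e show | count_disconnected. A result of 0 means every configured node has connected.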
The netstat output also shows the nodes connected to the management node:
- tcp 0 0 192.168.20.5:1186 192.168.20.7:48066 ESTABLISHED
- tcp 0 0 192.168.20.5:1186 192.168.20.7:48065 ESTABLISHED
- tcp 0 0 192.168.20.5:1186 192.168.20.12:48677 ESTABLISHED
- tcp 0 0 192.168.20.5:1186 192.168.20.9:37060 ESTABLISHED
- tcp 0 0 192.168.20.5:1186 192.168.20.9:37061 ESTABLISHED
- tcp 0 0 192.168.20.5:1186 192.168.20.9:37062 ESTABLISHED
- tcp 0 0 192.168.20.5:1186 192.168.20.9:50631 ESTABLISHED
- tcp 0 0 192.168.20.5:1186 192.168.20.11:33977 ESTABLISHED
- tcp 0 0 192.168.20.5:1186 192.168.20.10:55260 ESTABLISHED
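(2) Checking the connections on any ndb node (20.9 as an example) shows that it is connected to the other 3 ndb nodes (10/11/12), the management nodes (5/6), and the SQL nodes (7/8). Apart from the connections to the management nodes, which go to port 1186, all other connections use random ports.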
- tcp 0 0 192.168.20.9:59318 192.168.20.11:49124 ESTABLISHED
- tcp 0 0 192.168.20.9:37593 192.168.20.7:33593 ESTABLISHED
- tcp 0 0 192.168.20.9:55146 192.168.20.10:46643 ESTABLISHED
- tcp 0 0 192.168.20.9:48657 192.168.20.12:46097 ESTABLISHED
- tcp 0 0 192.168.20.9:55780 192.168.20.8:41428 ESTABLISHED
- tcp 0 0 192.168.20.9:58185 192.168.20.5:1186 ESTABLISHED
- tcp 0 0 192.168.20.9:54535 192.168.20.6:1186 ESTABLISHED
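To compare such a listing against the expected peers without reading it line by line, the sketch below reduces netstat-style lines to the distinct remote IP addresses. It assumes the column layout shown above (remote address in the fifth field, state in the sixth); the function name list_peers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read netstat-style lines on stdin and print the
# distinct remote IP addresses of ESTABLISHED connections, one per line.
list_peers() {
    awk '$6 == "ESTABLISHED" { split($5, a, ":"); print a[1] }' | sort -u
}
```

Usage: netstat -tn | list_peers. On a data node in this setup the result should include the other three data nodes, both management nodes, and both SQL nodes.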
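(3) Checking the connections on any SQL node (20.7 as an example) shows that the SQL nodes connect to management node 20.6 (of the two management nodes, 20.5 was started first and 20.6 second):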
- tcp 0 0 192.168.20.7:49726 192.168.20.6:1186 ESTABLISHED
- tcp 0 0 192.168.20.7:38498 192.168.20.10:58390 ESTABLISHED
- tcp 0 0 192.168.20.7:54636 192.168.20.12:40206 ESTABLISHED
- tcp 0 0 192.168.20.7:33593 192.168.20.9:37593 ESTABLISHED
- tcp 0 0 192.168.20.7:57676 192.168.20.11:37717 ESTABLISHED
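7. The MySQL high-availability cluster is now complete. Next, set up load balancing with IPVS.
On all mysql_sql nodes, create an empty database:
- create database loadbalancing;
Grant privileges so that every mysql_lb node has select permission (used for the heartbeat test):
- grant select on loadbalancing.* to loadbalancing@192.168.20.15 identified by 'abcdefg';
- grant select on loadbalancing.* to loadbalancing@192.168.20.16 identified by 'abcdefg';
8. (1) On the load balancer (LVS director) nodes, load the IPVS modules: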
- modprobe ip_vs_dh
- modprobe ip_vs_ftp
- modprobe ip_vs
- modprobe ip_vs_lblc
- modprobe ip_vs_lblcr
- modprobe ip_vs_lc
- modprobe ip_vs_nq
- modprobe ip_vs_rr
- modprobe ip_vs_sed
- modprobe ip_vs_sh
- modprobe ip_vs_wlc
- modprobe ip_vs_wrr
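(2) Configure LVS on the load balancer nodes (20.15 and 20.16 are the two load balancer nodes, the real servers are 20.7 and 20.8, the virtual IP is 20.17, and the port is 3306). You can start /etc/init.d/piranha-gui and set up the cluster at http://localhost:3636, which eventually generates the configuration file /etc/sysconfig/ha/lvs.cf. Alternatively, write /etc/sysconfig/ha/lvs.cf by hand: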
- serial_no = 37
- primary = 192.168.20.15
- service = lvs
- backup_active = 1
- backup = 192.168.20.16
- heartbeat = 1
- heartbeat_port = 539
- keepalive = 6
- deadtime = 18
- network = direct
- debug_level = NONE
- monitor_links = 1
- virtual MySql {
- active = 1
- address = 192.168.20.17 eth0:1
- vip_nmask = 255.255.255.0
- port = 3306
- expect = "OK"
- use_regex = 0
- send_program = "/usr/local/bin/mysql_running_test %h"
- load_monitor = none
- scheduler = wlc
- protocol = tcp
- timeout = 6
- reentry = 15
- quiesce_server = 0
- server mysql_sql-1 {
- address = 192.168.20.7
- active = 1
- weight = 1
- }
- server mysql_sql-2 {
- address = 192.168.20.8
- active = 1
- weight = 1
- }
- }
The lvs.cf file must be present on both load balancer nodes, with exactly the same content.
The probe script /usr/local/bin/mysql_running_test on both load balancer nodes:
- #!/bin/sh
- # We use $1 as the argument in the TEST which will be the various IP's
- # of the real servers in the cluster.
- # Check for mysql service
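- TEST=`echo 'select "" as abcdefg' | mysql -uloadbalancing -pabcdefg -h "$1" 2>/dev/null | grep abcdefg`
- # On success the column header "abcdefg" is echoed back; on failure TEST is empty.
- if [ "$TEST" = "abcdefg" ]; then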
- echo "OK"
- else
- echo "FAIL"
- # /bin/echo | mail pager@failure.company.com -s "NOTICE: $1 failed to provide email service"
- fi
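Note: the MySQL client must be installed on both probing nodes. How it works: the script named in lvs.cf is invoked by the nanny program on the load balancer nodes, and the %h parameter in lvs.cf means the script is called with the real server's hostname/IP address as its argument. The script connects to the MySQL server and runs a select statement that echoes back the string abcdefg; whether the echo comes back correctly determines whether the real server is judged to be running normally.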
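To make the mechanism concrete, here is a rough emulation of that check. It is only a sketch and not nanny's actual code: it runs a probe program with the real server address and reports the server as healthy only when the output equals the expect string exactly (matching use_regex = 0). The function name probe_is_healthy is hypothetical.

```shell
#!/bin/sh
# Rough emulation of the check described above (not nanny's real code):
# run a probe program with the real server address and treat the server
# as healthy only when the output equals the expect string.
probe_is_healthy() {
    probe=$1    # path to the probe script (send_program without %h)
    host=$2     # value substituted for %h
    expect=$3   # value of "expect" in lvs.cf
    output=$("$probe" "$host" 2>/dev/null)
    [ "$output" = "$expect" ]
}
```

Usage: probe_is_healthy /usr/local/bin/mysql_running_test 192.168.20.7 OK && echo "real server is up". Running the probe by hand like this is also a quick way to test the script before starting pulse.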
(3) Start the LVS service: /etc/init.d/pulse start
Contents of /var/log/messages on one node:
- Dec 27 14:57:15 mysql_lb-1 pulse[8606]: STARTING PULSE AS MASTER
- Dec 27 14:59:29 mysql_lb-1 pulse[8606]: Terminating due to signal 15
- Dec 27 14:59:30 mysql_lb-1 pulse: SIOCGIFADDR failed: Cannot assign requested address
- Dec 27 14:59:30 mysql_lb-1 pulse[8659]: STARTING PULSE AS MASTER
Contents of /var/log/messages on the other node:
- Dec 27 14:59:06 mysql_lb-2 pulse[16729]: STARTING PULSE AS BACKUP
- Dec 27 14:59:08 mysql_lb-2 pulse[16729]: primary inactive (link failure?): activating lvs
- Dec 27 14:59:08 mysql_lb-2 lvs[16731]: starting virtual service MySql active: 3306
- Dec 27 14:59:08 mysql_lb-2 nanny[16734]: starting LVS client monitor for 192.168.20.17:3306
- Dec 27 14:59:08 mysql_lb-2 lvs[16731]: create_monitor for MySql/mysql_sql-1 running as pid 16734
- Dec 27 14:59:08 mysql_lb-2 nanny[16737]: starting LVS client monitor for 192.168.20.17:3306
- Dec 27 14:59:08 mysql_lb-2 lvs[16731]: create_monitor for MySql/mysql_sql-2 running as pid 16737
- Dec 27 14:59:08 mysql_lb-2 nanny[16737]: making 192.168.20.8:3306 available
- Dec 27 14:59:08 mysql_lb-2 nanny[16734]: making 192.168.20.7:3306 available
- Dec 27 14:59:13 mysql_lb-2 pulse[16740]: gratuitous lvs arps finished