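To test an environment, I needed to build a highly available LAMP architecture on Azure, with the requirement that the MySQL middleware, Atlas, run in active/standby mode. In a data center this is usually done with Keepalived plus a VIP, serving clients through a floating address.
Azure, however, supports neither floating addresses nor an active/standby (A/S) load-balancing mode. I therefore used ILB + HAProxy: the ILB emulates the VIP address, and HAProxy handles the A/S load balancing.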
1 Overall architecture
The design is fully redundant: every tier runs on two machines, and all hosts run CentOS 6.5.
2 Installing and configuring MySQL
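MySQL is configured in master-master mode. Run the following configuration and commands on both machines:
yum install -y mysql-server
chkconfig mysqld on; service mysqld start
iptables -F
setenforce 0
service iptables save
Change the root password:
/usr/bin/mysqladmin -u root password "newpass"
Log in as root:
mysql -h127.0.0.1 -uroot -pnewpass
Create the database:
create database mytable;
The database name has to be consistent across these steps: the database created here is the one that the GRANT statement and the replication filters in /etc/my.cnf should name.
Create a user (the same user on both machines):
GRANT ALL ON php.* to 'user'@'%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
Try creating a table and inserting data, with different rows on each server:
use mytable;
create table mytest(name varchar(20), phone char(14));
insert into mytest(name, phone) values('wang', '11111111111');
select * from mytest;
Configure master-master replication: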
Edit /etc/my.cnf on each host:
| Host 1 | Host 2 |
| --- | --- |
| server-id = 1 | server-id = 2 |
| log_bin=mysqlbinlog | log_bin=mysqlbinlog |
| log_bin_index=mysqlbinlog-index | log_bin_index=mysqlbinlog-index |
| log_slave_updates=1 | log_slave_updates=1 |
| relay_log=relay-log | relay_log=relay-log |
| replicate_do_db=test | replicate_do_db=test |
| binlog-do-db = test | binlog-do-db = test |
| binlog-ignore-db=mysql | binlog-ignore-db=mysql |
| log-slave-updates | log-slave-updates |
| sync_binlog=1 | sync_binlog=1 |
| auto_increment_offset=1 | auto_increment_offset=2 |
| auto_increment_increment=2 | auto_increment_increment=2 |
| replicate-ignore-db= mysql | replicate-ignore-db= mysql |
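The two columns differ only in server-id and auto_increment_offset. With auto_increment_increment=2, giving each host a different offset means one host hands out odd auto-increment values and the other even ones, so concurrent inserts on both masters do not collide. As a minimal sketch, the settings above could be generated from the server id; gen_repl_cnf is a hypothetical helper name and the default database name test is taken from the table, not a recommendation.

```shell
#!/bin/sh
# gen_repl_cnf SERVER_ID [DB]: print the replication settings for one node
# of a two-node master-master pair. Hypothetical helper for illustration.
gen_repl_cnf() {
    repl_id="$1"            # 1 or 2
    repl_db="${2:-test}"    # database to replicate (assumed default: test)
    case "$repl_id" in
        1|2) ;;
        *) echo "server id must be 1 or 2" >&2; return 1 ;;
    esac
    cat <<EOF
server-id = ${repl_id}
log_bin=mysqlbinlog
log_bin_index=mysqlbinlog-index
log_slave_updates=1
relay_log=relay-log
replicate_do_db=${repl_db}
binlog-do-db = ${repl_db}
binlog-ignore-db=mysql
sync_binlog=1
auto_increment_offset=${repl_id}
auto_increment_increment=2
replicate-ignore-db= mysql
EOF
}
```

Calling gen_repl_cnf 2 prints the Host 2 column; the output would still have to be placed under the [mysqld] section of /etc/my.cnf by hand.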
After configuring, restart MySQL: service mysqld restart
Check the state on both hosts:
show master status;
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
| --- | --- | --- | --- |
| mysqlbinlog.000001 | 325 | test | mysql |
show slave status\G
At this point Slave_IO_Running and Slave_SQL_Running are both No:
mysql> show slave status\G
***************** 1. row *****************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.4.5
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysqlbinlog.000001
Read_Master_Log_Pos: xxxx
Relay_Log_File: relay-log.0000xx
Relay_Log_Pos: 253
Relay_Master_Log_File: mysqlbinlog.000001
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB: test
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: xxxx
Relay_Log_Space: xxx
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
Create the replication user on both machines:
GRANT REPLICATION SLAVE ON *.* TO 'slave'@'%' IDENTIFIED BY 'password';
Run the following commands to set up master-master replication. For MASTER_LOG_FILE and MASTER_LOG_POS, use the File and Position values that show master status reports on the other host.
On 172.16.4.4:
stop slave;
CHANGE MASTER TO MASTER_HOST='172.16.4.5', MASTER_USER='slave', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysqlbinlog.000001', MASTER_LOG_POS=325;
start slave;
On 172.16.4.5:
stop slave;
CHANGE MASTER TO MASTER_HOST='172.16.4.4', MASTER_USER='slave', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysqlbinlog.000001', MASTER_LOG_POS=325;
start slave;
show slave status\G should now report Yes for both Slave_IO_Running and Slave_SQL_Running, which means master-master replication is working.
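Reading the two status fields by eye works for a one-off setup, but the same check is easy to script. The sketch below assumes the output of show slave status\G is piped in on standard input; replication_ok is a hypothetical helper name.

```shell
#!/bin/sh
# replication_ok: read `show slave status\G` output on stdin and succeed
# only if both replication threads report Yes. Hypothetical helper.
replication_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example use (not run here, it needs a live server):
#   mysql -uroot -p -e 'show slave status\G' | replication_ok && echo "replication OK"
```

Matching the full field name including the colon avoids picking up other fields whose names merely start with the same text.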
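3 Installing Atlas
Install and configure both machines identically.
Download Atlas from GitHub:
https://github.com/Qihoo360/Atlas/releases
Pick the build that matches your system. My machines run CentOS 6.5, so I chose Atlas-2.2.1.el6.x86_64.rpm.
wget https://github.com/Qihoo360/Atlas/releases/download/2.2.1/Atlas-2.2.1.el6.x86_64.rpm
The file turns out to be hosted on AWS S3.
Install it: rpm -ivh Atlas-2.2.1.el6.x86_64.rpm
Edit the configuration file /usr/local/mysql-proxy/conf/test.cnf: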
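#Options whose line starts with # are optional
#User name for the admin interface
admin-username = user
#Password for the admin interface
admin-password = pwd
#IP and port of the MySQL master that Atlas connects to; multiple entries may be given, separated by commas
proxy-backend-addresses = 172.16.4.4:3306
#IP and port of the MySQL slaves that Atlas connects to; the number after @ is the weight used for load balancing (1 if omitted); multiple entries may be given, separated by commas
proxy-read-only-backend-addresses = 172.16.4.5:3306@1
#User names and their encrypted MySQL passwords; encrypt each password with the encrypt program in PREFIX/bin, and replace the entry below with your own MySQL user name and encrypted password
pwds = slave:euRQ8nFxoVUtoVZBPiOC6Q==
#How Atlas runs: true runs it as a daemon, false runs it in the foreground; typically false for development and debugging, true in production; there must be no space after true
daemon = true
#How Atlas runs: with true, Atlas starts two processes, a monitor and a worker, and the monitor restarts the worker if it exits unexpectedly; with false there is only the worker; typically false for development and debugging, true in production; there must be no space after true
keepalive = false
#Number of worker threads; this strongly affects Atlas performance, so set it to suit your workload
event-threads = 1
#Log level, one of message, warning, critical, error, debug
log-level = message
#Directory where logs are stored
log-path = /usr/local/mysql-proxy/log
#SQL log switch, one of OFF, ON, REALTIME: OFF does not log SQL, ON logs SQL, REALTIME logs SQL and flushes to disk immediately; default is OFF
#sql-log = OFF
#Slow log setting: when set, only statements that run longer than sql-log-slow (in ms) are logged; when unset, all statements are logged
#sql-log-slow = 10
#Instance name, used to tell apart multiple Atlas instances on the same machine
#instance = test
#IP and port of the working interface Atlas listens on
proxy-address = 0.0.0.0:3306
#IP and port of the admin interface Atlas listens on
admin-address = 0.0.0.0:2345
#Table sharding: in this example person is the database, mt is the table, id is the sharding column and 3 is the number of sub-tables; multiple entries may be given, separated by commas; omit this option if you do not shard tables
#tables = person.mt.id.3
#Default character set; once set, clients no longer need to run SET NAMES
#charset = utf8
#Client IPs allowed to connect to Atlas, as exact IPs or IP prefixes separated by commas; if unset, all IPs may connect, otherwise only the listed ones
#client-ips = 127.0.0.1, 192.168.1
#IP of the physical NIC of the LVS in front of Atlas (not the virtual IP); required if there is an LVS and client-ips is set, otherwise optional
#lvs-ips = 192.168.1.1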
Note that the MySQL password must be encrypted with the /usr/local/mysql-proxy/bin/encrypt program: ./encrypt password
Create an init script:
vim /etc/init.d/atlas
#!/bin/sh
#
#atlas: Atlas Daemon
#
# chkconfig: - 90 25
# description: Atlas Daemon
#
# Source function library.
start()
{
    echo -n $"Starting atlas: "
    /usr/local/mysql-proxy/bin/mysql-proxyd test start
    echo
}
stop()
{
    echo -n $"Shutting down atlas: "
    /usr/local/mysql-proxy/bin/mysql-proxyd test stop
    echo
}
ATLAS="/usr/local/mysql-proxy/bin/mysql-proxyd"
[ -f "$ATLAS" ] || exit 1
# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        sleep 3
        start
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac
exit 0
chmod a+x /etc/init.d/atlas
chkconfig atlas on; service atlas start
Check that it is listening on the port:
netstat -tunlp
If port 3306 shows up in the LISTEN state, Atlas is running.
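The same check can be scripted instead of read off the screen. This sketch assumes the usual netstat -tunlp column layout (local address in column 4, state in column 6) and reads that output from standard input; port_listening is a hypothetical helper name.

```shell
#!/bin/sh
# port_listening PORT: read `netstat -tunlp` output on stdin and succeed if
# a TCP socket is in the LISTEN state on PORT. Hypothetical helper.
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # Local address looks like 0.0.0.0:3306 or :::3306; the port is
            # whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Example use (not run here):
#   netstat -tunlp | port_listening 3306 && echo "3306 is listening"
```

The check only confirms that something is listening on the port; the PID/Program name column of netstat shows whether that process is mysql-proxy.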
4 Installing HAProxy
Configure both machines identically:
yum install haproxy -y
chkconfig haproxy on
Edit the HAProxy configuration file: vim /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode tcp
log global
option dontlognull
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend main *:3306
mode tcp
default_backend nodes
#---------------------------------------------------------------------
# static backend for serving up images, stylesheets and such
#---------------------------------------------------------------------
#backend static
# balance roundrobin
# server static 127.0.0.1:4331 check
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend nodes
mode tcp
balance roundrobin
server app1 172.16.4.4:3306 check
server app2 172.16.4.5:3306 backup
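The trailing backup keyword marks that server as the standby: it receives traffic only when app1 is unavailable.
HAProxy can also expose a status-monitoring page. I did not configure one here; add it yourself if you are interested.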
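For readers who do want the status page, a rough sketch follows. It prints a small listen section using HAProxy's stats enable, stats uri and stats refresh directives; the section name stats, port 8080, the URI and the refresh interval are all assumptions chosen for illustration, and gen_haproxy_stats is a hypothetical helper name. The section sets mode http explicitly because the defaults above use mode tcp.

```shell
#!/bin/sh
# gen_haproxy_stats [PORT]: print a minimal HAProxy stats section.
# Hypothetical helper; port 8080 and the URI are illustrative assumptions.
gen_haproxy_stats() {
    stats_port="${1:-8080}"
    cat <<EOF
listen stats
    bind *:${stats_port}
    mode http
    stats enable
    stats uri /haproxy?stats
    stats refresh 10s
EOF
}

# Example use (not run here):
#   gen_haproxy_stats 8080 >> /etc/haproxy/haproxy.cfg
#   haproxy -c -f /etc/haproxy/haproxy.cfg   # validate before restarting
```

The page is unauthenticated as written, so it would need to be restricted to a management network before use.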
5 The Azure ILB
The Azure ILB can only be configured through PowerShell. The commands are:
Add-AzureInternalLoadBalancer -InternalLoadBalancerName MyHAILB -SubnetName Subnet-2 -ServiceName atlasha01
get-AzureVM -ServiceName atlasha01 -Name atlasha01 | Add-AzureEndpoint -Name mysql -LBSetName mysqlha -Protocol tcp -LocalPort 3306 -PublicPort 3306 -ProbePort 3306 -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName MyHAILB | Update-AzureVM
get-AzureVM -ServiceName atlasha01 -Name atlasha02 | Add-AzureEndpoint -Name mysql -LBSetName mysqlha -Protocol tcp -LocalPort 3306 -PublicPort 3306 -ProbePort 3306 -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName MyHAILB | Update-AzureVM
PS C:> Get-AzureInternalLoadBalancer -servicename atlasha01
InternalLoadBalancerName : MyHAILB
ServiceName : atlasha01
DeploymentName : atlasha01
SubnetName : Subnet-2
IPAddress : 172.16.2.6
OperationDescription : Get-AzureInternalLoadBalancer
OperationId : cd86e37a-776c-4fc8-8e31-7bad6ebc88b7
OperationStatus : Succeeded
Both HAProxy hosts are now members of the ILB's load-balanced set.
The ILB's virtual (floating) address is 172.16.2.6.
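6 Installing the front-end web servers
I installed phpBB3. For the installation steps, see my other blog post: http://www.cnblogs.com/hengwei/p/4754408.html
Note that the phpBB3 installer asks for the IP address of the MySQL service. Enter the ILB address here: 172.16.2.6
After installing on one host, copy the site content to the other host:
rsync -a /var/www/html/phpBB3/ 172.16.1.5:/var/www/html/phpBB3/
The trailing slash on the source copies the directory's contents rather than nesting a second phpBB3 directory inside the target. The target directory must exist beforehand, and you need root privileges and the root password.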
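7 The Azure SLB
When setting up the Azure hosts, the two web servers must be placed behind one SLB. This can be done in the graphical portal, so I will not describe it here. Once the SLB is set up, switch its distribution mode to source-IP hash so that requests from the same client always reach the same web server.
This step has to be done through PowerShell:
Set-AzureLoadBalancedEndpoint -LBSetName httpset -LoadBalancerDistribution sourceIP -ServiceName atlasweb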
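8 Adding iptables rules and probe scripts for automatic failover
If the primary MySQL server fails, the MySQL service moves to the standby server. When the primary comes back, however, HAProxy sends SQL requests to it again. Because the standby's database was updated while the primary was offline, the primary has to sync from the standby before it can serve clients.
So add iptables rules to the primary server's startup script rc.local that allow only the standby server to reach the primary MySQL server, and start mysql only after the rules are loaded:
iptables -A INPUT -j ACCEPT -s 10.1.0.9/32
iptables -A INPUT -j DROP
sleep 20
service mysqld start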
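The order of these rules matters: iptables applies the first rule that matches, so every ACCEPT has to be appended before the final DROP. A small sketch that builds the rule list for any set of allowed hosts is shown below. It only prints the commands and does not touch the firewall; gen_lockdown_rules is a hypothetical helper name.

```shell
#!/bin/sh
# gen_lockdown_rules IP...: print iptables commands that accept traffic only
# from the listed hosts and drop everything else. Hypothetical helper; it
# prints the commands and does not run them.
gen_lockdown_rules() {
    for lock_ip in "$@"; do
        echo "iptables -A INPUT -s ${lock_ip}/32 -j ACCEPT"
    done
    # The catch-all DROP must come last, after every ACCEPT rule.
    echo "iptables -A INPUT -j DROP"
}

# Example use (not run here, requires root):
#   gen_lockdown_rules 10.1.0.9 | sh
```

Because the final rule drops all other inbound traffic, including SSH, the host is reachable only from the listed addresses until the rules are flushed.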
Once the primary server is up, it syncs its MySQL content from the standby. During this time the primary has to monitor the standby and take over the MySQL service as soon as the standby goes down. The script is:
#!/bin/sh
# List the iptables rules into the file fw (-n keeps addresses numeric)
/sbin/iptables -L -n > fw
# Extract the rules that mention 10.1.0 from fw
grep 10.1.0 fw > fwRus
# If fwRus has content, the firewall is in place and my2 is serving; otherwise my1 is serving
if [ `wc -l fwRus | awk '{print $1}'` = 0 ]
then
    echo $(TZ=Asia/Shanghai date "+%Y-%m-%d-%H-%M-%S") >> a
    echo "do nothing" >> a
else
    # The firewall is in place, so check whether my2 is up or down
    ping -c 1 my2 > res
    grep ttl res > pingRus
    # If my2 is up, do nothing; if it is down, remove the firewall so my1 becomes active
    if [ `wc -l pingRus | awk '{print $1}'` = 0 ]
    then
        echo "del firewall" >> b
        echo $(TZ=Asia/Shanghai date "+%H-%M-%S-%Y-%m-%d") >> b
        /sbin/iptables -F
    else
        echo "host my2 is online" >> c
        echo $(TZ=Asia/Shanghai date "+%Y-%m-%d-%H-%M-%S") >> c
    fi
fi
After the primary and standby MySQL servers are back in sync, an operator has to remove this iptables rule by hand so that the front end can reach the primary MySQL through ILB + HAProxy.
After discussion with the customer, probing the second MySQL server with ping turned out not to be an appropriate check. The detection mechanism was therefore changed to look at the replication status between the two MySQL servers. The script is:
#!/bin/bash
function mysqlCheck() {
    time1=`date +"%Y%m%d%H%M%S"`
    time2=`date +"%Y-%m-%d %H:%M:%S"`
    CheckFile="/tmp/MySQL.${time1}"
    Flag=0
    echo "----------Begin at: " $time2 "------------" > $CheckFile 2>&1
    mysql -uroot -pcisco -e "show slave status\G" >> $CheckFile 2>&1
    echo "" >> $CheckFile
    BM=`grep Seconds_Behind_Master $CheckFile | awk '{print $2}'`
    SIOR=`grep Slave_IO_Running $CheckFile | awk '{print $2}'`
    SSQLR=`grep Slave_SQL_Running $CheckFile | awk '{print $2}'`
    # Flag=1 only when replication has caught up and both threads are running
    [ "$BM" == '0' -a "$SIOR" == 'Yes' -a "$SSQLR" == "Yes" ] && Flag=1 || Flag=0
    return $Flag
}
# List the iptables rules into the file fw
/sbin/iptables -L > fw
# Extract the DROP rules from fw
grep DROP fw > fwRus
# If fwRus has content, the firewall is in place and my2 is serving; otherwise my1 is serving
if [ `wc -l fwRus | awk '{print $1}'` = 0 ]
then
    echo $(TZ=Asia/Shanghai date "+%Y-%m-%d-%H-%M-%S") >> a
    echo "do nothing" >> a
else
    mysqlCheck
    if [ $Flag = 1 ]
    then
        # my1 has caught up: fence off my2 and open my1
        echo "add my2 firewall, del my1 firewall" >> $CheckFile 2>&1
        echo $(TZ=Asia/Shanghai date "+%H-%M-%S-%Y-%m-%d") >> $CheckFile 2>&1
        ssh my2 "iptables -A INPUT -j ACCEPT -s 10.1.0.8/32"
        ssh my2 "iptables -A INPUT -j ACCEPT -s 10.1.0.4/32"
        ssh my2 "iptables -A INPUT -j ACCEPT -s 10.1.0.5/32"
        ssh my2 "iptables -A INPUT -j DROP"
        /sbin/iptables -F
        exit
    else
        echo "MySQL is not sync" >> $CheckFile 2>&1
        echo $(TZ=Asia/Shanghai date "+%Y-%m-%d-%H-%M-%S") >> $CheckFile 2>&1
    fi
fi
Likewise, this script needs to be added to crontab.
The second MySQL server also has to decide, in each situation, whether it needs to take over the MySQL service. Its script is:
#!/bin/bash
# Check replication status on my1
function mysqlCheck() {
    time1=`date +"%Y%m%d%H%M%S"`
    time2=`date +"%Y-%m-%d %H:%M:%S"`
    CheckFile="/tmp/MySQL.CheckFile"
    Flag=0
    echo "----------Begin at: " $time2 "------------" >> $CheckFile 2>&1
    echo "" >> $CheckFile
    BM=`mysql -uroot -pcisco -hmy1 -e "show slave status\G"|grep Seconds_Behind_Master|awk '{print $2}'`
    SIOR=`mysql -uroot -pcisco -hmy1 -e "show slave status\G"|grep Slave_IO_Running|awk '{print $2}'`
    SSQLR=`mysql -uroot -pcisco -hmy1 -e "show slave status\G"|grep Slave_SQL_Running |awk '{print $2}'`
    [ "$BM" == '0' -a "$SIOR" == 'Yes' -a "$SSQLR" == "Yes" ] && Flag=1 || Flag=0
    return $Flag
}
# Check replication status on the local server (my2)
function my2sqlCheck() {
    time1=`date +"%Y%m%d%H%M%S"`
    time2=`date +"%Y-%m-%d %H:%M:%S"`
    CheckFile="/tmp/MySQL.CheckFile"
    Flag=0
    echo "----------Begin at: " $time2 "------------" >> $CheckFile 2>&1
    echo "" >> $CheckFile
    BM=`mysql -uroot -pcisco -e "show slave status\G"|grep Seconds_Behind_Master|awk '{print $2}'`
    SIOR=`mysql -uroot -pcisco -e "show slave status\G"|grep Slave_IO_Running|awk '{print $2}'`
    SSQLR=`mysql -uroot -pcisco -e "show slave status\G"|grep Slave_SQL_Running |awk '{print $2}'`
    [ "$BM" == '0' -a "$SIOR" == 'Yes' -a "$SSQLR" == "Yes" ] && my2Flag=1 || my2Flag=0
    return $my2Flag
}
while true
do
    # Count the DROP rules on each server: 0 means its firewall is off
    my1fw=`ssh my1 "iptables -L" | grep DROP| wc -l`
    my2fw=`iptables -L | grep DROP| wc -l`
    mysqlCheck
    if [ "$my1fw" = '0' -a "$Flag" = '1' ];then
        if [ "$my2fw" -ge '1' ];then
            echo "my1 mysql service is ok, my2 fw is on, do nothing" >> $CheckFile 2>&1
        else
            iptables -A INPUT -j ACCEPT -s 10.1.0.8/32
            iptables -A INPUT -j ACCEPT -s 10.1.0.4/32
            iptables -A INPUT -j ACCEPT -s 10.1.0.5/32
            iptables -A INPUT -j DROP
            echo "my1 mysql service is ok, my2 fw is off, add firewall" >> $CheckFile 2>&1
        fi
    else
        if [ "$my2fw" = '0' ];then
            echo "my1 mysql service is not ok, my2 fw is off, my2 provide mysql service, do nothing" >> $CheckFile 2>&1
        else
            # Take over only if my2 was in sync at the previous check
            LastFlag=`cat /tmp/LastStatus`
            if [ "$LastFlag" = '1' ];then
                iptables -F
                echo "my1 mysql service is not ok, my2 fw is on, delete firewall" >> $CheckFile 2>&1
            else
                echo "my1 mysql service is off, but my2 database not sync, my2 can not provide mysql service now, do nothing" >> $CheckFile 2>&1
            fi
        fi
    fi
    my2sqlCheck
    echo $my2Flag > /tmp/LastStatus
    sleep 60
done
Run this script as a daemon that checks once a minute, and start it from rc.local.
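The branching inside the loop is easier to review when it is separated from the commands that gather state and change the firewall. The sketch below restates only the decision logic of the standby script as a pure function. decide_action and the action names none, lock_my2 and unlock_my2 are hypothetical labels introduced here for illustration.

```shell
#!/bin/sh
# decide_action MY1_FW MY1_SYNCED MY2_FW MY2_LAST_SYNCED
#   MY1_FW, MY2_FW    number of DROP rules seen on my1 / my2 (0 = firewall off)
#   MY1_SYNCED        1 if replication on my1 is healthy, else 0
#   MY2_LAST_SYNCED   1 if my2 was in sync at the previous check, else 0
# Prints one of: none, lock_my2, unlock_my2. Hypothetical helper.
decide_action() {
    if [ "$1" -eq 0 ] && [ "$2" -eq 1 ]; then
        # my1 is open and healthy, so my2 should be fenced off
        if [ "$3" -ge 1 ]; then echo none; else echo lock_my2; fi
    elif [ "$3" -eq 0 ]; then
        # my1 is not serving and my2 is already open
        echo none
    elif [ "$4" -eq 1 ]; then
        # my1 is not serving; my2 is fenced but was in sync, so take over
        echo unlock_my2
    else
        # my2 is fenced and was not in sync, so taking over is unsafe
        echo none
    fi
}
```

In the loop, lock_my2 would correspond to appending the ACCEPT and DROP rules on my2 and unlock_my2 to iptables -F; keeping the decision separate lets each combination of inputs be checked without touching a live server.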
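The scripts above are for reference only. In a real production environment, the switchover should be done with a maintenance page up and database operations suspended.
Summary:
With that, all of the configuration work is complete.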