pt-ioprofile工具:
1.percona-toolkit(在http://www.percona.com下载)
2.安装该工具依赖的perl组件
yum install perl-IO-Socket-SSL perl-DBD-MySQL perl-Time-HiRes -y
1.percona-toolkit(在http://www.percona.com下载)
2.安装该工具依赖的perl组件
yum install perl-IO-Socket-SSL perl-DBD-MySQL perl-Time-HiRes -y
3.rpm -ivh percona-toolkit-2.2.8-1.noarch.rpm
pt-table-checksum
主库
root@localhost 22:57:02 [(none)]> GRANT SELECT, insert,update,delete,create,PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'lk'@'%' identified by 'lk';
(可以校验有主键和没有主键的表,但修复只能修复有主键的)
主库:
root@localhost 13:28:47[db1]> select * from test2;
+----+------+
| id | name |
+----+------+
| 1| a |
| 2| a |
| 3| a |
+----+------+
3 rows in set(0.00 sec)
从库:
mysql> select * from test2;
+----+------+
| id | name |
+----+------+
| 1| a |
| 2| bbbb |
| 4| a |
+----+------+
3 rows in set(0.00 sec)
h是主库ip,开始校验:
[root@likun ~]# pt-table-checksum h='192.168.1.20',u='lk',p='lk',P=3306-d db1 --nocheck-replication-filters --replicate=db1.checksums --create-replicate-table --no-check-binlog-format
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
09-09T23:50:00 0 1 1 1 0 0.521 db1.test
09-09T23:50:01 0 1 2 1 0 0.278 db1.test2
主要看:DIFFS(0 表示全部一致)
其次是 ERRORS
其次是 ERRORS
再其次是CHUNKS
执行结果显示参数意义:
TS :完成检查的时间。
ERRORS :检查时候发生错误和警告的数量。
DIFFS :0表示一致,大于0表示不致。当指定--no-replicate-check时,会一直为0,当指定--replicate-check-only会显示不同的信息。
ROWS :表的行数。
CHUNKS :被划分到表中的块的数目。
SKIPPED :由于错误或警告或过大,则跳过块的数目。
TIME :执行的时间。
TABLE :被检查的表名。
pt-table-sync
发生不一致,想查看一下那些不致? 怎么查?
原理是将差异数据,包装成replace into和delete,在主库执行,进而同步到从库。这些改变不会更改主库。
--print把要修复的用的语句打出来
pt-table-sync --replicate=db1.checksums --databases=db1 --charset=utf8 h='192.168.1.20',u='lk',p='lk' --print
[root@likun data]# pt-table-sync --replicate=db1.checksums --databases=db1 --charset=utf8 h='192.168.1.20',u='lk',p='lk'--print
Can't make changes on the master because no unique index exists at /usr/bin/pt-table-sync line 10648. while doing db1.test on 192.168.1.25
DELETE FROM `db1`.`test2` WHERE `id`='4' LIMIT 1/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3669 user:root host:likun.com*/;
REPLACE INTO `db1`.`test2`(`id`,`name`) VALUES ('2','a')/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3669 user:root host:likun.com*/;
REPLACE INTO `db1`.`test2`(`id`,`name`) VALUES ('3','a')/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3669 user:root host:likun.com*/;
--execute在主库执行修复语句:
pt-table-sync --replicate=db1.checksums --databases=db1 --charset=utf8 h='192.168.1.20',u='lk',p='lk' --execute
之后,看主库binlog日志:
DELETE FROM `db1`.`test2` WHERE `id`='4' LIMIT 1/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3697 user:root host:likun.com*/
/*!*/;
# at 558
#140917 13:35:11 server id 1 end_log_pos 952 Query thread_id=1749 exec_time=0 error_code=0
SET TIMESTAMP=1410932111/*!*/;
REPLACE INTO `db1`.`test2`(`id`,`name`) VALUES ('2','a')/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3697 user:root host:likun.com*/
/*!*/;
# at 952
#140917 13:35:11 server id 1 end_log_pos 1346 Query thread_id=1749 exec_time=0 error_code=0
SET TIMESTAMP=1410932111/*!*/;
REPLACE INTO `db1`.`test2`(`id`,`name`) VALUES ('3','a')/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3697 user:root host:likun.com*/
/*!*/;
# at 1346
#140917 13:35:11 server id 1 end_log_pos 1373 Xid = 5490152
COMMIT/*!*/;
查看从库relay-log:
DELETE FROM `db1`.`test2` WHERE `id`='4' LIMIT 1/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3697 user:root host:likun.com*/
/*!*/;
# at 704
#140917 13:35:11 server id 1 end_log_pos 952 Query thread_id=1749 exec_time=0 error_code=0
SET TIMESTAMP=1410932111/*!*/;
REPLACE INTO `db1`.`test2`(`id`,`name`) VALUES ('2','a')/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3697 user:root host:likun.com*/
/*!*/;
# at 1098
#140917 13:35:11 server id 1 end_log_pos 1346 Query thread_id=1749 exec_time=0 error_code=0
SET TIMESTAMP=1410932111/*!*/;
REPLACE INTO `db1`.`test2`(`id`,`name`) VALUES ('3','a')/*percona-toolkit src_db:db1 src_tbl:test2 src_dsn:A=utf8,h=192.168.1.20,p=...,u=lk dst_db:db1 dst_tbl:test2 dst_dsn:A=utf8,h=192.168.1.25,p=...,u=lk lock:1 transaction:1 changing_src:db1.checksums replicate:db1.checksums bidirectional:0 pid:3697 user:root host:likun.com*/
/*!*/;
# at 1492
#140917 13:35:11 server id 1 end_log_pos 1373 Xid = 5490152
COMMIT/*!*/;
最后检查主从全变成:
+----+------+
| id | name |
+----+------+
| 1| a |
| 2| a |
| 3| a |
+----+------+
数据校验和同步,在业务压力较小的时候执行就可以。
小tips:主库如何找到从库?
--recursion-method="processlist" 发现从库的方法(processlist要求主从的端口一样)
其他方式:
METHOD USES
=========== =============================================
processlist SHOW PROCESSLIST
hosts SHOW SLAVE HOSTS
cluster SHOW STATUS LIKE 'wsrep\_incoming\_addresses'
dsn=DSN DSNs from a table
none Do not find slaves
pt-ioprofile
当IO已经达到100%,如何排查磁盘正在干什么呢?用它来做
可以查看一个进程(默认 mysqld)正在写那些文件。
可以用 –p 指定 要跟的进程
[root@likun data]# pt-ioprofile --profile-pid=2679--cell=sizes --run-time=10
WedSep1714:30:29 CST 2014
Tracing process ID 2679
total read pwrite write fsync filename
65536 0 65536 0 0/mysql/data/ibdata1
1536 0 1536 0 0/mysql/data/ib_logfile1
512 0 512 0 0/mysql/data/ib_logfile0
374 187 0 187 0/mysql/data/mysql-bin.000045
95 0 0 95 0/mysql/data/likun.log
pt-query-digest
统计慢查询,简单用法:pt-query-digest slow.log > query.log
讲查询结果写入表中:
[root@likun data]# pt-query-digest --user=lk --password=lk --review h=192.168.1.20,D=db1,t=query --history h=192.168.1.20,D=db1,t=query_history --no-report --limit=0%--filter=" $event->{Bytes} = length($event->{arg}) and $event->{hostname}="$HOSTNAME""/mysql/data/likun-slow.log
root@localhost 16:16:56[db1]>select*from query;
+----------------------+----------------------------------------+---------------------------------------------+-------------------
| checksum | fingerprint | sample | first_seen
+----------------------+----------------------------------------+---------------------------------------------+-------------------
| 3835665160750404163|set@master_heartbeat_period=? | SET @master_heartbeat_period=1799999979520|2014-09-1716:06:3
| 4080254502890102626|select user() |select USER() |2014-09-1716:05:5
| 4757751776728164035|setglobal long_query_time=? |setglobal long_query_time=1 |2014-09-1716:06:4
| 6323544616612860337|select@@long_query_time |select@@long_query_time |2014-09-1716:05:5
| 8071533740163008823|select count(*)from d?test where id=?|select count(*)from db1.test where id=1 |2014-09-1716:06:2
| 8281087749045658124|select unix_timestamp() | SELECT UNIX_TIMESTAMP() |2014-09-1716:06:3
| 9067424411701554875| show variables like ? | SHOW VARIABLES LIKE 'SERVER_ID' |2014-09-1716:06:3
|12264768829628632244| administrator command:Quit | administrator command:Quit |2014-09-1716:06:4
|14719931978391102173|set names utf? | SET NAMES utf8 |2014-09-1716:06:3
|16403064890596278669|select@@version_comment limit ? |select@@version_comment limit 1 |2014-09-1716:05:5
|17087051866937599150| administrator command:RegisterSlave | administrator command:RegisterSlave |2014-09-1716:06:3
+----------------------+----------------------------------------+---------------------------------------------+-------------------
11 rows inset(0.00 sec)
pt-duplicate-key-checker
检查重复索引
[root@likun ~]# pt-duplicate-key-checker --host=192.168.1.20--port=3306--user=lk --password=lk
# ########################################################################
# world.CountryLanguage
# ########################################################################
# CountryCode is a left-prefix of PRIMARY
# Key definitions:
# KEY `CountryCode` (`CountryCode`),
# PRIMARY KEY (`CountryCode`,`Language`),
# Column types:
# `countrycode` char(3) not null default ''
# `language` char(30) not null default ''
# To remove this duplicate index, execute:
ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`;
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes 4038
# Total Duplicate Indexes 1
# Total Indexes 48
pt-kill
杀掉执行超时的sql
每隔5秒检查一下运行超过4s的sql,--print是查看,--kill是杀掉
[root@likun ~]# pt-kill --busy-time 4 --print --host=192.168.1.20 --user=lk --password=lk P=3306 --interval 5
# 2014-09-17T16:46:07 KILL 35 (Query 48 sec) select count(*) from yw a,yw b
# 2014-09-17T16:46:12 KILL 35 (Query 53 sec) select count(*) from yw a,yw b
[root@likun ~]# pt-kill --busy-time 4 --kill --host=192.168.1.20 --user=lk --password=lk P=3306 --interval 5
使用DSN方式,将杀掉的sql写入表kill_log中:
[root@likun ~]# pt-kill --log-dsn=h=192.168.1.20,P=3306,D=db1,t=kill_log,u=lk,p=lk --busy-time 4 --kill
^C
[root@likun ~]# pt-kill --log-dsn=h=192.168.1.20,P=3306,D=db1,t=kill_log,u=lk,p=lk --busy-time 4 --print
pt-summary
pt-mysql-summary
查看系统状态、查看mysql状态 一般用于新接手系统查看
pt-online-schema-change
MySQL 5.5的版本在DDL方面是要付出代价的,虽然已经有了Fast index Creation,但是在添加字段还是会锁表的,而且在添加删除辅助索引是会加S锁,也就是无法进行写操作。所以,这里就有相关工具的出现,那就是pt-online-schema-change
大概工作原理:
(1)如果存在外键,根据alter-foreign-keys-method参数的值,检测外键相关的表,针对相应的设置进行处理。
(2)创建一个新的表,表结构为修改后的数据表,用于从源数据表向新表中导入数据。
(3)创建触发器,在复制数据开始之后,将对源数据表继续进行数据修改的操作记录下来,以便在数据复制结束后执行这些操作,保证数据不会丢失。
(4)复制数据,从源数据表中复制数据到新表中。
(5)修改外键相关的子表,根据修改后的数据,修改外键关联的子表。
(6)更改源数据表为old表,把新表改为源表名,并将old表删除。
(7)删除触发器。
存在如下限制:
(1)对操作的表必须要有主键或者唯一键
(2)增加的字段如果为NOT NULL,会报错,需要添加默认值才可以成功。
用户需要trigger、create 、alter 等权限,系统最好autocommit=ON
[root@likun ~]# pt-online-schema-change --alter="add column addr char(20)"--user=lk --password=lk --host=192.168.1.20 D=db1,t=test2 --execute
Found1 slaves:
mysql1.com
Will check slave lag on:
mysql1.com
Operation, tries, wait:
copy_rows,10,0.25
create_triggers,10,1
drop_triggers,10,1
swap_tables,10,1
update_foreign_keys,10,1
Altering`db1`.`test2`...
Creatingnew table...
Creatednew table db1._test2_new OK.
Alteringnew table...
Altered`db1`.`_test2_new` OK.
2014-09-18T08:09:15Creating triggers...
2014-09-18T08:09:15Created triggers OK.
2014-09-18T08:09:15Copying approximately 4 rows...
2014-09-18T08:09:15Copied rows OK.
2014-09-18T08:09:15Swapping tables...
2014-09-18T08:09:15Swapped original and new tables OK.
2014-09-18T08:09:15Dropping old table...
2014-09-18T08:09:15Dropped old table `db1`.`_test2_old` OK.
2014-09-18T08:09:15Dropping triggers...
2014-09-18T08:09:15Dropped triggers OK.
Successfully altered `db1`.`test2`.