zoukankan      html  css  js  c++  java
  • 用pt-online-schema-change给大表在线加字段的时候导致从库数据丢失的问题

    今天同事在主库在给一个大表的字段新加了一个索引,因为是大表,所以用了pt-osc工具,在添加完索引没过多久,开发那边反应丢数据了。

      这个表以前是写在主库的,后来不知道是什么原因改成了写从库,也就是说主库有部分以前的历史数据,但是新的数据都是写在从库上,同事用pt-osc加索引的时候是在主库加的,于是把主库的历史数据同步过来了,从库上的新数据丢了。

      通过以下例子也以看出为什么会发生这种情况:

    >select count(*) from goods;

    +----------+

    | count(*) |

    +----------+

    |  1426200 |

    +----------+

    CREATE TABLE `goods` (

      `rec_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,

      `user_id` mediumint(8) unsigned NOT NULL DEFAULT '0',

      `goods_id` mediumint(8) unsigned NOT NULL DEFAULT '0',

      `add_time` int(11) unsigned NOT NULL DEFAULT '0',

      `is_attention` tinyint(1) NOT NULL DEFAULT '0',

      `wid` tinyint(3) unsigned NOT NULL DEFAULT '1' COMMENT '仓id,默认为1表示中国仓',

      PRIMARY KEY (`rec_id`),

      KEY `user_id` (`user_id`),

      KEY `goods_id` (`goods_id`),

      KEY `is_attention` (`is_attention`)

    ) ENGINE=InnoDB AUTO_INCREMENT=1721748 DEFAULT CHARSET=utf8  

    我们先把goods表上的user_id索引删除:

    alter table goods drop key user_id;

    在主库上删除部分数据:

    >set sql_log_bin=off;

    >delete from goods where rec_id>721747;

    Query OK, 848254 rows affected (20.24 sec)

    >set sql_log_bin=on;

    >select count(*) from test;

    +----------+

    | count(*) |

    +----------+

    |   577946 |

    +----------+

    测试一:

    先直接用alter table语句对goods表的user_id字段加索引:

    >alter table  test add key (user_id);

    主库:

    show create table testG

    *************************** 1. row ***************************

           Table: test

    Create Table: CREATE TABLE `test` (

      `rec_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,

      `user_id` mediumint(8) unsigned NOT NULL DEFAULT '0',

      `goods_id` mediumint(8) unsigned NOT NULL DEFAULT '0',

      `add_time` int(11) unsigned NOT NULL DEFAULT '0',

      `is_attention` tinyint(1) NOT NULL DEFAULT '0',

      `wid` tinyint(3) unsigned NOT NULL DEFAULT '1' COMMENT '仓id,默认为1表示中国仓',

      PRIMARY KEY (`rec_id`),

      KEY `goods_id` (`goods_id`),

      KEY `is_attention` (`is_attention`),

      KEY `user_id` (`user_id`)

    ) ENGINE=InnoDB AUTO_INCREMENT=1721748 DEFAULT CHARSET=utf8

    1 row in set (0.00 sec)

    >select count(*) from test;

    +----------+

    | count(*) |

    +----------+

    |   577946 |

    +----------+

    1 row in set (0.22 sec)

    从库:

    show create table testG

    *************************** 1. row ***************************

           Table: test

    Create Table: CREATE TABLE `test` (

      `rec_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,

      `user_id` mediumint(8) unsigned NOT NULL DEFAULT '0',

      `goods_id` mediumint(8) unsigned NOT NULL DEFAULT '0',

      `add_time` int(11) unsigned NOT NULL DEFAULT '0',

      `is_attention` tinyint(1) NOT NULL DEFAULT '0',

      `wid` tinyint(3) unsigned NOT NULL DEFAULT '1' COMMENT '仓id,默认为1表示中国仓',

      PRIMARY KEY (`rec_id`),

      KEY `goods_id` (`goods_id`),

      KEY `is_attention` (`is_attention`),

      KEY `user_id` (`user_id`)

    ) ENGINE=InnoDB AUTO_INCREMENT=1721748 DEFAULT CHARSET=utf8

    1 row in set (0.00 sec)

    >select count(*) from test;

    +----------+

    | count(*) |

    +----------+

    |  1426200 |

    +----------+

    1 row in set (0.47 sec)

    可以看到直接用alter table加索引的时候从库的数据没有变化。

    测试二:

    用pt-osc工作在主库goods表的user_id列加索引

    先删除user_id列的索引,在主库上执行:

    alter table  test drop index user_id;

    pt-osc加索引:

    pt-online-schema-change --nocheck-replication-filters --recursion-method=none --charset=utf8 --alter "add index (user_id) " h=localhost,P=3306,u=root,p=123456,D=test,t=test --print --execute

    Successfully altered `test`.`test`. 提示索引加成功。

    主库:

    >select count(*) from test;

    +----------+

    | count(*) |

    +----------+

    |   577946 |

    +----------+

    从库:

    >select count(*) from test;

    +----------+

    | count(*) |

    +----------+

    |   577946 |

    +----------+

    可以看到,从库的数据被主库覆盖。

    为什么会出现这个原因呢,我们可以具体看一下加索引的过程:

    # pt-online-schema-change --nocheck-replication-filters --recursion-method=none --charset=utf8 --alter "add index (user_id) " h=localhost,P=3306,u=root,p=123456,D=test,t=test --print --execute

    No slaves found.  See --recursion-method if host master has slaves.

    Not checking slave lag because no slaves were found and --check-slave-lag was not specified.

    Operation, tries, wait:

      copy_rows, 10, 0.25

      create_triggers, 10, 1

      drop_triggers, 10, 1

      swap_tables, 10, 1

      update_foreign_keys, 10, 1

    Altering `test`.`test`...

    Creating new table...

    CREATE TABLE `test`.`_test_new` (

      `rec_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,

      `user_id` mediumint(8) unsigned NOT NULL DEFAULT '0',

      `goods_id` mediumint(8) unsigned NOT NULL DEFAULT '0',

      `add_time` int(11) unsigned NOT NULL DEFAULT '0',

      `is_attention` tinyint(1) NOT NULL DEFAULT '0',

      `wid` tinyint(3) unsigned NOT NULL DEFAULT '1' COMMENT '仓id,默认为1表示中国仓',

      PRIMARY KEY (`rec_id`),

      KEY `goods_id` (`goods_id`),

      KEY `is_attention` (`is_attention`)

    ) ENGINE=InnoDB AUTO_INCREMENT=1721748 DEFAULT CHARSET=utf8

    Created new table test._test_new OK.

    Altering new table...

    ALTER TABLE `test`.`_test_new` add index (user_id) 

    Altered `test`.`_test_new` OK.

    2016-02-01T18:11:48 Creating triggers...

    CREATE TRIGGER `pt_osc_test_test_del` AFTER DELETE ON `test`.`test` FOR EACH ROW DELETE IGNORE FROM `test`.`_test_new` WHERE `test`.`_test_new`.`rec_id` <=> OLD.`rec_id`

    CREATE TRIGGER `pt_osc_test_test_upd` AFTER UPDATE ON `test`.`test` FOR EACH ROW REPLACE INTO `test`.`_test_new` (`rec_id`, `user_id`, `goods_id`, `add_time`, `is_attention`, `wid`) VALUES (NEW.`rec_id`, NEW.`user_id`, NEW.`goods_id`, NEW.`add_time`, NEW.`is_attention`, NEW.`wid`)

    CREATE TRIGGER `pt_osc_test_test_ins` AFTER INSERT ON `test`.`test` FOR EACH ROW REPLACE INTO `test`.`_test_new` (`rec_id`, `user_id`, `goods_id`, `add_time`, `is_attention`, `wid`) VALUES (NEW.`rec_id`, NEW.`user_id`, NEW.`goods_id`, NEW.`add_time`, NEW.`is_attention`, NEW.`wid`)

    2016-02-01T18:11:48 Created triggers OK.

    2016-02-01T18:11:48 Copying approximately 578405 rows...

    INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_new` (`rec_id`, `user_id`, `goods_id`, `add_time`, `is_attention`, `wid`) SELECT `rec_id`, `user_id`, `goods_id`, `add_time`, `is_attention`, `wid` FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`rec_id` >= ?)) AND ((`rec_id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 30910 copy nibble*/

    SELECT /*!40001 SQL_NO_CACHE */ `rec_id` FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`rec_id` >= ?)) ORDER BY `rec_id` LIMIT ?, 2 /*next chunk boundary*/

    2016-02-01T18:12:09 Copied rows OK.

    2016-02-01T18:12:09 Swapping tables...

    RENAME TABLE `test`.`test` TO `test`.`_test_old`, `test`.`_test_new` TO `test`.`test`

    2016-02-01T18:12:09 Swapped original and new tables OK.

    2016-02-01T18:12:09 Dropping old table...

    DROP TABLE IF EXISTS `test`.`_test_old`

    2016-02-01T18:12:09 Dropped old table `test`.`_test_old` OK.

    2016-02-01T18:12:09 Dropping triggers...

    DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_del`;

    DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_upd`;

    DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_ins`;

    2016-02-01T18:12:09 Dropped triggers OK.

    Successfully altered `test`.`test`.

    在建索引的过程中,pt-osc会新将原表的数据拷贝到一张临时表里面,创建三个存储过程来同步有变更的数据,先在临时表上面加索引,加完索引后再将临时表rename。

    其实如果可以尽量只写主库不写从库,这样可以避免很多坑。

  • 相关阅读:
    Oracle创建序列,删除序列
    java base58
    百度地图 显示,定位,轮廓图
    百度地图 圈出省份轮廓图并高亮
    基于双向链表的增删改查和排序(C++实现)
    统计字母出现次数
    线程安全
    C++面试秘笈笔记
    牛客选择题刷题
    new delete 浅析
  • 原文地址:https://www.cnblogs.com/liuzhuqing/p/7307013.html
Copyright © 2011-2022 走看看