  • Tech Share | MySQL Table Data Verification

    1. CHECKSUM TABLE

    CHECKSUM TABLE processes the table row by row until it arrives at the final checksum. For example, checksumming table n4 (1.57 million rows, about 4 GB):

    [ytt]>desc n4;
    +-------+--------------+------+-----+---------+-------+
    | Field | Type         | Null | Key | Default | Extra |
    +-------+--------------+------+-----+---------+-------+
    | id    | int(11)      | YES  |     | NULL    |       |
    | r1    | char(36)     | YES  |     | NULL    |       |
    | r2    | varchar(100) | YES  |     | NULL    |       |
    | r3    | datetime     | YES  |     | NULL    |       |
    | r4    | text         | YES  |     | NULL    |       |
    +-------+--------------+------+-----+---------+-------+
    5 rows in set (0.00 sec)
    [ytt]>select count(*) from n4;
    +----------+
    | count(*) |
    +----------+
    |  1572864 |
    +----------+
    1 row in set (6.89 sec)
    [ytt]>checksum table n4;
    +--------+-----------+
    | Table  | Checksum  |
    +--------+-----------+
    | ytt.n4 | 874125175 |
    +--------+-----------+
    1 row in set (8.24 sec)

    These are results from my own laptop; it is quite fast.
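    To make "row by row" concrete: judging from the MySQL server source, CHECKSUM TABLE appears to compute a CRC32 over the fields of each row and add the per-row values into an unsigned 32-bit total. The Python sketch below is a simplified model under that assumption, not MySQL's actual record layout: the NULL bitmap is ignored, INT is assumed to be 4 little-endian bytes, and the helper names encode_int, row_crc and table_checksum are hypothetical. One consequence of summing is that the order in which rows are scanned does not affect the result.

```python
import struct
import zlib

MASK32 = 0xFFFFFFFF


def encode_int(v: int) -> bytes:
    # Assumption: a 4-byte INT column is serialized as little-endian bytes.
    return struct.pack("<i", v)


def row_crc(fields: list[bytes]) -> int:
    # CRC32 chained over the row's fields, in column order.
    crc = 0
    for field in fields:
        crc = zlib.crc32(field, crc)
    return crc


def table_checksum(rows: list[list[bytes]]) -> int:
    # Per-row CRCs are summed, wrapping at 32 bits like an unsigned counter.
    total = 0
    for row in rows:
        total = (total + row_crc(row)) & MASK32
    return total


# Hypothetical (id, r1) rows.
rows = [[encode_int(i), encode_int(i * 10)] for i in range(1, 4)]
forward = table_checksum(rows)
backward = table_checksum(rows[::-1])
print(forward == backward)  # True: addition does not care about row order
```

    Chaining zlib.crc32 field by field gives the same value as hashing the concatenated row bytes, so the per-row value depends on both the field contents and their order.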

    However, CHECKSUM TABLE has quite a few limitations, listed below:

    A. It cannot checksum views.

    [ytt]>checksum table v_n3;
    +----------+----------+
    | Table    | Checksum |
    +----------+----------+
    | ytt.v_n3 |     NULL |
    +----------+----------+
    1 row in set, 1 warning (0.00 sec)
    [ytt]>show warnings;
    +-------+------+------------------------------+
    | Level | Code | Message                      |
    +-------+------+------------------------------+
    | Error | 1347 | 'ytt.v_n3' is not BASE TABLE |
    +-------+------+------------------------------+
    1 row in set (0.00 sec)

    B. If the columns are in a different order, the checksums also differ.

    [ytt]>desc n3;
    +-------+---------+------+-----+---------+-------+
    | Field | Type    | Null | Key | Default | Extra |
    +-------+---------+------+-----+---------+-------+
    | id    | int(11) | NO   |     | NULL    |       |
    | r1    | int(11) | YES  |     | NULL    |       |
    +-------+---------+------+-----+---------+-------+
    2 rows in set (0.00 sec)
    [ytt]>desc n5;
    +-------+---------+------+-----+---------+-------+
    | Field | Type    | Null | Key | Default | Extra |
    +-------+---------+------+-----+---------+-------+
    | r1    | int(11) | YES  |     | NULL    |       |
    | id    | int(11) | NO   |     | NULL    |       |
    +-------+---------+------+-----+---------+-------+
    2 rows in set (0.00 sec)
    [ytt]>checksum table n3,n5;
    +--------+------------+
    | Table  | Checksum   |
    +--------+------------+
    | ytt.n3 | 1795175396 |
    | ytt.n5 |  838415794 |
    +--------+------------+
    2 rows in set (0.00 sec)
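    Why column order matters can be seen in a simplified model, assuming (as the MySQL server source suggests, though this is not a byte-exact reproduction) that each row's fields are fed to CRC32 in the table's column order and the per-row values are summed. With the same hypothetical (id, r1) data, a table defined as (id, r1) and one defined as (r1, id) feed different byte streams into the CRC. The helper names below are hypothetical.

```python
import struct
import zlib


def encode_int(v: int) -> bytes:
    # Assumption: INT values are serialized as 4-byte little-endian integers.
    return struct.pack("<i", v)


def row_crc(fields: list[bytes]) -> int:
    # CRC32 chained over the fields in the table's column order.
    crc = 0
    for field in fields:
        crc = zlib.crc32(field, crc)
    return crc


def table_checksum(rows: list[list[bytes]]) -> int:
    # Sum of per-row CRCs, truncated to 32 bits.
    return sum(row_crc(r) for r in rows) & 0xFFFFFFFF


data = [(1, 10), (2, 20), (3, 30)]  # hypothetical (id, r1) values
n3 = [[encode_int(i), encode_int(r)] for i, r in data]  # columns: id, r1
n5 = [[encode_int(r), encode_int(i)] for i, r in data]  # columns: r1, id
print(table_checksum(n3), table_checksum(n5))
```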

    C. Even when CHAR(100) and VARCHAR(100) columns store the same characters, the checksums differ.

    [ytt]>desc n6;
    +-------+--------------+------+-----+---------+-------+
    | Field | Type         | Null | Key | Default | Extra |
    +-------+--------------+------+-----+---------+-------+
    | id    | int(11)      | NO   |     | NULL    |       |
    | r1    | int(11)      | YES  |     | NULL    |       |
    | s1    | varchar(100) | YES  |     | NULL    |       |
    +-------+--------------+------+-----+---------+-------+
    3 rows in set (0.00 sec)
    [ytt]>desc n3;
    +-------+-----------+------+-----+---------+-------+
    | Field | Type      | Null | Key | Default | Extra |
    +-------+-----------+------+-----+---------+-------+
    | id    | int(11)   | NO   |     | NULL    |       |
    | r1    | int(11)   | YES  |     | NULL    |       |
    | s1    | char(100) | YES  |     | NULL    |       |
    +-------+-----------+------+-----+---------+-------+
    3 rows in set (0.00 sec)
    [ytt]>select * from n6;
    Empty set (0.00 sec)
    [ytt]>insert into n6 select * from n3;
    Query OK, 8 rows affected (0.01 sec)
    Records: 8 Duplicates: 0 Warnings: 0
    [ytt]>checksum table n3,n6;
    +--------+------------+
    | Table  | Checksum   |
    +--------+------------+
    | ytt.n3 | 2202684200 |
    | ytt.n6 |  455222236 |
    +--------+------------+
    2 rows in set (0.00 sec)
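    A plausible explanation is that CHECKSUM TABLE hashes what is stored, not what a query returns: a CHAR(100) value occupies its full fixed width, padded with spaces, while a VARCHAR(100) value contributes only its actual characters. The sketch below models that under stated assumptions (a single-byte character set, 4-byte little-endian INT, NULL bitmap ignored, per-row CRC32 values summed); it is not MySQL's exact record format, and the helpers and sample rows are hypothetical.

```python
import struct
import zlib


def encode_int(v: int) -> bytes:
    # Assumption: INT is serialized as a 4-byte little-endian integer.
    return struct.pack("<i", v)


def encode_char(s: str, width: int) -> bytes:
    # Assumption: CHAR(width) contributes its full fixed width,
    # right-padded with spaces (single-byte character set assumed).
    return s.encode("latin-1").ljust(width, b" ")


def encode_varchar(s: str) -> bytes:
    # Assumption: VARCHAR contributes only the characters actually stored.
    return s.encode("latin-1")


def row_crc(fields: list[bytes]) -> int:
    # CRC32 chained over the fields in column order.
    crc = 0
    for field in fields:
        crc = zlib.crc32(field, crc)
    return crc


def table_checksum(rows: list[list[bytes]]) -> int:
    # Sum of per-row CRCs, truncated to 32 bits.
    return sum(row_crc(r) for r in rows) & 0xFFFFFFFF


data = [(1, 10, "abc"), (2, 20, "mysql")]  # hypothetical (id, r1, s1) rows
n3 = [[encode_int(i), encode_int(r), encode_char(s, 100)] for i, r, s in data]
n6 = [[encode_int(i), encode_int(r), encode_varchar(s)] for i, r, s in data]
print(table_checksum(n3), table_checksum(n6))
```

    By contrast, a SELECT returns CHAR values with the trailing pad spaces removed (unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled), so a checksum computed from query results does not see this difference.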

    D. While CHECKSUM TABLE runs, it places a shared read lock on all rows of the table.

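    E. Checksums may also differ across MySQL versions. For example, as the manual notes, the storage format of temporal types changed in versions after MySQL 5.6.5, which leads to different checksums. With so many limitations on CHECKSUM TABLE, is there another way around all of them? One option is to mimic the principle behind CHECKSUM TABLE and compute the value by hand.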

    2. Computing the checksum yourself

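    This relies on a few MySQL features: user-defined session variables, common table expressions (CTEs), window functions, and the concat_ws function, which keeps the implementation very simple. Here we use the sha function to compute the checksum. Note that CTEs and window functions require MySQL 8.0.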

    [ytt]>set @crc='';
    Query OK, 0 rows affected (0.00 sec)
    [ytt]>
    [ytt]>with ytt (r,rn) as
    -> (
    -> select @crc:= sha(concat_ws('#',@crc,id,r1,r2,r3,r4)) as r, row_number() over() as rn
    -> from n4
    -> )
    -> select 'n4' tablename, r checksum from ytt where rn = 1572864 ;
    +-----------+------------------------------------------+
    | tablename | checksum                                 |
    +-----------+------------------------------------------+
    | n4        | a9711af93399e0d195a53f4148adea46ab684d30 |
    +-----------+------------------------------------------+
    1 row in set, 1 warning (16.46 sec)
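    To make the chaining explicit, here is a small Python sketch that mimics the query above outside the database: each row's columns are joined with CONCAT_WS('#', ...) semantics (NULL arguments skipped, empty strings kept), prefixed by the running hash, and hashed with SHA-1, the algorithm behind MySQL's SHA(). The helper names and sample rows are hypothetical, and the values are assumed to already be in MySQL's string form and encoded as UTF-8.

```python
import hashlib
from typing import Iterable, Optional, Sequence


def concat_ws(sep: str, *args: Optional[str]) -> str:
    # Mirrors MySQL CONCAT_WS: NULL (None) arguments are skipped,
    # empty strings are kept.
    return sep.join(a for a in args if a is not None)


def chained_sha1(rows: Iterable[Sequence[Optional[str]]]) -> str:
    # Emulates: @crc := sha(concat_ws('#', @crc, col1, col2, ...)) per row,
    # starting from @crc = ''.
    crc = ""
    for row in rows:
        crc = hashlib.sha1(concat_ws("#", crc, *row).encode("utf-8")).hexdigest()
    return crc


# Hypothetical rows, already rendered as strings (None stands for NULL).
rows = [
    ("1", "a", None),
    ("2", "b", "2019-09-20 10:00:00"),
]
print(chained_sha1(rows))
```

    Because every step feeds the previous hash into the next one, the result depends on row order, so both sides must read rows in the same deterministic order (for example with ORDER BY on the primary key); the query above uses row_number() over() without an ORDER BY. Also note that CONCAT_WS skips NULLs, so rows such as (1, NULL, 'a') and (1, 'a', NULL) feed identical strings into the hash.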

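    On older MySQL versions that lack CTEs and window functions, you can use MySQL's BLACKHOLE engine instead. A BLACKHOLE table discards every row written to it, so the INSERT ... SELECT below only serves to evaluate the expression once per row: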

    [ytt]>create table tmp_checksum (checksum varchar(100)) engine blackhole;
    Query OK, 0 rows affected (0.08 sec)
    [ytt]>
    [ytt]>set @crc='';insert into tmp_checksum
    Query OK, 0 rows affected (0.00 sec)
    -> select @crc:= sha(concat_ws('#',@crc,id,r1,r2,r3,r4)) as r from n4;
    Query OK, 1572864 rows affected, 1 warning (20.11 sec)
    Records: 1572864 Duplicates: 0 Warnings: 1
    [ytt]>select 'n4' tablename,@crc checksum;
    +-----------+------------------------------------------+
    | tablename | checksum                                 |
    +-----------+------------------------------------------+
    | n4        | a9711af93399e0d195a53f4148adea46ab684d30 |
    +-----------+------------------------------------------+
    1 row in set (0.00 sec)

    Summary

    When you need to verify data consistency between tables, prefer the second approach of writing the SQL yourself.

  • Original article: https://www.cnblogs.com/ct20150811/p/11558951.html