  • Deleting Data from a Large MySQL Table in Primary-Key ID Batches

    Preparing the test environment

    Creating the test table

    -- Example table structure
    CREATE TABLE `g_device_action_base` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `uid` char(32)  DEFAULT '',
      `domain_id` char(16)  DEFAULT '',
      `machine` char(200)  NOT NULL DEFAULT '',
      `app_type` char(32)  NOT NULL DEFAULT '' ,
      `app_id` char(32)  NOT NULL DEFAULT '' ,
      `action_time` int(11) NOT NULL DEFAULT '0',
      `action_status` int(11) NOT NULL DEFAULT '0',
      `source` char(32)  NOT NULL DEFAULT '',
      `url` varchar(512)  NOT NULL DEFAULT '' COMMENT 'url',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;
    
    -- Example record
    mysql> select * from g_device_action_base limit 1\G
               id: 24244024
              uid: 779085e3ac9
        domain_id: LhziEhqb8W
          machine: DBA
         app_type: wechat
           app_id: 3e261dcf5485fb0f1c00
      action_time: 1595222484
    action_status: 1
           source: jw_app_hard
              url: https://www.cnblogs.com/zhenxing/
    
    -- Generate data
    -- Insert one seed row
    set session sql_log_bin=off;
    insert into g_device_action_base(uid,domain_id,machine,app_type,app_id,action_time,action_status,source,url)
    	values('779085e3ac9a32e8927099c2be506228','LhziEhqb8WgS','IOS','jw_app_thirdapp','3e261dcf5485fb0f1c0052f838ae6779',1595222484,1,'zhenxing','https://www.cnblogs.com/zhenxing/');
    
    -- Run repeatedly; each run doubles the row count
    insert into g_device_action_base(id,uid,domain_id,machine,app_type,app_id,action_time,action_status,source,url) select null,uid,domain_id,machine,app_type,app_id,action_time,action_status,source,url from g_device_action_base;
    
    -- Continue until there are about 1 million test rows (20 doublings of the seed row give 2^20 = 1,048,576)
    select count(*) from g_device_action_base;
    
    -- Create the test table from the base table's definition
    create table g_device_action like g_device_action_base;
    

    Loading the test data

    Assume g_device_action_base now holds 1 million test rows. To simulate deleting 50 million rows, loop 50 times, each time inserting those 1 million rows into g_device_action. The basic load script is:

    #!/bin/bash
    for ((i=1;i<=50;i++))
    do
    echo "load batch $i"
    mysql <<EOF
    set session sql_log_bin=off;
    use demo
    insert into g_device_action(uid,domain_id,machine,app_type,app_id,action_time,action_status,source,url)
            select uid,domain_id,machine,app_type,app_id,action_time,action_status,source,url from g_device_action_base;
    select sleep(2);
    EOF
    done
    

    Creating the log table

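    The log table records the status, timing, and related details of each delete batch, so the delete operation can be traced afterwards.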

    CREATE TABLE `delete_batch_log` (
      `ID` bigint(20) PRIMARY KEY AUTO_INCREMENT,
      `BATCH_ID` bigint(20) NOT NULL comment 'batch number',
      `SCHEMA_NAME` varchar(64) NOT NULL comment 'database name',
      `TABLE_NAME` varchar(64) NOT NULL comment 'table name',
      `BATCH_COUNT` bigint(20) NOT NULL comment 'number of rows in the batch',
      `BEGIN_RECORD` varchar(100) DEFAULT NULL comment 'lowest ID in the batch',
      `END_RECORD` varchar(100) DEFAULT NULL comment 'highest ID in the batch',
      `BEGIN_TIME` datetime(6) DEFAULT NULL comment 'start time',
      `END_TIME` datetime(6) DEFAULT NULL comment 'end time',
      `ERROR_NO` bigint(20) DEFAULT NULL comment 'error code',
      `crc32_values` varchar(64) DEFAULT NULL comment 'checksum'
    );
    
    -- Indexes for the queries run against the log table
    CREATE INDEX IDX_DELETE_BATCH_LOG_M1 ON delete_batch_log(BEGIN_RECORD,END_RECORD);
    CREATE INDEX IDX_DELETE_BATCH_LOG_M2 ON delete_batch_log(BEGIN_TIME,END_TIME);
    CREATE INDEX IDX_DELETE_BATCH_LOG_M3 ON delete_batch_log(TABLE_NAME,SCHEMA_NAME);
    

    Running the delete

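    The script batch_delete_table.sh performs the following tasks:

    1. Sets the degree of parallelism for the batched delete
    2. Connects to MySQL and looks up the table's minimum and maximum primary-key ID
    3. From the minimum and maximum ID, works out how many loop iterations are needed when each batch covers a range of 10,000 IDs
    4. Sets the delete session's isolation level to REPEATABLE READ (RR) and its binlog format to statement, which reduces the volume of binlog written (less I/O pressure and less replay pressure on replicas). InnoDB only permits statement-based logging at REPEATABLE READ or stricter, hence the RR setting
    5. Writes the basic information about each delete batch to the log table, including:
    • Database name
    • Table name
    • Batch number
    • Number of rows deleted in the batch
    • Starting ID of the batch
    • Ending ID of the batch
    • Start time of the batch's delete
    • End time of the batch's delete
    • Whether the batch's delete hit an error (the error code is recorded)
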
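
The range arithmetic in step 3 can be sketched on its own, without a database. This is a minimal illustration rather than the script itself: `batch_ranges` is a hypothetical helper, and it assumes the batch count is taken over the inclusive span max - min + 1, so that the row holding the largest ID always falls inside the last half-open range [begin, end).

```shell
#!/bin/bash
# Hypothetical helper: print "batch begin end" for every half-open ID range
# [begin, end) needed to cover min_id..max_id in steps of batch_rows.
batch_ranges() {
    local min_id=$1 max_id=$2 batch_rows=$3
    # Ceiling division over the inclusive span (max - min + 1).
    local total=$(( (max_id - min_id + 1 + batch_rows - 1) / batch_rows ))
    local i
    for ((i = 1; i <= total; i++)); do
        echo "$i $(( (i - 1) * batch_rows + min_id )) $(( i * batch_rows + min_id ))"
    done
}

# IDs 1..25 in batches of 10 need three ranges: [1,11) [11,21) [21,31)
batch_ranges 1 25 10
```

With IDs 1..11 and a batch size of 10, the helper yields two ranges, so ID 11 is still covered. The full script, which applies ranges like these with parallel workers, follows.
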
    #!/bin/bash
    
    ## SET MySQL CONN INFO
    MYSQL_HOST=10.186.61.162
    MYSQL_USER=zhenxing
    MYSQL_PASS=zhenxing
    MYSQL_PORT=3306
    MYSQL_DB=demo
    BATCH_ROWS=10000
    MYSQL_TABLE=g_device_action
    PARALLEL_WORKERS=5
    
    ## Create Named pipe And File descriptor
    [ -e /tmp/fd1 ] || mkfifo /tmp/fd1
    exec 3<>/tmp/fd1
    rm -rf /tmp/fd1
    
    ## Set the parallel
    for ((i=1;i<= $PARALLEL_WORKERS;i++))
    do
        echo >&3
    done
    
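    MYSQL_CMD="mysql -h$MYSQL_HOST -u$MYSQL_USER -p$MYSQL_PASS -P$MYSQL_PORT"
    MINID=$($MYSQL_CMD -sse "select min(id) from ${MYSQL_DB}.${MYSQL_TABLE};")
    ## +1 so the row holding max(id) is still covered when max-min is an exact multiple of BATCH_ROWS
    BATCH_TOTAL=$($MYSQL_CMD -sse "select ceil((max(id)-min(id)+1)/${BATCH_ROWS}) from ${MYSQL_DB}.${MYSQL_TABLE};")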
    
    ## PARALLEL DELETE
    for ((i=1;i<=$BATCH_TOTAL;i++))
    do
    read -u3
    {
        BEGIN_RECORD=$(( (i-1)*BATCH_ROWS + MINID ))
        END_RECORD=$(( i*BATCH_ROWS + MINID ))
        mysql -h$MYSQL_HOST -u$MYSQL_USER -p$MYSQL_PASS -P$MYSQL_PORT << EOF
        set session transaction_isolation='REPEATABLE-READ';
        set session binlog_format='statement';
        -- set session sql_log_bin=off;
        set @BEGIN_TIME=now(6);
        select count(*),CONV(bit_xor(crc32(concat_ws('',id,uid,domain_id,machine,app_type,app_id,action_time,action_status,source,url))),10,16) into @row_count,@crc32_values from ${MYSQL_DB}.${MYSQL_TABLE} where id>=${BEGIN_RECORD} and id<${END_RECORD} and action_time<1595222485;
        delete from ${MYSQL_DB}.${MYSQL_TABLE} where id>=${BEGIN_RECORD} and id<${END_RECORD} and action_time<1595222485;
        -- read the diagnostics right after the DELETE so @p1/@p2 describe it rather than a later statement
        -- (NUMBER is the count of conditions raised, ROW_COUNT the rows affected)
        GET DIAGNOSTICS @p1=NUMBER,@p2=ROW_COUNT;
        set @END_TIME=now(6);
    
        insert into ${MYSQL_DB}.delete_batch_log(BATCH_ID,SCHEMA_NAME,TABLE_NAME,BATCH_COUNT,BEGIN_RECORD,END_RECORD,BEGIN_TIME,END_TIME,ERROR_NO,crc32_values) values (${i},'${MYSQL_DB}','${MYSQL_TABLE}',@row_count,${BEGIN_RECORD},${END_RECORD},@BEGIN_TIME,@END_TIME,@p1,@crc32_values);
    EOF
        echo >&3
    } &
    done
    wait
    
    exec 3<&-
    exec 3>&-
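
The named pipe in the script acts as a token pool: each worker takes a token from file descriptor 3 before it starts and puts one back when it finishes, so no more than PARALLEL_WORKERS jobs run at once. The pattern can be tried in isolation with a sketch like the one below. `run_limited` and the `echo` standing in for the real work are placeholders, not part of the original script.

```shell
#!/bin/bash
# Hypothetical demo of the FIFO token pool: run "jobs" background jobs,
# never more than "workers" of them at the same time.
run_limited() {
    local workers=$1 jobs=$2 fifo i
    fifo=$(mktemp -u)            # unused temp path for the named pipe
    mkfifo "$fifo"
    exec 3<>"$fifo"              # open read-write so neither side blocks on open
    rm -f "$fifo"                # the descriptor keeps the pipe alive

    for ((i = 0; i < workers; i++)); do
        echo >&3                 # one token per allowed worker
    done

    for ((i = 1; i <= jobs; i++)); do
        read -r -u3              # block until a token is available
        {
            echo "job $i done"   # placeholder for the real work
            echo >&3             # hand the token back
        } &
    done
    wait
    exec 3>&-
}

# Five jobs, at most two at a time; sorted because completion order may vary.
run_limited 2 5 | sort
```
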
    

    Cleanup after the delete

    Once the delete finishes, the following SQL summarizes what was deleted:

    select SCHEMA_NAME,TABLE_NAME,min(BATCH_ID) as min_batch,max(BATCH_ID) as max_batch,sum(BATCH_COUNT) as rows_deleted,min(BEGIN_TIME) as begin_time,max(END_TIME) as end_time,TIMESTAMPDIFF(SECOND,min(BEGIN_TIME),max(END_TIME)) as elapsed_seconds from delete_batch_log group by SCHEMA_NAME,TABLE_NAME\G
    *************************** 1. row ***************************
        SCHEMA_NAME: demo
         TABLE_NAME: g_device_action
          min_batch: 1
          max_batch: 5415
       rows_deleted: 51534336
         begin_time: 2020-07-16 10:56:46.347799
           end_time: 2020-07-16 11:00:29.617498
    elapsed_seconds: 223
    1 row in set (0.01 sec)
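
The summary row above is enough to work out average throughput. The short sketch below does the integer arithmetic with the three figures copied from that row; `average` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: integer average of total per unit of divisor.
average() {
    echo $(( $1 / $2 ))
}

rows=51534336    # total rows deleted, from the summary row
batches=5415     # highest batch number
seconds=223      # elapsed seconds

echo "rows per second:    $(average "$rows" "$seconds")"      # 231095
echo "batches per second: $(average "$batches" "$seconds")"   # 24
echo "rows per batch:     $(average "$rows" "$batches")"      # 9516
```
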
    

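    Deleting a large volume of rows this way does not release disk space from the tablespace; the table has to be rebuilt to shrink it. How long the rebuild takes depends on the tablespace size. In this test environment the tablespace was 32 GB, 50 million rows were deleted, and the rebuild took about 1 minute.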

    alter table g_device_action engine=innodb;
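
One way to see how much space a rebuild might reclaim is to compare the table's allocated and free space in information_schema.TABLES before and after it. The sketch below only builds the query text; `space_report_sql` is a hypothetical helper, and the figures MySQL reports here are estimates that may lag behind the real file size.

```shell
#!/bin/bash
# Hypothetical helper: emit a query reporting used vs. free space (in MB) for one table.
space_report_sql() {
    local schema=$1 table=$2
    cat <<EOF
select table_name,
       round((data_length + index_length) / 1024 / 1024) as used_mb,
       round(data_free / 1024 / 1024) as free_mb
from information_schema.tables
where table_schema = '${schema}' and table_name = '${table}';
EOF
}

space_report_sql demo g_device_action
```

The printed statement can be piped into the mysql client, for example `space_report_sql demo g_device_action | mysql`, once before and once after the rebuild.
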
    

    Deleting the 50 million rows in statement mode produced about 20 MB of binlog.

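    Verification command

    The query below recomputes the row count and checksum for one ID range, so they can be compared with the BATCH_COUNT and crc32_values logged for that batch in delete_batch_log. It uses concat_ws('', ...) as the delete script does; plain concat returns NULL as soon as any column is NULL, which would produce a different checksum.

    select count(*),CONV(bit_xor(crc32(concat_ws('',id,uid,domain_id,machine,app_type,app_id,action_time,action_status,source,url))),10,16) as crc32_values  from g_device_action where id>=12520001 and id<12530001;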
    

    MySQL: collapsing a table's column names into one row

    set global group_concat_max_len=102400;
    set group_concat_max_len=102400;
    SELECT @@global.group_concat_max_len;
    SELECT @@group_concat_max_len;
    select table_name,concat(group_concat(COLUMN_NAME order by ORDINAL_POSITION separator ',')) as all_columns
    from information_schema.COLUMNS tb1
    where table_schema='demo'
    and table_name='g_device_action'
    group by table_name;
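
The comma-separated list returned by this query can be dropped straight into the checksum expression. As a sketch, the hypothetical helper `checksum_sql` below assembles the verification statement for one ID range from such a list. It assumes the primary key column is named `id` and does no escaping, so it is only suitable for trusted input.

```shell
#!/bin/bash
# Hypothetical helper: build the count + CRC32 checksum query for the
# half-open ID range [begin_id, end_id) from a comma-separated column list.
checksum_sql() {
    local table=$1 columns=$2 begin_id=$3 end_id=$4
    echo "select count(*),CONV(bit_xor(crc32(concat_ws('',${columns}))),10,16) as crc32_values from ${table} where id>=${begin_id} and id<${end_id};"
}

# Shortened column list for illustration; in practice paste the all_columns value.
checksum_sql demo.g_device_action "id,uid,url" 12520001 12530001
```
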
    
    Please credit the source when reposting | QQ: 327488733@qq.com
  • Original article: https://www.cnblogs.com/zhenxing/p/15102521.html