MySQL Database Partitioning [RANGE]

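    Our customer-service platform had performance problems with its production queries. To solve, or at least mitigate, them, we added the necessary indexes and also partitioned the table.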

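    This post is about partitioning an existing table with ALTER TABLE xxx. You can also use CREATE TABLE xxx ... PARTITION BY RANGE(abc). I have verified and tested both approaches, and both work. Here I mainly cover the ALTER approach.

    The reason is that I ran into a few small problems during the ALTER, and I am recording them here for future reference.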

    Running SHOW CREATE TABLE on our chat_message_history table shows the following structure:

    Table    Create Table
    chat_message_history    CREATE TABLE `chat_message_history` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `visitor_id` varchar(128) DEFAULT NULL,
      `visitor_name` varchar(255) DEFAULT NULL,
      `contentBlob` blob,
      `sender` varchar(32) DEFAULT NULL,
      `message_time` datetime DEFAULT NULL COMMENT 'message send time',
      `jobId` varchar(11) DEFAULT NULL,
      `robot_response` varchar(2000) DEFAULT NULL COMMENT 'bot reply message',
      `skill_group_id` varchar(32) DEFAULT NULL,
      `type` varchar(255) DEFAULT NULL,
      `new_skill_group_id` varchar(32) DEFAULT NULL,
      `channel` varchar(16) DEFAULT 'WXAPP' COMMENT 'channel',
      `message_id` varchar(64) DEFAULT NULL,
      `sessionId` varchar(50) DEFAULT NULL,
      `message_status` varchar(11) DEFAULT NULL,
      `error_message` varchar(255) DEFAULT NULL,
      `businessType` varchar(11) DEFAULT NULL COMMENT '1-welcome message;',
      `pFlag` varchar(11) DEFAULT NULL COMMENT 'product the message belongs to: 1-微医保 (medical insurance) 2-微重疾 (critical illness insurance)',
      PRIMARY KEY (`id`),
      KEY `IDX_jobId` (`jobId`),
      KEY `idx_his_vis_ctdesc_key` (`visitor_id`,`skill_group_id`,`message_time`)
    ) DEFAULT CHARSET=utf8

    Next, add the partitions with ALTER TABLE. The table is partitioned by message time, roughly one partition per month:

    alter table chat_message_history partition by range(to_days(message_time)) (
        partition p201708 values less than (to_days('2017-08-31')),
        partition p201709 values less than (to_days('2017-09-30')),
        partition p201710 values less than (to_days('2017-10-31')),    
        partition p201711 values less than (to_days('2017-11-30')),
        partition p201712 values less than (to_days('2017-12-31')),    
        partition p201801 values less than (to_days('2018-01-31')),     
        partition p201802 values less than (to_days('2018-02-30')),
        partition p201803 values less than (to_days('2018-03-31')),
        partition p201804 values less than (to_days('2018-04-30')),  
        partition p201805 values less than (to_days('2018-05-31')),
        partition p201806 values less than (to_days('2018-06-30')),  
        partition p201807 values less than (to_days('2018-07-31')),  
        partition p201808 values less than (to_days('2018-08-31')),  
        partition p201809 values less than (to_days('2018-09-30')),  
        partition p201810 values less than (to_days('2018-10-31')),  
        partition p201811 values less than (to_days('2018-11-30')),    
        partition p201812 values less than (to_days('2018-12-31')),  
        partition p201901 values less than (to_days('2019-01-31')),   
        partition p201902 values less than (to_days('2019-02-30')),
        partition p201903 values less than (to_days('2019-03-31')),  
        partition p201904 values less than (to_days('2019-04-30')),
        partition p201905 values less than (to_days('2019-05-31')),  
        partition p201906 values less than (to_days('2019-06-30')),
        partition p201907 values less than (to_days('2019-07-31')),  
        partition p201908 values less than (to_days('2019-08-31')),  
        partition p201909 values less than (to_days('2019-09-30')),
        partition p201910 values less than (to_days('2019-10-31')),    
        partition p201911 values less than (to_days('2019-11-30')),  
        partition p201912 values less than (to_days('2019-12-31')),   
        partition p202001 values less than (to_days('2020-01-31')),
        partition p202002 values less than (to_days('2020-02-30')),
        partition p202003 values less than (to_days('2020-03-31')),
        partition p202004 values less than (to_days('2020-04-30')),
        partition p202005 values less than (to_days('2020-05-31')),
        partition p202006 values less than (to_days('2020-06-30')),
        partition p202007 values less than (to_days('2020-07-31')),
        partition p202008 values less than (to_days('2020-08-31')),
        partition p202009 values less than (to_days('2020-09-30')),
        partition p202010 values less than (to_days('2020-10-31')),
        partition p202011 values less than (to_days('2020-11-30')),
        partition p202012 values less than (to_days('2020-12-31')),
        PARTITION p202XYZ VALUES LESS THAN (MAXVALUE));

    Executing the SQL above fails with:

    ERROR 1566 (HY000): Not allowed to use NULL value in VALUES LESS THAN

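    Looking closely, there is no NULL after any of the LESS THAN clauses. Every boundary is an explicit year-month-day literal that is converted into a day number. So I took a closer look at the to_days(expr) function.

    From the official documentation: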

    TO_DAYS(date)

    Given a date date, returns a day number (the number of days since year 0).
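    To make the definition concrete, here is a minimal Python sketch that mimics TO_DAYS() for 'YYYY-MM-DD' literals. It relies on an assumption that the post does not state: for Gregorian dates, MySQL's day number equals Python's date.toordinal() plus 365. This offset reproduces the manual's example, TO_DAYS('2007-10-07') = 733321. The helper name mysql_to_days is hypothetical, and the sketch ignores the other input formats that MySQL accepts.

```python
from datetime import datetime
from typing import Optional

# Assumed offset between Python's proleptic Gregorian ordinal
# (date(1, 1, 1).toordinal() == 1) and MySQL's TO_DAYS() day number.
MYSQL_TO_DAYS_OFFSET = 365


def mysql_to_days(literal: str) -> Optional[int]:
    """Rough emulation of MySQL TO_DAYS() for a 'YYYY-MM-DD' literal.

    Returns None where MySQL returns NULL (with warning 1292),
    i.e. when the literal is not a real calendar date.
    """
    try:
        d = datetime.strptime(literal, "%Y-%m-%d").date()
    except ValueError:
        # e.g. '2017-02-30': February never has a 30th day
        return None
    return d.toordinal() + MYSQL_TO_DAYS_OFFSET
```

    Consecutive days differ by exactly 1. Any impossible date yields None, which plays the role of the NULL that VALUES LESS THAN rejects.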

    I suspected that the February dates I had given for each year were invalid, so I checked:

    mysql> select TO_DAYS('2017-02-30');
    +-----------------------+
    | TO_DAYS('2017-02-30') |
    +-----------------------+
    |                  NULL |
    +-----------------------+
    1 row in set, 1 warning (0.00 sec)
    
    mysql> 
    mysql> show warnings;
    +---------+------+----------------------------------------+
    | Level   | Code | Message                                |
    +---------+------+----------------------------------------+
    | Warning | 1292 | Incorrect datetime value: '2017-02-30' |
    +---------+------+----------------------------------------+
    1 row in set (0.00 sec)
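    Before running a long ALTER TABLE, the boundary literals can also be checked offline. Below is a minimal Python sketch. It assumes every boundary is written as to_days('YYYY-MM-DD'), as in the statements in this post, and invalid_boundaries is a hypothetical helper name. Applied to the first statement, it should flag 2018-02-30, 2019-02-30 and 2020-02-30. Any one of these is enough to trigger ERROR 1566.

```python
import re
from datetime import datetime
from typing import List

# Captures the date literal inside to_days('...'), case-insensitively.
BOUNDARY_RE = re.compile(r"to_days\('([^']*)'\)", re.IGNORECASE)


def invalid_boundaries(partition_sql: str) -> List[str]:
    """Return every to_days('...') literal that is not a real calendar date.

    Each one would make TO_DAYS() return NULL, which MySQL refuses in
    VALUES LESS THAN (ERROR 1566).
    """
    bad: List[str] = []
    for literal in BOUNDARY_RE.findall(partition_sql):
        try:
            datetime.strptime(literal, "%Y-%m-%d")
        except ValueError:
            bad.append(literal)
    return bad
```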

    Based on the warning above, I adjusted the partitioning SQL so that each boundary is the first day of the following month:

    alter table chat_message_history partition by range(to_days(message_time)) (
        partition p201708 values less than (to_days('2017-09-01')),
        partition p201709 values less than (to_days('2017-10-01')),
        partition p201710 values less than (to_days('2017-11-01')),    
        partition p201711 values less than (to_days('2017-12-01')),
        partition p201712 values less than (to_days('2018-01-01')),    
        partition p201801 values less than (to_days('2018-02-01')),     
        partition p201802 values less than (to_days('2018-03-01')),
        partition p201803 values less than (to_days('2018-04-01')),
        partition p201804 values less than (to_days('2018-05-01')),  
        partition p201805 values less than (to_days('2018-06-01')),
        partition p201806 values less than (to_days('2018-07-01')),  
        partition p201807 values less than (to_days('2018-08-01')),  
        partition p201808 values less than (to_days('2018-09-01')),  
        partition p201809 values less than (to_days('2018-10-01')),  
        partition p201810 values less than (to_days('2018-11-01')),  
        partition p201811 values less than (to_days('2018-12-01')),    
        partition p201812 values less than (to_days('2019-01-01')),  
        partition p201901 values less than (to_days('2019-02-01')),   
        partition p201902 values less than (to_days('2019-03-01')),
        partition p201903 values less than (to_days('2019-04-01')),  
        partition p201904 values less than (to_days('2019-05-01')),
        partition p201905 values less than (to_days('2019-06-01')),  
        partition p201906 values less than (to_days('2019-07-01')),
        partition p201907 values less than (to_days('2019-08-01')),  
        partition p201908 values less than (to_days('2019-09-01')),  
        partition p201909 values less than (to_days('2019-10-01')),
        partition p201910 values less than (to_days('2019-11-01')),    
        partition p201911 values less than (to_days('2019-12-01')),  
        partition p201912 values less than (to_days('2020-01-01')),   
        partition p202001 values less than (to_days('2020-02-01')),
        partition p202002 values less than (to_days('2020-03-01')),
        partition p202003 values less than (to_days('2020-04-01')),
        partition p202004 values less than (to_days('2020-05-01')),
        partition p202005 values less than (to_days('2020-06-01')),
        partition p202006 values less than (to_days('2020-07-01')),
        partition p202007 values less than (to_days('2020-08-01')),
        partition p202008 values less than (to_days('2020-09-01')),
        partition p202009 values less than (to_days('2020-10-01')),
        partition p202010 values less than (to_days('2020-11-01')),
        partition p202011 values less than (to_days('2020-12-01')),
        partition p202012 values less than (to_days('2021-01-01')),
        PARTITION p202XYZ VALUES LESS THAN (MAXVALUE));
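    Typing 41 boundaries by hand is what let the invalid February dates slip in. Below is a minimal Python sketch that generates the same kind of statement, with one partition per month plus a MAXVALUE catch-all. The function names are hypothetical. The default catch-all name p202XYZ simply copies the one used in this post.

```python
from datetime import date
from typing import List


def first_of_next_month(d: date) -> date:
    """First day of the month that follows d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_partition_sql(table: str, column: str, first: date, last: date,
                          catch_all: str = "p202XYZ") -> str:
    """Build ALTER TABLE ... PARTITION BY RANGE(TO_DAYS(column)) with one
    partition per month from `first` to `last` (inclusive) plus a MAXVALUE
    catch-all partition.

    Each upper bound is the first day of the next month, so every literal
    is a real date and a month's last day stays in that month's partition.
    """
    lines: List[str] = [
        f"alter table {table} partition by range(to_days({column})) ("
    ]
    month = date(first.year, first.month, 1)
    stop = date(last.year, last.month, 1)
    while month <= stop:
        bound = first_of_next_month(month)
        lines.append(
            f"    partition p{month:%Y%m} values less than "
            f"(to_days('{bound:%Y-%m-%d}')),"
        )
        month = bound
    lines.append(f"    PARTITION {catch_all} VALUES LESS THAN (MAXVALUE));")
    return "\n".join(lines)
```

    For example, monthly_partition_sql("chat_message_history", "message_time", date(2017, 8, 1), date(2020, 12, 1)) should produce the 41 monthly partitions above plus the catch-all, differing only in whitespace.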

    Running it still fails:

    ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

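    This error means that the primary key must include every column used in the table's partitioning function. Here the table is partitioned by message_time, so I combined message_time with the existing id primary key into a composite primary key. The following SQL drops the existing id primary key and adds the composite one in a single statement: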

    alter table chat_message_history drop primary key,add primary key (`id`,`message_time`); 
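    MySQL's documented rule is broader than this particular error message. On a partitioned table, every unique key, including the primary key, must contain every column used in the partitioning expression. Non-unique indexes such as IDX_jobId and idx_his_vis_ctdesc_key are not affected. Below is a minimal Python sketch of that check. The helper name is hypothetical, and the key lists are copied by hand from the SHOW CREATE TABLE output above.

```python
from typing import Dict, List, Sequence


def keys_missing_partition_columns(
    unique_keys: Dict[str, Sequence[str]], partition_columns: Sequence[str]
) -> Dict[str, List[str]]:
    """For each unique key (primary key included), list the partitioning
    columns it lacks. MySQL refuses to partition the table (ERROR 1503)
    while any entry remains in the result.
    """
    problems: Dict[str, List[str]] = {}
    for name, cols in unique_keys.items():
        missing = [c for c in partition_columns if c not in cols]
        if missing:
            problems[name] = missing
    return problems
```

    With PRIMARY = (id), the sketch reports message_time as missing. With (id, message_time), it reports nothing. One side effect is worth keeping in mind. The composite key only guarantees that (id, message_time) pairs are unique. AUTO_INCREMENT still hands out distinct id values, but the primary key no longer enforces id uniqueness on its own.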

    Then run the partitioning SQL again:

    mysql> alter table chat_message_history partition by range(to_days(message_time)) (
        -> partition p201708 values less than (to_days('2017-09-01')),
        -> partition p201709 values less than (to_days('2017-10-01')),
        -> partition p201710 values less than (to_days('2017-11-01')),    
        -> partition p201711 values less than (to_days('2017-12-01')),
        -> partition p201712 values less than (to_days('2018-01-01')),    
        -> partition p201801 values less than (to_days('2018-02-01')),     
        -> partition p201802 values less than (to_days('2018-03-01')),
        -> partition p201803 values less than (to_days('2018-04-01')),
        -> partition p201804 values less than (to_days('2018-05-01')),  
        -> partition p201805 values less than (to_days('2018-06-01')),
        -> partition p201806 values less than (to_days('2018-07-01')),  
        -> partition p201807 values less than (to_days('2018-08-01')),  
        -> partition p201808 values less than (to_days('2018-09-01')),  
        -> partition p201809 values less than (to_days('2018-10-01')),  
        -> partition p201810 values less than (to_days('2018-11-01')),  
        -> partition p201811 values less than (to_days('2018-12-01')),    
        -> partition p201812 values less than (to_days('2019-01-01')),  
        -> partition p201901 values less than (to_days('2019-02-01')),   
        -> partition p201902 values less than (to_days('2019-03-01')),
        -> partition p201903 values less than (to_days('2019-04-01')),  
        -> partition p201904 values less than (to_days('2019-05-01')),
        -> partition p201905 values less than (to_days('2019-06-01')),  
        -> partition p201906 values less than (to_days('2019-07-01')),
        -> partition p201907 values less than (to_days('2019-08-01')),  
        -> partition p201908 values less than (to_days('2019-09-01')),  
        -> partition p201909 values less than (to_days('2019-10-01')),
        -> partition p201910 values less than (to_days('2019-11-01')),    
        -> partition p201911 values less than (to_days('2019-12-01')),  
        -> partition p201912 values less than (to_days('2020-01-01')),   
        -> partition p202001 values less than (to_days('2020-02-01')),
        -> partition p202002 values less than (to_days('2020-03-01')),
        -> partition p202003 values less than (to_days('2020-04-01')),
        -> partition p202004 values less than (to_days('2020-05-01')),
        -> partition p202005 values less than (to_days('2020-06-01')),
        -> partition p202006 values less than (to_days('2020-07-01')),
        -> partition p202007 values less than (to_days('2020-08-01')),
        -> partition p202008 values less than (to_days('2020-09-01')),
        -> partition p202009 values less than (to_days('2020-10-01')),
        -> partition p202010 values less than (to_days('2020-11-01')),
        -> partition p202011 values less than (to_days('2020-12-01')),
        -> partition p202012 values less than (to_days('2021-01-01')),
        -> PARTITION p202XYZ VALUES LESS THAN (MAXVALUE));
    Query OK, 0 rows affected (1.28 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    This time it succeeded. What an ordeal!
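    VALUES LESS THAN is an exclusive upper bound. Because of this, the month-end boundaries in the first attempt were also off by one day, even where the dates were valid. For example, a message sent on 2017-08-31 would have landed in p201709. Below is a minimal Python sketch of how a row is routed under RANGE(TO_DAYS(message_time)). It assumes that comparing calendar dates is equivalent to comparing TO_DAYS() values, since TO_DAYS() ignores the time part and grows with the date. The helper route and the two short partition lists are hypothetical excerpts for illustration.

```python
from bisect import bisect_right
from datetime import date, datetime
from typing import List, Tuple

# (partition name, exclusive upper bound as a calendar date)
Partition = Tuple[str, date]


def route(message_time: datetime, partitions: List[Partition], catch_all: str) -> str:
    """Return the partition a row would land in under
    PARTITION BY RANGE(TO_DAYS(message_time))."""
    bounds = [bound for _, bound in partitions]
    # First partition whose bound is strictly greater than the row's date,
    # because VALUES LESS THAN excludes the bound itself.
    i = bisect_right(bounds, message_time.date())
    return partitions[i][0] if i < len(partitions) else catch_all


# First attempt: month-end boundaries (first two partitions only).
MONTH_END = [("p201708", date(2017, 8, 31)), ("p201709", date(2017, 9, 30))]
# Corrected statement: first day of the following month.
NEXT_MONTH_START = [("p201708", date(2017, 9, 1)), ("p201709", date(2017, 10, 1))]
```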

    Now let's verify whether the partitioning actually helps, mainly by comparison. First, the query plan before the table was partitioned:

    mysql> explain select * from chat_message_history where message_time > '2017-12-01' and message_time < '2018-01-01';
    +----+-------------+----------------------+------+---------------+------+---------+------+---------+-------------+
    | id | select_type | table                | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
    +----+-------------+----------------------+------+---------------+------+---------+------+---------+-------------+
    |  1 | SIMPLE      | chat_message_history | ALL  | NULL          | NULL | NULL    | NULL | 5103176 | Using where |
    +----+-------------+----------------------+------+---------------+------+---------+------+---------+-------------+
    1 row in set (0.00 sec)

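    The estimated number of rows scanned is 5,103,176. The table holds about 5.3 million rows in total, so this query scans roughly 5.1 million of them, which is essentially the whole table.

    And after partitioning? Here is the same query: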

    mysql> explain select * from chat_message_history where message_time > '2017-12-01' and message_time < '2018-01-01';
    +----+-------------+----------------------+------+---------------+------+---------+------+--------+-------------+
    | id | select_type | table                | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
    +----+-------------+----------------------+------+---------------+------+---------+------+--------+-------------+
    |  1 | SIMPLE      | chat_message_history | ALL  | NULL          | NULL | NULL    | NULL | 829848 | Using where |
    +----+-------------+----------------------+------+---------------+------+---------+------+--------+-------------+
    1 row in set (0.00 sec)

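    This time the estimated row count drops to 829,848, roughly 16% of the previous estimate. The access type is still ALL, so the remaining partitions are still scanned in full. The saving comes from skipping the other partitions.

    Judging from this experiment, partitioning makes a clearly noticeable difference to the number of rows a query scans.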
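    The reduction comes from partition pruning: the optimizer skips partitions whose range cannot contain matching rows. Below is a minimal, idealized Python model of that idea. It makes three assumptions. First, each partition covers a half-open day range [lower, upper). Second, the first partition has no lower bound, so it is modeled as starting at date.min. Third, the query message_time > '2017-12-01' and message_time < '2018-01-01' covers the calendar days [2017-12-01, 2018-01-01). MySQL's real optimizer may touch additional partitions, depending on the version and on how it treats TO_DAYS() over DATETIME. The authoritative list is the partitions column of EXPLAIN, shown via EXPLAIN PARTITIONS on MySQL 5.6 and earlier. All function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# (partition name, inclusive lower day, exclusive upper day)
Partition = Tuple[str, date, date]


def first_of_next_month(d: date) -> date:
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_partitions(first_month: date, count: int) -> List[Partition]:
    """One partition per month; the first one has no lower bound."""
    parts: List[Partition] = []
    start = first_month
    for i in range(count):
        end = first_of_next_month(start)
        lower = date.min if i == 0 else start
        parts.append((f"p{start:%Y%m}", lower, end))
        start = end
    return parts


def partitions_touched(parts: List[Partition], lo: date, hi: date) -> List[str]:
    """Idealized pruning: partitions whose [lower, upper) day range
    overlaps the queried day range [lo, hi)."""
    return [name for name, lower, upper in parts if lower < hi and lo < upper]
```

    In this model, the December query touches only p201712.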

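    To sum up:

    1. Once MySQL tables reach several million rows, the performance of multi-table join queries becomes extremely unstable. This is exactly what happened to our production system. Within a few days, queries exhausted the database connections twice. This time all 600 connections were taken, and the system became unavailable.

    2. When data volumes grow, partitioning or adding indexes can relieve the immediate problem. Over time, however, if the amount of data a query touches is not limited, queries will eventually become very slow again. I therefore recommend splitting the data, that is, splitting tables, according to the business scenario or requirements. This helps keep the system highly available.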

    Original post: https://www.cnblogs.com/shihuc/p/8748962.html