
MySQL Table Partitioning in Practice

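I. What is database partitioning
Take MySQL as an example. MySQL stores its data as files on disk, by default under /mysql/data (check the datadir setting in my.cnf for the actual location). With the MyISAM engine, a table maps to three main files: a .frm file holding the table structure, a .MYD file holding the row data, and a .MYI file holding the indexes. When a table holds too much data, the .MYD and .MYI files grow very large and lookups slow down. MySQL's partitioning feature addresses this by physically splitting the table's data and index files into many smaller pieces, so a lookup no longer has to search everything: once MySQL knows which piece a row lives in, it searches only that piece. And if a table is too large to fit on one disk, its data can be spread across several disks.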

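II. Two ways to partition
1. Horizontal partitioning
Horizontal partitioning splits a table by rows. Suppose a table has 1,000,000 rows and is divided into ten parts: the first 100,000 rows go into the first partition, the next 100,000 into the second, and so on. The table ends up in ten pieces, much like splitting tables with the MERGE engine. Every row you fetch still contains all the columns of the table definition; in other words, horizontal partitioning does not change the table structure.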

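2. Vertical partitioning
Vertical partitioning splits a table by columns. Suppose a user table was designed without much thought and all of a person's information ended up in one table. That table will contain some large columns, such as a personal profile, that few people ever read. Those columns only need to be fetched when someone actually asks for them, so when splitting the table the large columns can be moved out into a separate table.

Partitioning a database is a bit like cutting an apple: you can cut it across or cut it lengthwise, depending on what you need. The partitioning that MySQL provides is the first kind, horizontal partitioning, and it comes in several variants. The examples below walk through them.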

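III. Table partitioning
a. RANGE partitioning
A table partitioned by RANGE assigns each row to a partition according to which contiguous interval the value of the partitioning expression falls in; the intervals must not overlap.

// Create a RANGE-partitioned table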

mysql> create table if not exists `user` (
    `id` int(11) not null auto_increment comment 'user id',
    `name` varchar(50) not null default '' comment 'name',
    `sex` int(1) not null default '0' comment '0 = male, 1 = female',
    primary key (`id`)
) engine=myisam default charset=utf8 auto_increment=1
partition by range (id) (
    partition p0 values less than (3),
    partition p1 values less than (6),
    partition p2 values less than (9),
    partition p3 values less than (12),
    partition p4 values less than maxvalue
);

// View the table's partition information

mysql> select * from information_schema.partitions where table_schema='liying_order' and table_name='user';

The result shows that the user table has five partitions, p0 through p4.

// Insert some data

mysql> insert into `user` (`name` ,`sex`) values 
('tank', '0') ,('zhang',1),('ying',1),('张',1),
('映',0),('test1',1),('tank2',1),('tank1',1),
('test2',1),('test3',1),('test4',1),('test5',1),
('tank3',1),('tank4',1),('tank5',1),('tank6',1),
('tank7',1),('tank8',1),('tank9',1),('tank10',1),
('tank11',1),('tank12',1),('tank13',1),('tank21',1),('tank42',1);

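// Look in the directory that holds the database's table files, for example D:\div\MySQL\MySQL Server 5.6\data\liying_order. For a MyISAM table, each partition has its own data and index files there, named like user#P#p0.MYD and user#P#p0.MYI.

// Count the rows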
mysql> select count(id) as count from user;
+-------+
| count |
+-------+
|    25 |
+-------+

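// Drop partition p4 (the fifth and last partition)
mysql> alter table user drop partition p4;

// View the partition information again
mysql> select * from information_schema.partitions where table_schema='liying_order' and table_name='user';

The user table now has four partitions, p0 through p3.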

mysql> select count(id) as count from user;
+-------+
| count |
+-------+
|    11 |
+-------+
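Note: the rows stored in the dropped partition are gone. Partition p4 held 14 rows (ids 12 through 25), so only the 11 rows in the other partitions remain.

// Partition p4 has been dropped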
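
The row counts above can be checked with a small model of how RANGE partitioning routes a value. This is only a sketch in Python, not MySQL's implementation: the function names are made up for illustration, the bounds are copied from the create table statement, and maxvalue is represented as infinity.

```python
from bisect import bisect_right
from collections import Counter

# Upper bounds copied from the `user` table definition; p4 is MAXVALUE.
PARTITIONS = [("p0", 3), ("p1", 6), ("p2", 9), ("p3", 12), ("p4", float("inf"))]


def range_partition(value, partitions=PARTITIONS):
    """Return the first partition whose upper bound is strictly greater than value."""
    bounds = [upper for _, upper in partitions]
    index = bisect_right(bounds, value)
    if index == len(partitions):
        # Without a MAXVALUE partition, values past the last bound have no home.
        raise ValueError(f"no partition for value {value}")
    return partitions[index][0]


def rows_per_partition(ids, partitions=PARTITIONS):
    """Count how many of the given ids land in each partition."""
    return Counter(range_partition(i, partitions) for i in ids)


counts = rows_per_partition(range(1, 26))  # the 25 auto_increment ids 1..25
print(dict(counts))   # {'p0': 2, 'p1': 3, 'p2': 3, 'p3': 3, 'p4': 14}
print(sum(n for name, n in counts.items() if name != "p4"))  # 11
```

Because "less than (3)" excludes the bound itself, id 3 lands in p1 rather than p0, which is why p0 holds only ids 1 and 2.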

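An existing table can also be repartitioned, and MySQL automatically redistributes the rows already in the table into the matching partitions according to the new rules, which saves a lot of work. See the following: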

mysql> alter table user partition by range(id) (
    partition p1 values less than (1),  
    partition p2 values less than (5),  
    partition p3 values less than maxvalue
);

// View the partition information
mysql> select * from information_schema.partitions where table_schema='liying_order' and table_name='user';

The user table now has three partitions, p1 through p3.

mysql> select count(*) from user;
+----------+
| count(*) |
+----------+
|       11 |
+----------+

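// The partitions after repartitioning; all 11 rows are still there

b. LIST partitioning

In LIST partitioning, each partition is defined and selected by whether a column's value belongs to a given list of values, whereas in RANGE partitioning it is defined by a contiguous interval of values.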
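
As a rough illustration of the difference, LIST routing is a membership test rather than an interval search. The Python sketch below is an assumption-laden model, not MySQL code: the value lists are the ones used for the list_part table in this section, and list_partition is a hypothetical helper name.

```python
# Value lists copied from the list_part table in this section.
LIST_PARTITIONS = {
    "p0": {1, 2, 3, 4, 5, 6, 7, 8},
    "p1": {9, 10, 11, 12, 16, 21},
    "p2": {13, 14, 15, 19},
    "p3": {17, 18, 20, 22, 23, 24},
}


def list_partition(value, partitions=LIST_PARTITIONS):
    """Return the partition whose value list contains value."""
    for name, values in partitions.items():
        if value in values:
            return name
    # MySQL rejects such a row with a "no partition for value" error.
    raise ValueError(f"table has no partition for value {value}")


print(list_partition(16))  # p1
print(list_partition(19))  # p2
```

Unlike RANGE with maxvalue, LIST has no catch-all partition, so a value that appears in no list cannot be stored. That includes 0, the default of province_id in the example table.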

// This attempt fails
mysql> create table if not exists `list_part` (
    `id` int(11) not null auto_increment comment 'user id',
    `province_id` int(2) not null default 0 comment 'province',
    `name` varchar(50) not null default '' comment 'name',
    `sex` int(1) not null default '0' comment '0 = male, 1 = female',
    primary key (`id`)  
) engine=innodb  default charset=utf8 auto_increment=1  

partition by list (province_id) (  
      partition p0 values in (1,2,3,4,5,6,7,8),  
      partition p1 values in (9,10,11,12,16,21),  
      partition p2 values in (13,14,15,19),  
      partition p3 values in (17,18,20,22,23,24)  
);  
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function 

// This attempt succeeds
mysql> create table if not exists `list_part` (
    `id` int(11) not null comment 'user id',
    `province_id` int(2) not null default 0 comment 'province',
    `name` varchar(50) not null default '' comment 'name',
    `sex` int(1) not null default '0' comment '0 = male, 1 = female'
) engine=innodb  default charset=utf8  

partition by list (province_id) (  
      partition p0 values in (1,2,3,4,5,6,7,8),  
      partition p1 values in (9,10,11,12,16,21),  
      partition p2 values in (13,14,15,19),  
      partition p3 values in (17,18,20,22,23,24)  
);  
Query OK, 0 rows affected (0.33 sec)

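When the LIST-partitioned table above has a primary key, the partitioning column must be part of that primary key (the same holds for every unique key); otherwise MySQL reports error 1503. Dropping the primary key lets the table be created, but a table normally does need one, so this is a real limitation of partitioning. A common workaround is a composite primary key that includes the partitioning column, such as primary key (id, province_id).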
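
The rule behind error 1503 can be stated as a simple set check: every unique key must contain all columns used by the partitioning function. The following Python sketch shows that check under the assumption that keys are given as plain column lists; missing_partition_columns is a hypothetical helper, not part of any MySQL tool.

```python
def missing_partition_columns(partition_columns, unique_keys):
    """Map each unique key name to the partitioning columns it lacks."""
    problems = {}
    for key_name, key_columns in unique_keys.items():
        missing = set(partition_columns) - set(key_columns)
        if missing:
            problems[key_name] = sorted(missing)
    return problems


# First attempt: primary key (id), partitioned by province_id -> rejected.
print(missing_partition_columns(["province_id"], {"PRIMARY": ["id"]}))
# {'PRIMARY': ['province_id']}

# Composite key that includes the partitioning column -> accepted.
print(missing_partition_columns(["province_id"], {"PRIMARY": ["id", "province_id"]}))
# {}

# No unique keys at all (the second attempt) -> accepted.
print(missing_partition_columns(["province_id"], {}))
# {}
```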

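To test with data, follow the steps used in the RANGE example.

c. HASH partitioning
HASH partitioning is mainly used to spread data evenly across a predetermined number of partitions. All you need to specify is the column or expression to be hashed and the number of partitions the table should be divided into.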

mysql> create table if not exists `hash_part` (  
    `id` int(11) not null auto_increment comment 'comment id',
    `comment` varchar(1000) not null default '' comment 'comment text',
    `ip` varchar(25) not null default '' comment 'source ip',
    primary key (`id`)  
) engine=innodb  default charset=utf8 auto_increment=1

partition by hash(id)  
partitions 3;  
Query OK, 0 rows affected (0.06 sec)
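
For plain HASH partitioning, the partition number is the partitioning expression modulo the number of partitions. The Python sketch below models that for the hash_part table. It assumes non-negative integer ids and the default partition names p0, p1, p2; the helper names are hypothetical.

```python
from collections import defaultdict


def hash_partition(value, num_partitions):
    """Model of `partition by hash(id) partitions N`: MOD(id, N)."""
    return f"p{value % num_partitions}"


def distribute(ids, num_partitions):
    """Group ids by the partition they would be stored in."""
    layout = defaultdict(list)
    for i in ids:
        layout[hash_partition(i, num_partitions)].append(i)
    return dict(layout)


print(distribute(range(1, 10), 3))
# {'p1': [1, 4, 7], 'p2': [2, 5, 8], 'p0': [3, 6, 9]}
```

With sequential auto_increment ids the rows spread evenly, which is the point of HASH partitioning. The trade-off is that a range condition such as id between 1 and 9 touches every partition.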

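To test, follow the steps used in the RANGE example.

d. KEY partitioning
KEY partitioning is similar to HASH partitioning, except that HASH partitioning uses a user-defined expression while the hashing function for KEY partitioning is supplied by the MySQL server. A KEY clause therefore takes a list of columns rather than an expression such as year(create_time).

mysql> create table if not exists `key_part` (
    `news_id` int(11) not null comment 'news id',
    `content` varchar(1000) not null default '' comment 'news content',
    `u_id` varchar(25) not null default '' comment 'source ip',
    `create_time` date not null default '0000-00-00' comment 'time'
) engine=innodb default charset=utf8
partition by key (news_id)
partitions 3;
Query OK, 0 rows affected (0.07 sec)

To test, follow the steps used in the RANGE example.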

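IV. Partition management

1. Removing partitioning and dropping partitions

alter table tablename remove partitioning;  // removes all partitioning without deleting any data
alter table tablename drop partition partitionname;  // drops the partition together with its data


mysql> alter table user drop partition p4;  // drops partition p4 and deletes the rows in it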

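2. Adding partitions
// Add a new partition to a RANGE table (this works only if no existing partition already ends at maxvalue)
mysql> alter table user add partition(partition p4 values less than maxvalue);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

Partitions can be added only to a table that is already partitioned. For a RANGE table, the new partition's upper bound must be higher than every existing bound:
mysql> alter table dd add partition (
partition p04 values less than (to_days('2018-08-08'))
);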
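
The to_days() call turns a date into an integer day number, so a date-based RANGE bound is compared as an ordinary integer. As a hedged sketch, the same number can be reproduced in Python from date.toordinal(): for Gregorian-calendar dates the two numberings differ by a constant 365 days. The to_days function and the in_p04 helper below are illustrative names, and the sketch ignores any lower bound set by earlier partitions of the dd table, which the original does not show.

```python
from datetime import date

# MySQL counts days from year 0; Python's ordinal counts from 0001-01-01 as day 1.
TO_DAYS_OFFSET = 365


def to_days(d):
    """Approximate MySQL TO_DAYS() for Gregorian-calendar dates."""
    return d.toordinal() + TO_DAYS_OFFSET


print(to_days(date(1970, 1, 1)))   # 719528
print(to_days(date(2007, 10, 7)))  # 733321

# Upper bound of partition p04 from the statement above.
P04_UPPER = to_days(date(2018, 8, 8))


def in_p04(d):
    """True if d is below p04's bound (lower bounds of earlier partitions ignored)."""
    return to_days(d) < P04_UPPER


print(in_p04(date(2018, 8, 7)))  # True
print(in_p04(date(2018, 8, 8)))  # False
```

Since the bound is exclusive, rows dated 2018-08-08 itself need a later partition.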

// Add a partition
alter table `lot_order_vjoptr-0` add partition(partition pmax values less than maxvalue);
// Remove all partitioning (the data is kept)
alter table lot_order remove partitioning;

  
// Add a new partition to a LIST table
mysql> alter table list_part add partition(partition p4 values in (25,26,28));
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

// Repartition a HASH table: this adds 4 partitions to the existing 3, for 7 in total
mysql> alter table hash_part add partition partitions 4;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0

// Repartition a KEY table in the same way
mysql> alter table key_part add partition partitions 4;
Query OK, 1 row affected (0.06 sec)    // existing rows are redistributed as well
Records: 1  Duplicates: 0  Warnings: 0
  
3. Reorganizing partitions
// Reorganize a RANGE table
mysql> alter table user reorganize partition p0,p1,p2,p3,p4 into (partition p0 values less than maxvalue);
Query OK, 11 rows affected (0.08 sec)
Records: 11  Duplicates: 0  Warnings: 0

// Reorganize a LIST table
mysql> alter table list_part reorganize partition p0,p1,p2,p3,p4 into (partition p0 values in (1,2,3,4,5));
Query OK, 0 rows affected (0.28 sec)
Records: 0  Duplicates: 0  Warnings: 0

// HASH and KEY partitions cannot be merged or split with reorganize; the official documentation is clear on this. Use add partition to increase the number of partitions and coalesce partition to reduce it, for example: alter table key_part coalesce partition 2;
mysql> ALTER TABLE key_part REORGANIZE PARTITION COALESCE PARTITION 9;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'PARTITION 9' at line 1

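V. Advantages of partitioning
1. Partitions can be placed on different disks, so a table can hold more data than a single disk allows.
2. When the where clause restricts the partitioning column, MySQL searches only the matching partitions instead of the whole table (partition pruning).
3. Searches over large data sets can, in principle, be processed in parallel across partitions.
4. Spreading queries across multiple disks can yield higher query throughput.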
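
Point 2 is the benefit most queries see in practice. As an illustration only, the sketch below models pruning for a condition of the form id between low and high against the RANGE bounds of the user table from section III. The prune function is a hypothetical helper and says nothing about how the MySQL optimizer is implemented.

```python
# Upper bounds of the `user` table partitions from section III; p4 is MAXVALUE.
PARTITIONS = [("p0", 3), ("p1", 6), ("p2", 9), ("p3", 12), ("p4", float("inf"))]


def prune(low, high, partitions=PARTITIONS):
    """Return the partitions that can hold ids in the closed interval [low, high]."""
    selected = []
    lower = float("-inf")
    for name, upper in partitions:
        # This partition stores ids with lower <= id < upper.
        if low < upper and high >= lower:
            selected.append(name)
        lower = upper
    return selected


print(prune(4, 7))    # ['p1', 'p2']
print(prune(20, 20))  # ['p4']
print(prune(1, 25))   # ['p0', 'p1', 'p2', 'p3', 'p4']
```

A query on ids 4 through 7 reads two of the five partitions. A condition on a column that is not part of the partitioning expression gives the server nothing to prune on, so every partition is scanned.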

VI. Functions allowed in partitioning expressions

day()
dayofmonth()
dayofweek()
dayofyear()
datediff()
extract()
hour()
microsecond()
minute()
mod()
month()
quarter()
second()
time_to_sec()
to_days()
weekday()
year()
yearweek()


Original article: https://www.cnblogs.com/linjiqin/p/9122916.html