The binlog-row-event-max-size parameter:
Specify the maximum size of a row-based binary log event, in bytes. Rows are grouped into events smaller than this size if possible. The value should be a multiple of 256. The default is 8192.
Demo:
UPDATE tb001 SET C1=3 LIMIT 10000;
Query OK, 10000 rows affected (0.03 sec)
Rows matched: 10000 Changed: 10000 Warnings: 0

SHOW BINLOG EVENTS;
+------------------+--------+----------------+-----------+-------------+--------------+
| Log_name         | Pos    | Event_type     | Server_id | End_log_pos | Info         |
+------------------+--------+----------------+-----------+-------------+--------------+
| mysql-bin.000001 |    401 | Update_rows    | 168199193 |        8609 | table_id: 219|
| mysql-bin.000001 |   8609 | Update_rows    | 168199193 |       16817 | table_id: 219|
| mysql-bin.000001 |  16817 | Update_rows    | 168199193 |       25025 | table_id: 219|
| mysql-bin.000001 |  25025 | Update_rows    | 168199193 |       33233 | table_id: 219|
| mysql-bin.000001 |  33233 | Update_rows    | 168199193 |       41441 | table_id: 219|
| mysql-bin.000001 |  41441 | Update_rows    | 168199193 |       49649 | table_id: 219|
| mysql-bin.000001 |  49649 | Update_rows    | 168199193 |       57857 | table_id: 219|
| mysql-bin.000001 |  57857 | Update_rows    | 168199193 |       66065 | table_id: 219|
| mysql-bin.000001 |  66065 | Update_rows    | 168199193 |       74273 | table_id: 219|
| mysql-bin.000001 |  74273 | Update_rows    | 168199193 |       82481 | table_id: 219|
| mysql-bin.000001 |  82481 | Update_rows    | 168199193 |       90689 | table_id: 219|
| mysql-bin.000001 |  90689 | Update_rows    | 168199193 |       98897 | table_id: 219|
| mysql-bin.000001 |  98897 | Update_rows    | 168199193 |      107105 | table_id: 219|
| mysql-bin.000001 | 107105 | Update_rows    | 168199193 |      115313 | table_id: 219|
| mysql-bin.000001 | 115313 | Update_rows    | 168199193 |      123521 | table_id: 219|
| mysql-bin.000001 | 123521 | Update_rows    | 168199193 |      131729 | table_id: 219|
| mysql-bin.000001 | 131729 | Update_rows    | 168199193 |      139937 | table_id: 219|
| mysql-bin.000001 | 139937 | Update_rows    | 168199193 |      148145 | table_id: 219|
| mysql-bin.000001 | 148145 | Update_rows    | 168199193 |      156353 | table_id: 219|
| mysql-bin.000001 | 156353 | Update_rows    | 168199193 |      164561 | table_id: 219|
| mysql-bin.000001 | 164561 | Update_rows    | 168199193 |      172769 | table_id: 219|
| mysql-bin.000001 | 172769 | Update_rows    | 168199193 |      180977 | table_id: 219|
| mysql-bin.000001 | 180977 | Update_rows    | 168199193 |      181229 | table_id: 219|
+------------------+--------+----------------+-----------+-------------+--------------+
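In the demo above, with ROW-based replication, updating 10000 rows produced about 181K of binlog (positions 401 through 181229). MySQL split that log into 23 binlog events of roughly 8K at most: 22 events of 8208 bytes each plus a final event of 252 bytes. Each binlog event carries the update log for roughly 400 to 450 rows.
Parsing the binlog file with the mysqlbinlog command shows the following information for this transaction: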
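The figures in the paragraph above can be checked with a little arithmetic. The following is only a sketch: the positions are copied from the SHOW BINLOG EVENTS output above, while the names (`FIRST_POS`, `END_POSITIONS`, `event_sizes`) are made up for illustration.

```python
# End_log_pos of each Update_rows event, copied from the SHOW BINLOG EVENTS output.
FIRST_POS = 401
END_POSITIONS = [
    8609, 16817, 25025, 33233, 41441, 49649, 57857, 66065,
    74273, 82481, 90689, 98897, 107105, 115313, 123521, 131729,
    139937, 148145, 156353, 164561, 172769, 180977, 181229,
]
ROWS_UPDATED = 10000


def event_sizes(first_pos, end_positions):
    """Return the byte size of each event, given consecutive end positions."""
    sizes = []
    start = first_pos
    for end in end_positions:
        sizes.append(end - start)
        start = end
    return sizes


sizes = event_sizes(FIRST_POS, END_POSITIONS)
event_count = len(sizes)                      # number of Update_rows events
total_bytes = sum(sizes)                      # bytes spanned by those events
avg_rows_per_event = ROWS_UPDATED / event_count

print(f"events: {event_count}, total bytes: {total_bytes}")
print(f"largest event: {max(sizes)} bytes, last event: {sizes[-1]} bytes")
print(f"average rows per event: {avg_rows_per_event:.0f}")
```

Each full event comes out slightly larger than 8192 bytes, presumably because the event header and checksum are counted in the positions but not in the row-data limit.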
# at 293
#190514 9:34:23 server id 168199193 end_log_pos 350 CRC32 0x8889b84a Rows_query
# UPDATE tb001 SET C1=3 LIMIT 10000
# at 350
#190514 9:34:23 server id 168199193 end_log_pos 401 CRC32 0xb30edd76 Table_map: `demodb`.`tb001` mapped to number 219
# at 401
#190514 9:34:23 server id 168199193 end_log_pos 8609 CRC32 0x45274f17 Update_rows: table id 219
# at 8609
#190514 9:34:23 server id 168199193 end_log_pos 16817 CRC32 0xb1b6f3b9 Update_rows: table id 219
# at 16817
#190514 9:34:23 server id 168199193 end_log_pos 25025 CRC32 0xfb7c22c2 Update_rows: table id 219
# at 25025
#190514 9:34:23 server id 168199193 end_log_pos 33233 CRC32 0xd3ef86dc Update_rows: table id 219
# at 33233
#190514 9:34:23 server id 168199193 end_log_pos 41441 CRC32 0x031b968c Update_rows: table id 219
# at 41441
#190514 9:34:23 server id 168199193 end_log_pos 49649 CRC32 0x5c0b1de9 Update_rows: table id 219
# at 49649
#190514 9:34:23 server id 168199193 end_log_pos 57857 CRC32 0x6a548038 Update_rows: table id 219
# at 57857
#190514 9:34:23 server id 168199193 end_log_pos 66065 CRC32 0xcd46570c Update_rows: table id 219
# at 66065
#190514 9:34:23 server id 168199193 end_log_pos 74273 CRC32 0x78d5b7e6 Update_rows: table id 219
# at 74273
#190514 9:34:23 server id 168199193 end_log_pos 82481 CRC32 0x8ee1a24c Update_rows: table id 219
# at 82481
#190514 9:34:23 server id 168199193 end_log_pos 90689 CRC32 0x4f3d8afd Update_rows: table id 219
# at 90689
#190514 9:34:23 server id 168199193 end_log_pos 98897 CRC32 0x767a8ad4 Update_rows: table id 219
# at 98897
#190514 9:34:23 server id 168199193 end_log_pos 107105 CRC32 0x8bd1ed97 Update_rows: table id 219
# at 107105
#190514 9:34:23 server id 168199193 end_log_pos 115313 CRC32 0x33840a78 Update_rows: table id 219
# at 115313
#190514 9:34:23 server id 168199193 end_log_pos 123521 CRC32 0x20101fea Update_rows: table id 219
# at 123521
#190514 9:34:23 server id 168199193 end_log_pos 131729 CRC32 0x76f4d551 Update_rows: table id 219
# at 131729
#190514 9:34:23 server id 168199193 end_log_pos 139937 CRC32 0x5a05d397 Update_rows: table id 219
# at 139937
#190514 9:34:23 server id 168199193 end_log_pos 148145 CRC32 0x12fcb52e Update_rows: table id 219
# at 148145
#190514 9:34:23 server id 168199193 end_log_pos 156353 CRC32 0x425a537b Update_rows: table id 219
# at 156353
#190514 9:34:23 server id 168199193 end_log_pos 164561 CRC32 0x8180d65b Update_rows: table id 219
# at 164561
#190514 9:34:23 server id 168199193 end_log_pos 172769 CRC32 0xe913c068 Update_rows: table id 219
# at 172769
#190514 9:34:23 server id 168199193 end_log_pos 180977 CRC32 0x1a2c0483 Update_rows: table id 219
# at 180977
#190514 9:34:23 server id 168199193 end_log_pos 181229 CRC32 0x4e561c1d Update_rows: table id 219 flags: STMT_END_F
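To locate event boundaries programmatically, the mysqlbinlog text output above can be scanned line by line. The following is a minimal sketch that assumes the output keeps the "# at N" / "end_log_pos N CRC32 ..." layout shown above; `parse_events` and `SAMPLE` are illustrative names, and the sample contains only four of the events from that output.

```python
import re

# A few events copied from the mysqlbinlog output above (the middle ones omitted).
SAMPLE = """\
# at 293
#190514 9:34:23 server id 168199193 end_log_pos 350 CRC32 0x8889b84a Rows_query
# UPDATE tb001 SET C1=3 LIMIT 10000
# at 350
#190514 9:34:23 server id 168199193 end_log_pos 401 CRC32 0xb30edd76 Table_map: `demodb`.`tb001` mapped to number 219
# at 401
#190514 9:34:23 server id 168199193 end_log_pos 8609 CRC32 0x45274f17 Update_rows: table id 219
# at 180977
#190514 9:34:23 server id 168199193 end_log_pos 181229 CRC32 0x4e561c1d Update_rows: table id 219 flags: STMT_END_F
"""

AT_RE = re.compile(r"# at (\d+)")
HEADER_RE = re.compile(r"end_log_pos (\d+)\s+CRC32 0x[0-9a-f]+\s+(\w+)")


def parse_events(text):
    """Extract start/end position and event type for each event header."""
    events = []
    start = None
    for raw in text.splitlines():
        line = raw.strip()
        at = AT_RE.fullmatch(line)
        if at:
            start = int(at.group(1))      # remember where the next event begins
            continue
        header = HEADER_RE.search(line)
        if header and start is not None:
            events.append({
                "start": start,
                "end": int(header.group(1)),
                "type": header.group(2),
                # STMT_END_F marks the last rows event of the statement
                "stmt_end": "STMT_END_F" in line,
            })
            start = None
    return events


events = parse_events(SAMPLE)
for ev in events:
    print(ev["type"], ev["start"], ev["end"], ev["end"] - ev["start"], ev["stmt_end"])
```

Only the last Update_rows event carries the STMT_END_F flag, so a consumer can treat every earlier rows event as an intermediate chunk of the same statement.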
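A huge transaction that touches tens of millions of rows is split into thousands or tens of thousands of binlog events. Processing it one binlog event at a time effectively reduces the impact of the huge transaction.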
==========================================================
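For applications that parse and extract data from the binlog: the binlog records only changes that have already been committed, so in scenarios where data consistency is not a concern or the consistency requirement is low, you can consider splitting a huge transaction by binlog event and committing once per binlog event.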
==========================================================
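The per-event commit idea can be sketched as follows. This is a rough illustration under several assumptions rather than a real binlog consumer: an in-memory SQLite database stands in for the target store, the `id` column and the `apply_per_event` helper are hypothetical, and the decoded row events are simulated as lists of about 450 rows each.

```python
import sqlite3


def apply_per_event(conn, row_events):
    """Apply each decoded rows event as its own transaction on the target."""
    commits = 0
    for rows in row_events:
        # rows is a list of (new_c1, id) pairs decoded from one rows event
        conn.executemany("UPDATE tb001 SET c1 = ? WHERE id = ?", rows)
        conn.commit()          # commit per binlog event, not per source transaction
        commits += 1
    return commits


# Stand-in target table (assumption: tb001 has an id key and a c1 column).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb001 (id INTEGER PRIMARY KEY, c1 INTEGER)")
conn.executemany("INSERT INTO tb001 (id, c1) VALUES (?, 0)",
                 [(i,) for i in range(1, 1001)])
conn.commit()

# Simulated rows events: 1000 updated rows split into chunks of 450.
updates = [(3, i) for i in range(1, 1001)]
row_events = [updates[i:i + 450] for i in range(0, len(updates), 450)]

commits = apply_per_event(conn, row_events)
changed = conn.execute("SELECT COUNT(*) FROM tb001 WHERE c1 = 3").fetchone()[0]
print(f"commits: {commits}, rows changed: {changed}")
```

The trade-off is the one stated above: if the consumer fails partway through, the target holds only part of the source transaction, so this only suits scenarios that tolerate weaker consistency.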