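Unless you have a good reason not to, stick with standard MySQL semi-synchronous replication, and in an HA environment don't go out of your way to use stored procedures, triggers, or Galera Cluster.
Recently, in the Galera Cluster environment of one of our systems, a DDL statement run on node A was not synchronized to node B, and node B's error log contained the following warnings: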
2016-07-23 17:31:32 18920 [Warning] WSREP: RBR event 1 Query apply warning: 1, 7866890
2016-07-23 17:31:32 18920 [Warning] WSREP: Ignoring error for TO isolated action: source: cc757ba3-4e3c-11e6-8893-8b18f5f1ec79 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 814 trx_id: -1 seqnos (l: 207, g: 7866890, s: 7866889, d: 7866889, ts: 5701897716366444)
2016-07-25 15:58:17 18920 [ERROR] Slave SQL: Error 'Unknown table 'hs_member.mem'' on query. Default database: 'hs_member'. Query: 'DROP TABLE `mem`', Error_code: 1051
2016-07-25 15:58:17 18920 [Warning] WSREP: RBR event 1 Query apply warning: 1, 7867308
2016-07-25 15:58:17 18920 [Warning] WSREP: Ignoring error for TO isolated action: source: cc757ba3-4e3c-11e6-8893-8b18f5f1ec79 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 1527 trx_id: -1 seqnos (l: 628, g: 7867308, s: 7867307, d: 7867307, ts: 5869102727335473)
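To find out how often a node has silently skipped a replicated DDL, the error log can be scanned for these "Ignoring error for TO isolated action" warnings. The following is only a minimal sketch: the function name `find_ignored_toi_errors` is made up for illustration, and the regular expression assumes the log lines have exactly the layout shown above.

```python
import re

# Matches the "Ignoring error for TO isolated action" warning and captures
# the timestamp, the connection id, and the global sequence number (g: ...).
# Assumption: the line layout is the one shown in the log excerpt above.
TOI_IGNORED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*"
    r"WSREP: Ignoring error for TO isolated action: .*"
    r"conn_id: (?P<conn_id>\d+) .*"
    r"seqnos \(l: -?\d+, g: (?P<gseqno>-?\d+),"
)


def find_ignored_toi_errors(lines):
    """Return one dict per ignored-TOI warning found in the given log lines."""
    hits = []
    for line in lines:
        match = TOI_IGNORED.search(line)
        if match:
            hits.append({
                "time": match.group("ts"),
                "conn_id": int(match.group("conn_id")),
                "gseqno": int(match.group("gseqno")),
            })
    return hits
```

Passing it the lines of node B's error log (for example `find_ignored_toi_errors(open(path))`, where `path` is wherever your error log lives) gives the global seqno of each DDL that node B failed to apply, which can then be matched against the DDL history on node A.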
According to the official documentation:
SCHEMA UPGRADES
Any DDL statement that runs for the database, such as CREATE TABLE or GRANT, upgrades the schema. These DDL statements change the database itself and are non-transactional.
Galera Cluster processes schema upgrades in two different methods:
- Total Order Isolation (TOI) Where the schema upgrades run on all cluster nodes in the same total order sequence, preventing other transactions from committing for the duration of the operation.
- Rolling Schema Upgrade (RSU) Where the schema upgrades run locally, affecting only the node on which they are run. The changes do not replicate to the rest of the cluster.
You can set the method for online schema upgrades by using the wsrep_OSU_method parameter in the configuration file, (my.ini or my.cnf, depending on your build) or through the MySQL client. Galera Cluster defaults to the Total Order Isolation method.
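According to http://galeracluster.com/documentation-webpages/schemaupgrades.html, in TOI mode Galera Cluster waits for the currently uncommitted transactions to commit and then replicates the DDL to all nodes to ensure consistency.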
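In practice, however, what we observed is that Galera first replicated the DDL to the other nodes, where it was executed, while the local node had not yet executed it. If the local node is held up for some reason, for example a DML transaction that stays uncommitted for a long time, the DDL may be cancelled or time out and ultimately fail on the local node. The result is a schema that is inconsistent across the Galera Cluster nodes.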
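One way to catch this kind of drift early is to compare the table lists of all nodes on a regular basis. The sketch below assumes the table names have already been collected from each node (for instance from `information_schema.tables`); the helper `schema_drift` and the table name `orders` are made up for illustration, and the sketch only detects missing tables, not differences in column definitions.

```python
def schema_drift(tables_by_node):
    """Map each node to the tables it lacks relative to the union of all nodes.

    tables_by_node: dict of node name -> iterable of table names on that node.
    Nodes that have every table are left out of the result.
    """
    all_tables = set().union(*tables_by_node.values())
    drift = {}
    for node, tables in tables_by_node.items():
        missing = all_tables - set(tables)
        if missing:
            drift[node] = sorted(missing)
    return drift


# Hypothetical snapshot: node B never got the table `mem`.
snapshot = {
    "A": {"mem", "orders"},
    "B": {"orders"},
}
print(schema_drift(snapshot))  # {'B': ['mem']}
```

An empty result means every node has the same set of tables. Any other result points at the node and the tables that need to be repaired by hand.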