zoukankan      html  css  js  c++  java
  • gh-ost解析

    gh—ost是github的dba开源一款使用go语言开发的MySQL在线改表工具,解决了目前采用pt-online-schema-change遇到的一些问题,思路也很新颖;作者很厉害,也是是openark kit工具集的作者(主要是用python写的一套工具集)。

    为什么叫gh-ost而不是叫gh-osc,创造了一个新的缩写,gh-ost可以发音为ghost,t可以解读为Transmogrifier/Translator/Transformer/Transfigurator。

    相关链接:

    1. github地址
    2. 作者介绍gh-ost的博客
    3. GitHub为MySQL社区贡献了新的在线更改表定义工具gh-ost

    原理

    抛弃了pt-online-schema-change使用trigger来同步增量数据的方法,而通过模拟slave获取row格式的binlog的方式来获取增量数据,具体的数据流图可以看下图。

    数据流图

    使用tirgger的方式有很多缺点,作者在triggerless-design里面做了详尽的阐述,对与我们在工作中遇到的具体问题主要是MDL锁的问题、死锁的问题,这两个问题在使用pt-online-schema-change的时候没有办法绕过去。我们线上在使用pt-osc的时候遇到过多起因为MDL锁和死锁导致的问题,具体的DML锁和死锁问题在其他文章里面做详尽的阐述。

    因为trigger的方式,trigger触发的语句和被触发的语句在一个事务中;所以当时看到这种方式的第一反应就是会不会存在时序的问题(binlog和copy数据产生的时序问题);经过和朋友的讨论以及推理之后发现不会存在问题。

    因为binlog中记录的是full image,所以binlog中的数据是最权威的,而且读取的binlog在应用的时候做了如下转化,而且copy old data是insert ignore,因此会以binlog的优先级为最高,因此不会有问题。

    源类型目标类型
    insert replace
    update update
    delete delete

    对与insert和update是没有问题的,因为无论copy old row和apply binlog的先后顺序,如果apply binlog在后,会覆盖掉copy old row,如果apply binlog在前面,copy old row因为使用insert ignore,因此会被ignore掉;

    对与delete数据,我们可以演算一下,abc三个操作,可能存在三种情况(b肯定在a的后面):

    • a.delete old row
    • b.delete binlog apply
    • c.copy old row
    1. cab,c会将数据copy到ghost表,最后b会把ghost表中的数据delete掉;
    2. acb,c空操作,b也是空操作;
    3. abc,b空操作,c也是空操作;

    详细过程

    一、通过count(*)来获取这个表的准确数据量,或者explain来获取大概值,用于计算进度;通过exact-rowcount参数来决定,默认方式是通过explain。

    --方法1:通过explain来获取大概的数据
    explain select /* gh-ost */ * from %s.%s where 1=1;
    --方法2:通过count(*)获取一个准确值
    select /* gh-ost */ count(*) as rows from %s.%s;
    

    二、通过order by分别获取最大值和最小值范围

    SELECT /* gh-ost `test`.`test` */ `id` 
    FROM   `test`.`test` 
    ORDER  BY `id` ASC 
    LIMIT  1; 
    
    SELECT /* gh-ost `test`.`test` */ `id` 
    FROM   `test`.`test` 
    ORDER  BY `id` DESC 
    LIMIT  1; 
    

    三、通过一个下面的sql不断的探测是否还有数据需要copy

    SELECT /* gh-ost `test`.`faredb_detail` iteration:25 */ `faredb_detail_id` 
    FROM   (SELECT `faredb_detail_id` 
            FROM   `test`.`faredb_detail` 
            WHERE  (( `faredb_detail_id` > 118887 )) 
                   AND ( ( `faredb_detail_id` < '105070846' ) 
                          OR (( `faredb_detail_id` = '105070846' )) ) 
            ORDER  BY `faredb_detail_id` ASC 
            LIMIT  1000) select_osc_chunk 
    ORDER  BY `faredb_detail_id` DESC 
    LIMIT  1; 
    

    四、通过下面的SQL Copy原始表的数据

    insert ignore into new_table select from old_table force index (`PRIMARY`) where  lock in share mode
    

    五、通过go/binlog/gomysql_reader.go读取binlog,然后通过go/logic/applier.go将读取的binlog进行转化写入到ghost表中;

    通过buildDMLEventQuery可以看到binlog的转化细节;

    源类型目标类型
    insert replace
    update update
    delete delete

    六、cut-over阶段(即rename表阶段),作者连续写了三篇blog来描述,比较复杂,有兴趣的可以阅读,1.Solving the Facebook-OSC non-atomic table swap problem2.Solving the non-atomic table swap, Take II3.Solving the non-atomic table swap, Take III: making it atomic;

    六、详细可以参照下面的日志

    2016-08-12 12:00:31 INFO starting gh-ost 1.0.8
    2016-08-12 12:00:31 INFO Migrating `test`.`sbtest`
    2016-08-12 12:00:31 INFO connection validated on test02:5002
    2016-08-12 12:00:31 INFO User has ALL privileges
    2016-08-12 12:00:31 INFO binary logs validated on test02:5002
    2016-08-12 12:00:31 INFO Restarting replication on test02:5002 to make sure binlog settings apply to replication thread
    2016-08-12 12:00:31 INFO Table found. Engine=InnoDB
    2016-08-12 12:00:31 INFO Estimated number of rows via EXPLAIN: 9936
    2016-08-12 12:00:31 INFO Master found to be test01:5002
    2016-08-12 12:00:31 INFO connection validated on test02:5002
    2016-08-12 12:00:31 INFO Registering replica at test02:5002
    2016-08-12 12:00:31 INFO Connecting binlog streamer at mysql-bin.000023:954323927
    2016-08-12 12:00:31 INFO rotate to next log name: mysql-bin.000023
    2016-08-12 12:00:31 INFO connection validated on test01:5002
    2016-08-12 12:00:31 INFO connection validated on test01:5002
    2016-08-12 12:00:31 INFO Droppping table `test`.`_sbtest_gho`
    2016-08-12 12:00:31 INFO Table dropped
    2016-08-12 12:00:31 INFO Droppping table `test`.`_sbtest_ghc`
    2016-08-12 12:00:31 INFO Table dropped
    2016-08-12 12:00:31 INFO Creating changelog table `test`.`_sbtest_ghc`
    2016-08-12 12:00:32 INFO Changelog table created
    2016-08-12 12:00:32 INFO Creating ghost table `test`.`_sbtest_gho`
    2016-08-12 12:00:32 INFO Ghost table created
    2016-08-12 12:00:32 INFO Altering ghost table `test`.`_sbtest_gho`
    2016-08-12 12:00:32 INFO Ghost table altered
    2016-08-12 12:00:32 INFO Chosen shared unique key is PRIMARY
    2016-08-12 12:00:32 INFO Shared columns are id,k,c,pad
    2016-08-12 12:00:32 INFO Listening on unix socket file: /tmp/gh-ost.test.sbtest.sock
    2016-08-12 12:00:32 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
    2016-08-12 12:00:32 INFO Exact number of rows via COUNT: 10000
    2016-08-12 12:00:32 INFO Migration min values: [1]
    2016-08-12 12:00:32 INFO Migration max values: [10000]
    # Migrating `test`.`sbtest`; Ghost table is `test`.`_sbtest_gho`
    # Migrating test01.dx.sankuai.com:5002; inspecting test02.dx.sankuai.com:5002; executing on dx-dba01.dx.sankuai.com
    # Migration started at Fri Aug 12 12:00:31 +0800 2016
    # chunk-size: 1000; max-lag-millis: 1500ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
    # throttle-additional-flag-file: /tmp/gh-ost.throttle
    # postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
    # panic-flag-file: /tmp/ghost.panic.flag
    # Serving on unix socket: /tmp/gh-ost.test.sbtest.sock
    Copy: 0/10000 0.0%; Applied: 0; Backlog: 0/100; Time: 0s(total), 0s(copy); streamer: mysql-bin.000023:954326679; ETA: N/A
    Copy: 0/10000 0.0%; Applied: 0; Backlog: 0/100; Time: 1s(total), 1s(copy); streamer: mysql-bin.000023:954328041; ETA: N/A
    2016-08-12 12:00:33 INFO Row copy complete
    Copy: 10000/10000 100.0%; Applied: 0; Backlog: 0/100; Time: 1s(total), 1s(copy); streamer: mysql-bin.000023:954831040; ETA: 0s
    2016-08-12 12:00:33 INFO Grabbing voluntary lock: gh-ost.84092.lock
    2016-08-12 12:00:33 INFO Setting LOCK timeout as 6 seconds
    2016-08-12 12:00:33 INFO Looking for magic cut-over table
    2016-08-12 12:00:33 INFO Creating magic cut-over table `test`.`_sbtest_del`
    2016-08-12 12:00:33 INFO Magic cut-over table created
    2016-08-12 12:00:33 INFO Locking `test`.`sbtest`, `test`.`_sbtest_del`
    2016-08-12 12:00:33 INFO Tables locked
    2016-08-12 12:00:33 INFO Session locking original & magic tables is 84092
    2016-08-12 12:00:33 INFO Writing changelog state: AllEventsUpToLockProcessed
    2016-08-12 12:00:33 INFO Waiting for events up to lock
    Copy: 10000/10000 100.0%; Applied: 0; Backlog: 1/100; Time: 2s(total), 1s(copy); streamer: mysql-bin.000023:954959331; ETA: 0s
    2016-08-12 12:00:34 INFO Done waiting for events up to lock; duration=971.748469ms
    # Migrating `test`.`sbtest`; Ghost table is `test`.`_sbtest_gho`
    # Migrating test01.dx.sankuai.com:5002; inspecting test02.dx.sankuai.com:5002; executing on dx-dba01.dx.sankuai.com
    # Migration started at Fri Aug 12 12:00:31 +0800 2016
    # chunk-size: 1000; max-lag-millis: 1500ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
    # throttle-additional-flag-file: /tmp/gh-ost.throttle
    # postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
    # panic-flag-file: /tmp/ghost.panic.flag
    # Serving on unix socket: /tmp/gh-ost.test.sbtest.sock
    Copy: 10000/10000 100.0%; Applied: 0; Backlog: 0/100; Time: 2s(total), 1s(copy); streamer: mysql-bin.000023:954960101; ETA: 0s
    2016-08-12 12:00:34 INFO Setting RENAME timeout as 3 seconds
    2016-08-12 12:00:34 INFO Session renaming tables is 84090
    2016-08-12 12:00:34 INFO Issuing and expecting this to block: rename /* gh-ost */ table `test`.`sbtest` to `test`.`_sbtest_del`, `test`.`_sbtest_gho` to `test`.`sbtest`
    2016-08-12 12:00:34 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
    2016-08-12 12:00:34 INFO Checking session lock: gh-ost.84092.lock
    2016-08-12 12:00:34 INFO Connection holding lock on original table still exists
    2016-08-12 12:00:34 INFO Will now proceed to drop magic table and unlock tables
    2016-08-12 12:00:34 INFO Dropping magic cut-over table
    2016-08-12 12:00:34 INFO Releasing lock from `test`.`sbtest`, `test`.`_sbtest_del`
    2016-08-12 12:00:34 INFO Tables unlocked
    2016-08-12 12:00:34 INFO Tables renamed
    2016-08-12 12:00:34 INFO Lock & rename duration: 981.435405ms. During this time, queries on `sbtest` were blocked
    2016-08-12 12:00:34 INFO Looking for magic cut-over table
    2016-08-12 12:00:34 INFO Droppping table `test`.`_sbtest_ghc`
    2016-08-12 12:00:34 INFO Table dropped
    2016-08-12 12:00:34 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
    2016-08-12 12:00:34 INFO -- drop table `test`.`_sbtest_del`
    2016-08-12 12:00:34 INFO Done migrating `test`.`sbtest`
    2016-08-12 12:00:34 INFO Done
    
    

    重要的一些选项以及典型使用方式

    重要的一些选项:

    -critical-load --max-load
            Comma delimited status-name=threshold, same format as --max-load. When status exceeds threshold, app panics and quits
    
    改表完成之后是否删除老表。        
    -ok-to-drop-table
            Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?     
            
    如果执行之前发现old-table,删除还是终止?默认终止。
    -initially-drop-old-table
            Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists           
    
    port选项仅对应的是host吗?其他地方都需要设置全
    -port int
            MySQL port (preferably a replica, not the master) (default 3306)
    
    因为gh-ost没有提供recursion-method=processlist方法,因此需要通过throttle-control-replicas指定所有的需要检查的slave,并且注意上面的port仅仅对应于host选项,因此需要host:port的方式来写全称
    -throttle-control-replicas string
        	List of replicas on which to check for lag; comma delimited. Example: myhost1.com:3306,myhost2.com,myhost3.com:3307
        	
    拷贝每个chunk之后sleep的时间=nice-ratio*copy-chunk-time,默认值是0,表示不sleep
    -nice-ratio float
        	force being 'nice', imply sleep time per chunk time; range: [0.0..100.0]. Example values: 0 is aggressive. 1.5: for every ms spend in a rowcopy chunk, spend 1.5ms sleeping immediately after    
      
    默认是不删除socket文件的,这样当第二次运行的时候报错,提示socket文件已经存在 
    -initially-drop-socket-file
        	Should gh-ost forcibly delete an existing socket file. Be careful: this might drop the socket file of a running migration!    	
    -exact-rowcount
        	actually count table rows as opposed to estimate them (results in more accurate progress estimation)	
    

    典型使用方式:

    ./gh-ost 
    --max-load=Threads_running=25 
    --critical-load=Threads_running=64 
    --chunk-size=1000 
    --throttle-control-replicas="test02:3306" 
    --max-lag-millis=1500 
    --initially-drop-old-table 
    --initially-drop-ghost-table 
    --initially-drop-socket-file 
    --ok-to-drop-table 
    --conf="./my.cnf" 
    --host="test02" 
    --port=3306 
    --user="admin" 
    --password="admin" 
    --database="test" 
    --table="test" 
    --verbose 
    --alter="drop index id1" 
    --switch-to-rbr 
    --allow-master-master 
    --cut-over=default 
    --default-retries=120 
    --panic-flag-file=/tmp/ghost.panic.flag 
    --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag 
    --execute
    

    发现的一些不足:

    1. 相比pt-osc缺少–check-interval参数;目前这个值写死是1s;issue地址默认是1s,足够小了,作者认为当前可以;
    2. –conf选项设置的my.cnf文件中不支持设置prompt=[h]u@d :m:s>;issue地址,作者已经修复;
    3. 缺少增加unique index的时候的检查;如果增加unique index的时候会丢失数据;(pt-osc也存在这个问题);作者认为不需要修复,问题的解决难度也比较大,DBA在使用前需要注意;
    4. 相比pt-osc缺少–check-replication-filters;是否确实对从库是否存在这个表的检查
    5. 相比pt-osc缺少–recursion-method=processlist;需要通过参数进行设置,issue地址

    问题:

    1.相比pt-osc缺乏–check-interval参数,那么检查的频率是多少呢?

    ~~~go // Migrate executes the complete migration logic. This is the major gh-ost function. func (this *Migrator) Migrate() (err error) { //在这个函数中进行load和slave lag的检查; go this.initiateThrottler() }

    //通过这个函数可以很清晰的看到,间隔是每秒钟检查一次load和slave lag;通过time.Tick(1 * time.Second)创建一个定时器,每秒钟往管道throttlerTick中写一个值,然后再通过for range从管道中读取; // initiateThrottler initiates the throttle ticker and sets the basic behavior of throttling. func (this *Migrator) initiateThrottler() error { throttlerTick := time.Tick(1 * time.Second)

    throttlerFunction := func() {
    	alreadyThrottling, currentReason := this.migrationContext.IsThrottled()
    	shouldThrottle, throttleReason := this.shouldThrottle()
    	if shouldThrottle && !alreadyThrottling {
    		// New throttling
    		this.applier.WriteAndLogChangelog("throttle", throttleReason)
    	} else if shouldThrottle && alreadyThrottling && (currentReason != throttleReason) {
    		// Change of reason
    		this.applier.WriteAndLogChangelog("throttle", throttleReason)
    	} else if alreadyThrottling && !shouldThrottle {
    		// End of throttling
    		this.applier.WriteAndLogChangelog("throttle", "done throttling")
    	}
    	this.migrationContext.SetThrottled(shouldThrottle, throttleReason)
    }
    throttlerFunction()
    for range throttlerTick {
    	throttlerFunction()
    }
    
    return nil }
    
  • 相关阅读:
    Python菜鸟之路:Python基础-线程、进程、协程
    Python菜鸟之路:Python基础-Socket编程-2
    Python菜鸟之路:Python基础-Socket基础-1
    react-native 使用 antd-mobile-rn UI进行开发app
    Java基础
    Java基础
    Java基础
    Java基础
    Java基础
    Java基础
  • 原文地址:https://www.cnblogs.com/weifeng1463/p/13627795.html
Copyright © 2011-2022 走看看