DataX is an offline (batch) synchronization tool for heterogeneous data sources.
For a detailed introduction, see https://github.com/alibaba/DataX
1. Download DataX
http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
2. DataX dependencies and installation
See: https://github.com/alibaba/DataX/blob/master/userGuid.md
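3. Test scenario
In this test, the source side consists of 4 shard instances, simulated here by 4 databases (db1 to db4) on a single MySQL instance. Each shard holds 2 tables with 2 rows each, and all 8 tables are merged into the t_sum table in the OceanBase MySQL tenant test_tenant_1 on the target side.
The source primary key id has no business meaning, so only name is synced and t_sum generates fresh ids through its own AUTO_INCREMENT. Copying id would not work anyway: every source table uses ids 1 and 2, which would collide in t_sum's primary key.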
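To make the layout concrete, here is a small Python model of the test data. The SHARDS dict is a hypothetical mirror of the SQL in step 5 (DataX does not read it); the sketch counts the rows to be merged and lists the source ids that occur in more than one table, which is why id cannot be copied into t_sum's primary key.

```python
from collections import Counter

# Hypothetical model of the test layout: 4 simulated shards (db1..db4),
# 2 tables per shard, 2 (id, name) rows per table, as created in step 5.
SHARDS = {
    "db1": {"t1": [(1, "a"), (2, "b")], "t2": [(1, "c"), (2, "d")]},
    "db2": {"t3": [(1, "e"), (2, "f")], "t4": [(1, "g"), (2, "h")]},
    "db3": {"t5": [(1, "i"), (2, "j")], "t6": [(1, "k"), (2, "l")]},
    "db4": {"t7": [(1, "m"), (2, "n")], "t8": [(1, "o"), (2, "p")]},
}


def all_rows(shards):
    """Flatten every (id, name) row from every table of every shard."""
    return [row for tables in shards.values() for rows in tables.values() for row in rows]


rows = all_rows(SHARDS)
id_counts = Counter(row_id for row_id, _ in rows)
# Ids that appear in more than one source table would clash in t_sum's primary key.
duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

print(len(rows))       # 16
print(duplicate_ids)   # [1, 2]
```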
4. Create the summary table in database db1 of tenant test_tenant_1 on the target OceanBase (OB) cluster
create table t_sum(id int not null auto_increment primary key,name varchar(20));

MySQL [db1]> show tenant;
+---------------------+
| Current_tenant_name |
+---------------------+
| test_tenant_1       |
+---------------------+
1 row in set (0.003 sec)

MySQL [db1]> show create table t_sum\G
*************************** 1. row ***************************
       Table: t_sum
Create Table: CREATE TABLE `t_sum` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) AUTO_INCREMENT = 1 DEFAULT CHARSET = utf8mb4 ROW_FORMAT = COMPACT COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 3 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0
1 row in set (0.004 sec)
5. Databases and tables on the source MySQL shards

# Simulated MySQL shard 1
create database if not exists db1;
create table db1.t1(id int not null auto_increment primary key,name varchar(20));
create table db1.t2(id int not null auto_increment primary key,name varchar(20));
insert into db1.t1 values(1,'a');
insert into db1.t1 values(2,'b');
insert into db1.t2 values(1,'c');
insert into db1.t2 values(2,'d');

# Simulated MySQL shard 2
create database if not exists db2;
create table db2.t3(id int not null auto_increment primary key,name varchar(20));
create table db2.t4(id int not null auto_increment primary key,name varchar(20));
insert into db2.t3 values(1,'e');
insert into db2.t3 values(2,'f');
insert into db2.t4 values(1,'g');
insert into db2.t4 values(2,'h');

# Simulated MySQL shard 3
create database if not exists db3;
create table db3.t5(id int not null auto_increment primary key,name varchar(20));
create table db3.t6(id int not null auto_increment primary key,name varchar(20));
insert into db3.t5 values(1,'i');
insert into db3.t5 values(2,'j');
insert into db3.t6 values(1,'k');
insert into db3.t6 values(2,'l');

# Simulated MySQL shard 4
create database if not exists db4;
create table db4.t7(id int not null auto_increment primary key,name varchar(20));
create table db4.t8(id int not null auto_increment primary key,name varchar(20));
insert into db4.t7 values(1,'m');
insert into db4.t7 values(2,'n');
insert into db4.t8 values(1,'o');
insert into db4.t8 values(2,'p');
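If you want more or larger simulated shards, the script above can be generated instead of typed by hand. The Python sketch below uses a hypothetical helper, shard_sql (not part of DataX). With its defaults it reproduces the step 5 layout, including a create database if not exists line per shard in case the databases do not exist yet.

```python
import string


def shard_sql(num_shards=4, tables_per_shard=2, rows_per_table=2):
    """Generate the source-side test script: one database per simulated shard."""
    if num_shards * tables_per_shard * rows_per_table > len(string.ascii_lowercase):
        raise ValueError("this sketch only has 26 single-letter names to hand out")
    names = iter(string.ascii_lowercase)  # 'a', 'b', ... handed out in insertion order
    lines, table_no = [], 0
    for shard in range(1, num_shards + 1):
        db = f"db{shard}"
        lines.append(f"# Simulated MySQL shard {shard}")
        lines.append(f"create database if not exists {db};")
        # Table numbers keep counting across shards: db1 gets t1,t2; db2 gets t3,t4; ...
        tables = [f"{db}.t{table_no + i}" for i in range(1, tables_per_shard + 1)]
        table_no += tables_per_shard
        for t in tables:
            lines.append(f"create table {t}(id int not null auto_increment primary key,name varchar(20));")
        for t in tables:
            for row_id in range(1, rows_per_table + 1):
                lines.append(f"insert into {t} values({row_id},'{next(names)}');")
    return "\n".join(lines)


print(shard_sql())
```

Piping the output into the mysql client of the source instance would create the same 16 rows as the hand-written script.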
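6. Generate a configuration template
OceanBase is compatible with the MySQL protocol, so mysqlwriter is used as the writer. Running datax.py with -r (reader) and -w (writer) prints a configuration template for that pair: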
[root@tidb60 bin]# python datax.py -r mysqlreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the mysqlreader document:
  https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md

Please refer to the mysqlwriter document:
  https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md

Please save the following configuration as a json file and use
  python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "column": [],
            "connection": [
              {
                "jdbcUrl": [],
                "table": []
              }
            ],
            "password": "",
            "username": "",
            "where": ""
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "column": [],
            "connection": [
              {
                "jdbcUrl": "",
                "table": []
              }
            ],
            "password": "",
            "preSql": [],
            "session": [],
            "username": "",
            "writeMode": ""
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": ""
      }
    }
  }
}
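The template leaves every value empty. A small Python check can list the fields that are still "" or [] after editing; empty_fields is a hypothetical helper, not a DataX feature. Some of the fields it reports, such as where, session and preSql, are optional and can simply be deleted when unused, as the final configuration in step 7 does with where and session.

```python
import json

# The template printed by `datax.py -r mysqlreader -w mysqlwriter`, copied from above.
TEMPLATE = """
{"job": {"content": [{"reader": {"name": "mysqlreader", "parameter": {
  "column": [], "connection": [{"jdbcUrl": [], "table": []}],
  "password": "", "username": "", "where": ""}},
 "writer": {"name": "mysqlwriter", "parameter": {
  "column": [], "connection": [{"jdbcUrl": "", "table": []}],
  "password": "", "preSql": [], "session": [], "username": "", "writeMode": ""}}}],
 "setting": {"speed": {"channel": ""}}}}
"""


def empty_fields(node, path="job"):
    """Yield the dotted path of every value that is still an empty string or empty list."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from empty_fields(value, f"{path}.{key}")
    elif isinstance(node, list):
        if not node:
            yield path
        for i, item in enumerate(node):
            yield from empty_fields(item, f"{path}[{i}]")
    elif node == "":
        yield path


missing = list(empty_fields(json.loads(TEMPLATE)["job"]))
for p in missing:
    print(p)
```

Running the same function on the edited job file is a cheap way to catch a forgotten jdbcUrl or username before DataX does.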
7. Edit the configuration file
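Wildcards did not work for the table or database names here; I am not sure whether my configuration was wrong or whether they are simply not supported. With the table set to ^t.*, the job failed with DBUtilErrorCode-07 ("failed to read data from the database; check your column/table/where/querySql configuration or ask a DBA for help"). The SQL quoted in the message, select * from ^t.* where 1=2, shows that the pattern was passed to MySQL verbatim as a table name instead of being expanded:

com.alibaba.datax.common.exception.DataXException: Code:[DBUtilErrorCode-07], Description:[读取数据库数据失败. 请检查您的配置的 column/table/where/querySql或者向 DBA 寻求帮助.]. - 执行的SQL为: select * from ^t.* where 1=2 具体错误信息为:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '^t.* where 1=2' at line 1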
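Since the pattern reaches MySQL unchanged, one workaround is to resolve it yourself and put the explicit table names into the table list. The sketch below assumes you already have the output of show tables for a shard; expand_tables is a hypothetical helper, and t_tmp and users are invented table names used only to show that a loose pattern such as ^t.* can also pick up tables you did not intend to sync.

```python
import re


def expand_tables(pattern, table_names):
    """Return the table names that match a regular expression, sorted."""
    regex = re.compile(pattern)
    return sorted(name for name in table_names if regex.search(name))


# Hypothetical `show tables` output for db1; t_tmp and users are invented examples.
db1_tables = ["t1", "t2", "t_tmp", "users"]
print(expand_tables(r"^t.*", db1_tables))     # ['t1', 't2', 't_tmp']
print(expand_tables(r"^t\d+$", db1_tables))   # ['t1', 't2']
```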
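The working configuration lists each shard database's tables explicitly. It is saved as mytest.json:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root654321",
            "column": ["name"],
            "splitPk": "id",
            "connection": [
              {
                "table": ["t1", "t2"],
                "jdbcUrl": ["jdbc:mysql://192.168.1.100:6008/db1"]
              },
              {
                "table": ["t3", "t4"],
                "jdbcUrl": ["jdbc:mysql://192.168.1.100:6008/db2"]
              },
              {
                "table": ["t5", "t6"],
                "jdbcUrl": ["jdbc:mysql://192.168.1.100:6008/db3"]
              },
              {
                "table": ["t7", "t8"],
                "jdbcUrl": ["jdbc:mysql://192.168.1.100:6008/db4"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "writeMode": "insert",
            "username": "root@test_tenant_1#myob_test",
            "password": "root",
            "preSql": ["truncate table t_sum;"],
            "column": ["name"],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://192.168.1.100:12881/db1",
                "table": ["t_sum"]
              }
            ]
          }
        }
      }
    ]
  }
}

Notes on this configuration: each connection entry points at one shard database and lists that shard's tables, and DataX reads each of the 8 tables as a separate task. Only the name column is read and written, so t_sum assigns new ids itself. The writer username uses OceanBase's user@tenant#cluster format (user root, tenant test_tenant_1, cluster myob_test), and preSql truncates t_sum before every run, so re-running the job does not duplicate rows.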
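With many shards, writing the connection entries by hand gets error-prone. A minimal Python sketch, assuming the shard-to-table map is known in advance: the helper build_job and the script name gen_job.py are hypothetical, and the hosts, ports and credentials are simply copied from the configuration above. It emits an equivalent job with one reader connection per shard database.

```python
import json

# Values copied from the step 7 config; replace them for your own environment.
SOURCE_URL = "jdbc:mysql://192.168.1.100:6008/{db}"
TARGET_URL = "jdbc:mysql://192.168.1.100:12881/db1"
SHARD_TABLES = {
    "db1": ["t1", "t2"],
    "db2": ["t3", "t4"],
    "db3": ["t5", "t6"],
    "db4": ["t7", "t8"],
}


def build_job(shard_tables, channel=1):
    """Build a DataX job dict with one mysqlreader connection per shard database."""
    connections = [
        {"table": tables, "jdbcUrl": [SOURCE_URL.format(db=db)]}
        for db, tables in shard_tables.items()
    ]
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": "root",
            "password": "root654321",
            "column": ["name"],  # id is regenerated by t_sum's AUTO_INCREMENT
            "splitPk": "id",
            "connection": connections,
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "writeMode": "insert",
            "username": "root@test_tenant_1#myob_test",  # user@tenant#cluster
            "password": "root",
            "preSql": ["truncate table t_sum;"],
            "column": ["name"],
            "connection": [{"jdbcUrl": TARGET_URL, "table": ["t_sum"]}],
        },
    }
    return {"job": {"setting": {"speed": {"channel": channel}},
                    "content": [{"reader": reader, "writer": writer}]}}


if __name__ == "__main__":
    # Redirect stdout to a file, e.g. `python gen_job.py > mytest.json`.
    print(json.dumps(build_job(SHARD_TABLES), indent=2))
```

Generating the file also guarantees valid JSON: json.dumps never emits the trailing commas that are easy to leave behind when editing by hand.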
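8. Run the sync job
If the data volume is large and the sync runs for a long time, start it inside a screen session (for example screen -S mydatax) so that a dropped terminal connection does not interrupt the job.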
[root@test bin]# python datax.py mytest.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2021-08-18 09:41:49.744 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2021-08-18 09:41:49.753 [main] INFO Engine - the machine info =>

  osInfo: Oracle Corporation 1.8 25.77-b03
  jvmInfo: Linux amd64 3.10.0-862.el7.x86_64
  cpu num: 32

  totalPhysicalMemory: -0.00G
  freePhysicalMemory: -0.00G
  maxFileDescriptorCount: -1
  currentOpenFileDescriptorCount: -1

  GC Names [PS MarkSweep, PS Scavenge]

  MEMORY_NAME            | allocation_size | init_size
  PS Eden Space          | 256.00MB        | 256.00MB
  Code Cache             | 240.00MB        | 2.44MB
  Compressed Class Space | 1,024.00MB      | 0.00MB
  PS Survivor Space      | 42.50MB         | 42.50MB
  PS Old Gen             | 683.00MB        | 683.00MB
  Metaspace              | -0.00MB         | 0.00MB

2021-08-18 09:41:49.776 [main] INFO Engine -
{
  "content":[
    {
      "reader":{
        "name":"mysqlreader",
        "parameter":{
          "column":["name"],
          "connection":[
            {
              "jdbcUrl":["jdbc:mysql://192.168.1.100:6008/db1"],
              "table":["t1", "t2"]
            },
            {
              "jdbcUrl":["jdbc:mysql://192.168.1.100:6008/db2"],
              "table":["t3", "t4"]
            },
            {
              "jdbcUrl":["jdbc:mysql://192.168.1.100:6008/db3"],
              "table":["t5", "t6"]
            },
            {
              "jdbcUrl":["jdbc:mysql://192.168.1.100:6008/db4"],
              "table":["t7", "t8"]
            }
          ],
          "password":"**********",
          "splitPk":"id",
          "username":"root"
        }
      },
      "writer":{
        "name":"mysqlwriter",
        "parameter":{
          "column":["name"],
          "connection":[
            {
              "jdbcUrl":"jdbc:mysql://192.168.1.100:12881/db1",
              "table":["t_sum"]
            }
          ],
          "password":"****",
          "preSql":["truncate table t_sum;"],
          "username":"root@test_tenant_1#myob_test",
          "writeMode":"insert"
        }
      }
    }
  ],
  "setting":{
    "speed":{
      "channel":1
    }
  }
}

2021-08-18 09:41:49.795 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2021-08-18 09:41:49.797 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2021-08-18 09:41:49.798 [main] INFO JobContainer - DataX jobContainer starts job.
2021-08-18 09:41:49.800 [main] INFO JobContainer - Set jobId = 0
2021-08-18 09:41:50.154 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.1.100:6008/db1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2021-08-18 09:41:50.166 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.1.100:6008/db2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2021-08-18 09:41:50.174 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.1.100:6008/db3?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2021-08-18 09:41:50.181 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.1.100:6008/db4?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2021-08-18 09:41:50.194 [job-0] INFO OriginalConfPretreatmentUtil - table:[t1] has columns:[id,name].
2021-08-18 09:41:50.482 [job-0] INFO OriginalConfPretreatmentUtil - table:[t_sum] all columns:[ id,name ].
2021-08-18 09:41:50.497 [job-0] INFO OriginalConfPretreatmentUtil - Write data [ insert INTO %s (name) VALUES(?) ], which jdbcUrl like:[jdbc:mysql://192.168.1.100:12881/db1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2021-08-18 09:41:50.498 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2021-08-18 09:41:50.498 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2021-08-18 09:41:50.499 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2021-08-18 09:41:50.513 [job-0] INFO CommonRdbmsWriter$Job - Begin to execute preSqls:[truncate table t_sum;]. context info:jdbc:mysql://192.168.1.100:12881/db1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2021-08-18 09:41:50.619 [job-0] INFO JobContainer - jobContainer starts to do split ...
2021-08-18 09:41:50.619 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2021-08-18 09:41:50.630 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [8] tasks.
2021-08-18 09:41:50.632 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [8] tasks.
2021-08-18 09:41:50.661 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2021-08-18 09:41:50.670 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2021-08-18 09:41:50.673 [job-0] INFO JobContainer - Running by standalone Mode.
2021-08-18 09:41:50.685 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [8] tasks.
2021-08-18 09:41:50.689 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2021-08-18 09:41:50.689 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2021-08-18 09:41:50.698 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2021-08-18 09:41:50.702 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select name from t1 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:50.721 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select name from t1 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:50.799 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms
2021-08-18 09:41:50.801 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[1] attemptCount[1] is started
2021-08-18 09:41:50.802 [0-0-1-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select name from t2 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:50.812 [0-0-1-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select name from t2 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:50.901 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[100]ms
2021-08-18 09:41:50.904 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
2021-08-18 09:41:50.904 [0-0-2-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select name from t3 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:50.916 [0-0-2-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select name from t3 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.004 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[100]ms
2021-08-18 09:41:51.006 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[3] attemptCount[1] is started
2021-08-18 09:41:51.007 [0-0-3-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select name from t4 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.018 [0-0-3-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select name from t4 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.107 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[3] is successed, used[101]ms
2021-08-18 09:41:51.109 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[4] attemptCount[1] is started
2021-08-18 09:41:51.110 [0-0-4-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select name from t5 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db3?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.120 [0-0-4-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select name from t5 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db3?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.210 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[4] is successed, used[101]ms
2021-08-18 09:41:51.212 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[5] attemptCount[1] is started
2021-08-18 09:41:51.212 [0-0-5-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select name from t6 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db3?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.223 [0-0-5-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select name from t6 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db3?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.312 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[5] is successed, used[101]ms
2021-08-18 09:41:51.315 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[6] attemptCount[1] is started
2021-08-18 09:41:51.316 [0-0-6-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select name from t7 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db4?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.327 [0-0-6-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select name from t7 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db4?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.415 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[6] is successed, used[100]ms
2021-08-18 09:41:51.418 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[7] attemptCount[1] is started
2021-08-18 09:41:51.419 [0-0-7-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select name from t8 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db4?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.431 [0-0-7-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select name from t8 ] jdbcUrl:[jdbc:mysql://192.168.1.100:6008/db4?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2021-08-18 09:41:51.519 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[7] is successed, used[101]ms
2021-08-18 09:41:51.520 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2021-08-18 09:42:00.698 [job-0] INFO StandAloneJobContainerCommunicator - Total 16 records, 16 bytes | Speed 1B/s, 1 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-08-18 09:42:00.699 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2021-08-18 09:42:00.699 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2021-08-18 09:42:00.700 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2021-08-18 09:42:00.700 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2021-08-18 09:42:00.701 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /opt/datax/hook
2021-08-18 09:42:00.703 [job-0] INFO JobContainer -
  [total cpu info] =>
    averageCpu | maxDeltaCpu | minDeltaCpu
    -1.00%     | -1.00%      | -1.00%

  [total gc info] =>
    NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
    PS MarkSweep | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s
    PS Scavenge  | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s

2021-08-18 09:42:00.704 [job-0] INFO JobContainer - PerfTrace not enable!
2021-08-18 09:42:00.704 [job-0] INFO StandAloneJobContainerCommunicator - Total 16 records, 16 bytes | Speed 1B/s, 1 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-08-18 09:42:00.705 [job-0] INFO JobContainer -
任务启动时刻 : 2021-08-18 09:41:49
任务结束时刻 : 2021-08-18 09:42:00
任务总计耗时 : 10s
任务平均流量 : 1B/s
记录写入速度 : 1rec/s
读出记录总数 : 16
读写失败总数 : 0

The closing summary, which DataX prints in Chinese, reports: job start 2021-08-18 09:41:49, job end 2021-08-18 09:42:00, total elapsed time 10s, average throughput 1B/s, write speed 1rec/s, 16 records read, 0 read/write failures.
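For scripted runs, the StandAloneJobContainerCommunicator line near the end of the log carries the totals. The sketch below (parse_summary is a hypothetical helper) pulls them out with a regular expression; it assumes the exact line format shown in this log and may need adjusting for other DataX versions.

```python
import re

# Matches the totals section of the progress line, e.g.
# "Total 16 records, 16 bytes | ... | Error 0 records, 0 bytes"
SUMMARY_RE = re.compile(r"Total (\d+) records, (\d+) bytes .*?Error (\d+) records, (\d+) bytes")


def parse_summary(line):
    """Return record/byte totals and error counts from a DataX summary line, or None."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    total, total_bytes, err, err_bytes = map(int, m.groups())
    return {"records": total, "bytes": total_bytes,
            "error_records": err, "error_bytes": err_bytes}


LINE = ("2021-08-18 09:42:00.704 [job-0] INFO StandAloneJobContainerCommunicator - "
        "Total 16 records, 16 bytes | Speed 1B/s, 1 records/s | Error 0 records, 0 bytes | "
        "All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%")
stats = parse_summary(LINE)
print(stats)  # {'records': 16, 'bytes': 16, 'error_records': 0, 'error_bytes': 0}
```

A wrapper script could, for example, treat a non-zero error_records as a failed sync and alert on it.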
9. Verify on the target side
MySQL [db1]> show tenant;
+---------------------+
| Current_tenant_name |
+---------------------+
| test_tenant_1       |
+---------------------+
1 row in set (0.003 sec)

MySQL [db1]> select * from t_sum;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
|  6 | f    |
|  7 | g    |
|  8 | h    |
|  9 | i    |
| 10 | j    |
| 11 | k    |
| 12 | l    |
| 13 | m    |
| 14 | n    |
| 15 | o    |
| 16 | p    |
+----+------+
16 rows in set (0.003 sec)

All 16 source rows arrived with new ids 1 to 16. Their order matches the source tables t1 through t8, which is consistent with the single-channel run, where the 8 tasks executed one after another.
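Comparing row counts alone can hide a missing row that is offset by a duplicate. Since only name was copied, a stricter check compares the multiset of names on both sides. A sketch, assuming the names have already been fetched (for example with select name from each source table and select name from t_sum on OB); the fetching is left out and same_rows is a hypothetical helper.

```python
from collections import Counter


def same_rows(source_names, target_names):
    """True when both sides hold exactly the same names, duplicates included."""
    return Counter(source_names) == Counter(target_names)


# Names from this test: t1..t8 hold 'a'..'p' on the source; t_sum shows the same 16 names.
source = list("abcdefghijklmnop")
target = list("abcdefghijklmnop")
print(len(target), same_rows(source, target))  # 16 True
```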