zoukankan      html  css  js  c++  java
  • 通过DataX从Oracle同步数据到MySQL-安装配置过程

    DataX

    DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、SQL Server、Oracle、PostgreSQL、HDFS、Hive、HBase、OTS、ODPS 等各种异构数据源之间高效的数据同步功能。

    Features

    DataX本身作为数据同步框架,将不同数据源的同步抽象为从源头数据源读取数据的Reader插件,以及向目标端写入数据的Writer插件,理论上DataX框架可以支持任意数据源类型的数据同步工作。同时DataX插件体系作为一套生态系统, 每接入一套新数据源该新加入的数据源即可实现和现有的数据源互通。

    System Requirements

    Linux
    JDK(1.8以上,推荐1.8)
    Python(推荐Python2.6.X)
    Apache Maven 3.X(Compile DataX)

    Quick Start

    工具部署

    方法一、直接下DataX工具包:DataX下载地址,下载后解压至本地某个目录,进入bin目录,即可运行同步作业

    $ cd  {YOUR_DATAX_HOME}/bin
    $ python datax.py {YOUR_JOB.json}

     方法二、下载DataX源码,自己编译:DataX源码

    ①.安装JDK

    tar xvf jdk-8u151-linux-x64.tar.gz
    mv jdk1.8.0_151
    vim /etc/profile.d/jdk.sh
    export JAVA_HOME=/usr/local/jdk1.8.0_151
    export JAVA_BIN=$JAVA_HOME/bin
    export PATH=$PATH:$JAVA_BIN
    export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    source /etc/profile.d/jdk.sh

     检测安装是否成功

    [root@oracle ~]# java -version
    java version "1.8.0_151"
    Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
    Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

     ②.查看Python版本,如果不满足则需要自行安装

    [root@oracle ~]# python -V
    Python 2.6.6

     ③.安装Maven

     

    开始安装配置

    tar xvf apache-maven-3.6.1-bin.tar.gz
    mv apache-maven-3.6.1-bin.tar ../maven
    vim /etc/profile
    M2_HOME=/usr/local/maven
    export PATH=${M2_HOME}/bin:/u01/mysql/bin:${PATH}

     验证Maven是否安装成功

    [root@oracle src]# mvn -v
    Apache Maven 3.6.1 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
    Maven home: /usr/local/maven
    Java version: 1.8.0_151, vendor: Oracle Corporation
    Java home: /usr/local/jdk1.8.0_151/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "3.10.0-862.el7.x86_64", arch: "amd64", family: "unix"

     系统需求已配置完成,开始源码安装DataX,下载方法二选其一

    下载地址:https://github.com/alibaba/DataX 
    git clone git@github.com:alibaba/DataX.git

     开始源码安装

    unzip DataX-master.zip
    mv DataX-master ../
    cd ../DataX-master
    mvn -U clean package assembly:assembly -Dmaven.test.skip=true

     该过程非常非常的漫长,需要等待,打包成功,最终显示如下

    [INFO] BUILD SUCCESS
    [INFO] -----------------------------------------------------------------
    [INFO] Total time: 08:12 min
    [INFO] Finished at: 2015-12-13T16:26:48+08:00
    [INFO] Final Memory: 133M/960M
    [INFO] -----------------------------------------------------------------

     打包成功后的DataX包位于{DataX_source_code_home}/target/datax/datax/,结构如下:

    cd /usr/local/Datax-master
    [root@oracle DataX-master]# ls -a ./target/datax/datax/
    .  ..  bin  conf  job  lib  log  log_perf  plugin  script  tmp

     配置示例:从stream读取数据并打印到控制台,

    第一步。创建创业的配置文件(json格式),可以通过命令查看配置模板:python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

    cd /usr/local/DataX-master/target/datax/datax/bin
    ./datax.py -r streamreader -w streamwriter
    DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
    Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
    
    
    Please refer to the streamreader document:
         https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md 
    
    Please refer to the streamwriter document:
         https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md 
     
    Please save the following configuration as a json file and  use
         python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
    to run the job.
    
    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader", 
                        "parameter": {
                            "column": [], 
                            "sliceRecordCount": ""
                        }
                    }, 
                    "writer": {
                        "name": "streamwriter", 
                        "parameter": {
                            "encoding": "", 
                            "print": true
                        }
                    }
                }
            ], 
            "setting": {
                "speed": {
                    "channel": ""
                }
            }
        }
    }

    根据模板配置自己所需的json,具体如下

    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "oraclereader",
                        "parameter": {
                            "column": ["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],
                            "connection": [
                                {
                                   "jdbcUrl": ["jdbc:oracle:thin:@192.168.11.91:1521:orcl"],
                                    "table": ["JDZLJCXM"]
                                }
                            ],
                            "password": "jgzdwffz",
                            "username": "bjxxjgxt",
                        }
                    },
                    "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "column": ["zybh","xmbh","jsbh","xmmc","sfje","fylx","bz","sjc","zllx"],
                            "connection": [
                                {
                                    "jdbcUrl": "jdbc:mysql://192.168.11.75:3336/prison_practical_platform",
                                    "table": ["JDZLJCXM"]
                                }
                            ],
                            "password": "root",
                            "username": "root",
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": "5"
                }
            }
        }
    }

     第二步。启动DataX

    cd /usr/local/DataX-master/target/datax/datax/bin
    ./datax.py ../job/oracle11mysql8.json

     同步结束,相关日志如下:

    DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
    Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
    
    
    2019-05-24 11:59:06.065 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
    2019-05-24 11:59:06.075 [main] INFO  Engine - the machine info  => 
    
        osInfo:    Oracle Corporation 1.8 25.161-b12
        jvmInfo:    Linux amd64 3.10.0-862.el7.x86_64
        cpu num:    1
    
        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:    -0.00G
        maxFileDescriptorCount:    -1
        currentOpenFileDescriptorCount:    -1
    
        GC Names    [Copy, MarkSweepCompact]
    
        MEMORY_NAME                    | allocation_size                | init_size                      
        Eden Space                     | 273.06MB                       | 273.06MB                       
        Code Cache                     | 240.00MB                       | 2.44MB                         
        Survivor Space                 | 34.13MB                        | 34.13MB                        
        Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
        Metaspace                      | -0.00MB                        | 0.00MB                         
        Tenured Gen                    | 682.69MB                       | 682.69MB                       
    
    
    2019-05-24 11:59:06.093 [main] INFO  Engine - 
    {
        "content":[
            {
                "reader":{
                    "name":"oraclereader",
                    "parameter":{
                        "column":[
                            "zybh",
                            "xmbh",
                            "jsbh",
                            "xmmc",
                            "sfje",
                            "fylx",
                            "bz",
                            "sjc",
                            "zllx"
                        ],
                        "connection":[
                            {
                                "jdbcUrl":[
                                    "jdbc:oracle:thin:@192.168.11.91:1521:orcl"
                                ],
                                "table":[
                                    "JDZLJCXM"
                                ]
                            }
                        ],
                        "password":"********",
                        "username":"bjxxjgxt"
                    }
                },
                "writer":{
                    "name":"mysqlwriter",
                    "parameter":{
                        "column":[
                            "zybh",
                            "xmbh",
                            "jsbh",
                            "xmmc",
                            "sfje",
                            "fylx",
                            "bz",
                            "sjc",
                            "zllx"
                        ],
                        "connection":[
                            {
                                "jdbcUrl":"jdbc:mysql://192.168.11.75:3336/prison_practical_platform",
                                "table":[
                                    "JDZLJCXM"
                                ]
                            }
                        ],
                        "password":"****",
                        "username":"root"
                    }
                }
            }
        ],
        "setting":{
            "speed":{
                "channel":"5"
            }
        }
    }
    
    2019-05-24 11:59:06.111 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
    2019-05-24 11:59:06.121 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
    2019-05-24 11:59:06.121 [main] INFO  JobContainer - DataX jobContainer starts job.
    2019-05-24 11:59:06.122 [main] INFO  JobContainer - Set jobId = 0
    2019-05-24 11:59:06.488 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@192.168.11.91:1521:orcl.
    2019-05-24 11:59:06.605 [job-0] INFO  OriginalConfPretreatmentUtil - table:[JDZLJCXM] has columns:[ZYBH,XMBH,JSBH,XMMC,SFJE,FYLX,BZ,SJC,ZLLX].
    2019-05-24 11:59:06.952 [job-0] INFO  OriginalConfPretreatmentUtil - table:[JDZLJCXM] all columns:[
    ZYBH,XMBH,JSBH,XMMC,SFJE,FYLX,BZ,SJC,ZLLX
    ].
    2019-05-24 09:48:44.768 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
    INSERT INTO %s (zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx) VALUES(?,?,?,?,?,?,?,?,?)
    ], which jdbcUrl like:[jdbc:mysql://192.168.11.75:3336/prison_practical_platform?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
    2019-05-24 09:48:44.768 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
    2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do prepare work .
    2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
    2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - jobContainer starts to do split ...
    2019-05-24 09:48:44.769 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
    2019-05-24 09:48:44.772 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
    2019-05-24 09:48:44.772 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
    2019-05-24 09:48:44.820 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
    2019-05-24 09:48:44.830 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
    2019-05-24 09:48:44.832 [job-0] INFO  JobContainer - Running by standalone Mode.
    2019-05-24 09:48:44.866 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
    2019-05-24 09:48:44.876 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
    2019-05-24 09:48:44.877 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
    2019-05-24 09:48:44.898 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
    2019-05-24 09:48:44.901 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx from JDZLJCXM 
    ] jdbcUrl:[jdbc:oracle:thin:@192.168.11.91:1521:orcl].
    2019-05-24 09:48:45.050 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select zybh,xmbh,jsbh,xmmc,sfje,fylx,bz,sjc,zllx from JDZLJCXM 
    ] jdbcUrl:[jdbc:oracle:thin:@192.168.11.91:1521:orcl].
    2019-05-24 09:48:46.419 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[1538]ms
    2019-05-24 09:48:46.420 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
    2019-05-24 09:48:54.873 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.124s | Percentage 100.00%
    2019-05-24 09:48:54.874 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
    2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
    2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do post work.
    2019-05-24 09:48:54.874 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
    2019-05-24 09:48:54.875 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /usr/local/DataX-master/target/datax/datax/hook
    2019-05-24 09:48:54.875 [job-0] INFO  JobContainer - 
         [total cpu info] => 
            averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
            -1.00%                         | -1.00%                         | -1.00%
                            
    
         [total gc info] => 
             NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
             Copy                 | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
             MarkSweepCompact     | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
    
    2019-05-24 09:48:54.875 [job-0] INFO  JobContainer - PerfTrace not enable!
    2019-05-24 09:48:54.876 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1554 records, 79079 bytes | Speed 7.72KB/s, 155 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.020s |  All Task WaitReaderTime 0.124s | Percentage 100.00%
    2019-05-24 09:48:54.876 [job-0] INFO  JobContainer - 
    任务启动时刻                    : 2019-05-24 09:48:43
    任务结束时刻                    : 2019-05-24 09:48:54
    任务总计耗时                    :                 11s
    任务平均流量                    :            7.72KB/s
    记录写入速度                    :            155rec/s
    读出记录总数                    :                1554
    读写失败总数                    :                   0

     在配置过程中出现的问题,错误如下

    [ERROR] Failed to execute goal on project otsstreamreader: Could not resolve dependencies for project com.alibaba.datax:otsstreamreader:jar:1.0.0-SNAPSHOT: Could not find artifact 
    com.aliyun.openservices:tablestore-streamclient:jar:1.0.0-SNAPSHOT -> [Help 1]

     该错误是由于快照版本不一致,由于ots基本不会被用到,直接把pom.xml中的<module>ots</module>去掉,也可以更改版本otsstreamreader中的默认版本为0.0.1,改为1.0.0

    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-jar-plugin:2.4:jar (default-jar) on project ocswriter: Error assembling JAR: /Users/FengZhen/Desktop/Hadoop/DataX/源码/DataX/ocswriter/pom.xml isn't a file. -> [Help 1]

     将ocs注释掉,重新打包即可。

    参考来源:https://github.com/alibaba/DataX/blob/master/userGuid.md

         http://www.cnblogs.com/EnzoDin/p/9979583.html

  • 相关阅读:
    LeetCode:Length of Last Word
    Team Queue(POJ 2259)
    LeetCode:Largest Rectangle in Histogram(update)
    bzoj4821 && luogu3707 SDOI2017相关分析(线段树,数学)
    luogu1438无聊的数列(区间加等差数列,求一个数的和)
    luogu1081 开车旅行2012 D1T3 (倍增,set,O2)
    bzoj4094 && luogu3097 最优挤奶
    luogu3888 GDOI2014拯救莫里斯 (状压dp)
    noip2017D1T3逛公园(拓扑图上dp,记忆化搜索)
    bzoj1065 NOI2008奥运物流 (dp,树上背包,推式子)
  • 原文地址:https://www.cnblogs.com/Roobbin/p/10917350.html
Copyright © 2011-2022 走看看