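  • Getting Started with DataX

    A quick introduction to DataX

    Overview

    What is DataX

    DataX is an offline data synchronization tool for heterogeneous data sources, open-sourced by Alibaba. It aims to provide stable and efficient data synchronization between heterogeneous sources such as relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, and FTP.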


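    DataX design

    To solve the problem of synchronizing heterogeneous data sources, DataX turns a complex mesh of point-to-point sync links into a star-shaped topology, with DataX acting as the intermediate transport that connects all the data sources.

    When a new data source needs to be added, it only has to be connected to DataX to synchronize seamlessly with every data source that is already supported.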
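
    The benefit of the star topology can be made concrete with a small count. The sketch below is only an illustrative model, not something taken from DataX itself: it assumes every direction between two sources needs its own sync link in the mesh case, and that each source needs exactly one reader and one writer in the star case. The function names are made up for this example.

```python
def mesh_links(n: int) -> int:
    """Directed point-to-point links when every source syncs with every other source."""
    return n * (n - 1)


def star_plugins(n: int) -> int:
    """Plugins when each source only supplies one reader and one writer to the hub."""
    return 2 * n


# Compare both models for a few source counts.
for n in (3, 6, 10):
    print(n, mesh_links(n), star_plugins(n))
```

    Under these assumptions the mesh grows quadratically with the number of sources, while the star grows linearly, which is why adding one more source only costs one more reader/writer pair.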


    Framework design

    [Figure: framework design diagram]

    How it works

    [Figure: execution principle diagram]

    Quick start

    Official links

    Download: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

    Source code: https://github.com/alibaba/DataX

    Prerequisites

    • Linux
    • JDK (1.8 or later; 1.8 recommended)
    • Python (Python 2.6.x recommended)
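
    Before installing, it can help to check these prerequisites on the target host. The following is a minimal sketch, not part of DataX: the function name is made up, it only checks that a java executable is on the PATH (not its version), and it reports the version of whichever Python interpreter runs the script.

```python
import platform
import shutil
import sys


def check_prerequisites() -> dict:
    """Collect a few facts about the host that the prerequisites list asks for."""
    return {
        "linux": platform.system() == "Linux",
        "java_on_path": shutil.which("java") is not None,
        "python_version": "%d.%d" % sys.version_info[:2],
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(name, value)
```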

    Installation

    1) Upload the downloaded datax.tar.gz to /opt/software on the host "other"

    [root@other software]$ ls datax.tar.gz
    

    2) Extract datax.tar.gz to /opt/module

    [root@other software]$ tar -zxvf datax.tar.gz -C /opt/module/
    

    3) Run the self-check script

    [root@other ~]# cd /opt/module/datax/bin/
    [root@other bin]# ll
    total 40
    -rwxr-xr-x 1 62265 users  8993 Nov 24  2017 datax.py
    -rwxr-xr-x 1 62265 users  6906 Nov 24  2017 dxprof.py
    -rwxr-xr-x 1 62265 users 16897 Nov 24  2017 perftrace.py
    [root@other bin]# python datax.py /opt/module/datax/job/job.json
    


    Usage examples

    Read data from a stream and print it to the console

    1) View the config template

    [root@other bin]# python datax.py -r streamreader -w streamwriter
    
    DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
    Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
    
    
    Please refer to the streamreader document:
         https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md
    
    Please refer to the streamwriter document:
         https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md
    
    Please save the following configuration as a json file and  use
         python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
    to run the job.
    
    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "column": [],
                            "sliceRecordCount": ""
                        }
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {
                            "encoding": "",
                            "print": true
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": ""
                }
            }
        }
    }
    [root@other bin]#
    
    

    2) Write a config file based on the template

    [root@other job]# cat stream2stream.json
    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "streamreader",
              "parameter": {
                "sliceRecordCount": 10,
                "column": [
                  {
                    "type": "long",
                    "value": "10"
                  },
                  {
                    "type": "string",
                    "value": "hello,DataX"
                  }
                ]
              }
            },
            "writer": {
              "name": "streamwriter",
              "parameter": {
                "encoding": "UTF-8",
                "print": true
              }
            }
          }
        ],
        "setting": {
          "speed": {
            "channel": 1
          }
        }
      }
    }
    [root@other job]#
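
    If several similar jobs are needed, the config can also be generated instead of edited by hand. The sketch below builds the same structure as stream2stream.json in Python and prints it as JSON. The helper name build_stream_job and its parameters are made up for this example; only the keys and values come from the config above.

```python
import json


def build_stream_job(record_count: int, columns: list, channel: int = 1) -> dict:
    """Build a streamreader -> streamwriter job with the same keys as the template."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": record_count,
                            "column": columns,
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


# Same values as the hand-written stream2stream.json.
job = build_stream_job(
    10,
    [
        {"type": "long", "value": "10"},
        {"type": "string", "value": "hello,DataX"},
    ],
)
print(json.dumps(job, indent=2))
```

    Redirecting the output to a file such as /opt/module/datax/job/stream2stream.json gives a config equivalent to the one shown above.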
    
    

    3) Run

    [root@other job]$ /opt/module/datax/bin/datax.py /opt/module/datax/job/stream2stream.json
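
    The same command can be launched from a script, for example when scheduling jobs. This is a minimal sketch that assumes DataX is installed at /opt/module/datax as in this post and that a python executable suitable for datax.py is on the PATH. The wrapper itself is written for Python 3, and the helper names are made up for this example.

```python
import subprocess
import sys

DATAX_HOME = "/opt/module/datax"  # install path used in this post


def build_command(job_file: str, datax_home: str = DATAX_HOME, python: str = "python") -> list:
    """Return the argument list equivalent to: python {DATAX_HOME}/bin/datax.py {job_file}."""
    return [python, f"{datax_home}/bin/datax.py", job_file]


def run_job(job_file: str) -> int:
    """Run the job and return the exit code of the datax.py process."""
    return subprocess.run(build_command(job_file)).returncode


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Run the job file given on the command line.
        sys.exit(run_job(sys.argv[1]))
    # Without arguments, only show what would be executed.
    print(" ".join(build_command(f"{DATAX_HOME}/job/stream2stream.json")))
```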
    


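    Oracle database

    I installed Oracle directly with Docker; if you need the details, see my earlier blog post.

    Create a user

    [Screenshot: creating the Oracle user]

    Create a table and insert a row: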

    SQL> create TABLE student(id INTEGER,name VARCHAR2(20));
    SQL> insert into student values (1,'zhangsan');
    SQL> select * from student;
            ID 	NAME
    ---------- ----------------------------------------
             1 	zhangsan
    

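    SQL differences between Oracle and MySQL

    Item                        Oracle                                MySQL
    --------------------------  ------------------------------------  -------------------------
    Integer types               number(N)/integer                     int/integer
    Floating-point types        float                                 float/double
    String types                varchar2(N)                           varchar(N)
    NULL                        '' (empty string is NULL)             NULL and '' are different
    Pagination                  rownum                                limit
    Double quotes ("")          heavily restricted, usually avoided   same as single quotes
    Price                       closed source, paid                   open source, free
    Auto-increment primary key  ×                                     ✓
    if not exists               ×                                     ✓
    auto_increment              ×                                     ✓
    create database             ×                                     ✓
    select * from table as t    ×                                     ✓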
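
    As one concrete case from the table, here is how the pagination difference shows up when a query is built for each database. This is a sketch for illustration only: the helper name is made up, and the table name and row count are interpolated directly into the string, which is not safe for untrusted input.

```python
def first_rows_sql(dialect: str, table: str, n: int) -> str:
    """Return a query for the first n rows, using rownum for Oracle and limit for MySQL."""
    if dialect == "oracle":
        return f"SELECT * FROM {table} WHERE ROWNUM <= {n}"
    if dialect == "mysql":
        return f"SELECT * FROM {table} LIMIT {n}"
    raise ValueError(f"unsupported dialect: {dialect}")


print(first_rows_sql("oracle", "student", 10))  # SELECT * FROM student WHERE ROWNUM <= 10
print(first_rows_sql("mysql", "student", 10))   # SELECT * FROM student LIMIT 10
```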

    DataX example

    Read data from Oracle and write it to MySQL

    1) Create the table in MySQL

    mysql> create database datax;
    mysql> use datax;
    mysql> create table student(id int,name varchar(20));
    

    2) Write the DataX config file

    [root@other job]# cat oracle2mysql.json
    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "oraclereader",
                        "parameter": {
                            "column": ["*"],
                            "connection": [
                                {
                                    "jdbcUrl": ["jdbc:oracle:thin:@192.168.1.121:1521:helowin"],
                                    "table": ["student"]
                                }
                            ],
                            "password": "123456",
                            "username": "dalianpai"
                        }
                    },
                    "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "column": ["*"],
                            "connection": [
                                {
                                    "jdbcUrl": "jdbc:mysql://192.168.1.121:3306/datax",
                                    "table": ["student"]
                                }
                            ],
                            "password": "root",
                            "username": "root",
                            "writeMode": "insert"
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": "1"
                }
            }
        }
    }
    [root@other job]#
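
    One detail in this config is easy to get wrong: the reader's jdbcUrl is a JSON array, while the writer's jdbcUrl is a plain string. The sketch below checks a job for that shape before it is run. The function name is made up for this example, and whether plugins other than the two used here follow the same convention is an assumption.

```python
def check_jdbc_shapes(job: dict) -> list:
    """Return a list of problems; an empty list means the shapes match the config above."""
    problems = []
    for index, item in enumerate(job["job"]["content"]):
        for conn in item["reader"]["parameter"]["connection"]:
            if not isinstance(conn.get("jdbcUrl"), list):
                problems.append(f"content[{index}].reader: jdbcUrl should be a list")
        for conn in item["writer"]["parameter"]["connection"]:
            if not isinstance(conn.get("jdbcUrl"), str):
                problems.append(f"content[{index}].writer: jdbcUrl should be a string")
    return problems


# Trimmed copy of the config above, keeping only the connection parts.
sample = {
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:oracle:thin:@192.168.1.121:1521:helowin"],
                                "table": ["student"],
                            }
                        ]
                    },
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.121:3306/datax",
                                "table": ["student"],
                            }
                        ]
                    },
                },
            }
        ]
    }
}

print(check_jdbc_shapes(sample))  # prints []
```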
    
    

    3) Run the command

    /opt/module/datax/bin/datax.py /opt/module/datax/job/oracle2mysql.json
    

    Output:

    [Screenshot: console output of the job]

    Result:

    [Screenshot: query result in MySQL]

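    Note: this is only a brief demo. My HDFS is installed inside CDH and I did not feel like starting that many virtual machines, so I will look into the rest when I have time. datax-web also seems friendlier and provides a web interface.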

  • Original post: https://www.cnblogs.com/dalianpai/p/13636443.html