  • Real-Time E-commerce Data Warehouse (10): Data Collection (9): Database Data Collection (4): Getting Started with and Installing Maxwell

    1  Maxwell

    Maxwell is an open-source tool from Zendesk (US), written in Java, that captures MySQL changes in real time. Like Canal, it works by reading the binlog.

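    1.1  Tool comparison

    1 Maxwell has no server+client mode like Canal's. There is only a single server, which sends data to a message queue or Redis.

    2 A standout Maxwell feature: Canal can only capture new changes and cannot handle historical data that already exists, whereas Maxwell has a bootstrap feature that can export the complete historical data for initialization, which is very useful.

    3 Maxwell does not directly support HA, but it does support resuming from a checkpoint: once the error is fixed and Maxwell is restarted, it continues reading from the last position.

    4 Maxwell only outputs JSON, whereas Canal, when used in server+client mode, allows a custom format.

    5 Maxwell is more lightweight than Canal.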

    1.2  Installing Maxwell

         Extract maxwell-1.25.0.tar.gz into a directory of your choice.

    1.3    Preparation before use

    Create a database named maxwell in MySQL to store Maxwell's metadata.

    CREATE DATABASE maxwell ;

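    Then create an account that can operate on this database:

    GRANT ALL PRIVILEGES ON maxwell.* TO 'maxwell'@'%' IDENTIFIED BY '123123';

    Grant the same account the privileges it needs to monitor the other databases:

    GRANT  SELECT ,REPLICATION SLAVE , REPLICATION CLIENT  ON *.* TO maxwell@'%';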

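    1.4   Using Maxwell to monitor and capture MySQL data

    Create a properties file in any location. The startup command in this section expects it at /opt/module/maxwell/config.properties, so either use that path or adjust the --config argument to match.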

    producer=kafka
    kafka.bootstrap.servers=hadoop1:9092,hadoop2:9092,hadoop3:9092
    kafka_topic=ODS_DB_GMALL2020_M
    
    host=hadoop2
    user=maxwell
    password=123123
    
    client_id=maxwell_1

    Start the program:

    /opt/module/maxwell/bin/maxwell --config  /opt/module/maxwell/config.properties >/dev/null 2>&1 &

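    1.5   Modify or insert MySQL data and consume from Kafka to observe the output

    Create the topic first if it does not exist yet:

    /ext/kafka_2.11-1.0.0/bin/kafka-topics.sh --create --topic ODS_DB_GMALL2020_M --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181     --partitions 12 --replication-factor 1

    Run the test statement: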

    INSERT INTO z_user_info VALUES(30,'zhang3','13810001010'),(31,'li4','1389999999');

    Comparison: the first record below is the Canal output, and the two records after it are the Maxwell output.

    {"data":[{"id":"30","user_name":"zhang3","tel":"13810001010"},{"id":"31","user_name":"li4","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385314000,"id":2,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385314116,"type":"INSERT"}

    {"database":"gmall-2020-04","table":"z_user_info","type":"insert","ts":1589385314,"xid":82982,"xoffset":0,"data":{"id":30,"user_name":"zhang3","tel":"13810001010"}}

    {"database":"gmall-2020-04","table":"z_user_info","type":"insert","ts":1589385314,"xid":82982,"commit":true,"data":{"id":31,"user_name":"li4","tel":"1389999999"}}

    Run an update:

    UPDATE z_user_info SET user_name='wang55' WHERE id IN(30,31);

    As before, the first record is the Canal output and the next two are the Maxwell output.

    {"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},{"id":"31","user_name":"wang55","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385508000,"id":3,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":[{"user_name":"zhang3"},{"user_name":"li4"}],"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385508676,"type":"UPDATE"}

    {"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"},"old":{"user_name":"zhang3"}}

    {"database":"gmall-2020-04","table":"z_user_info","type":"update","ts":1589385508,"xid":83206,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"},"old":{"user_name":"li4"}}
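In the Maxwell update events above, "data" carries the full new row while "old" carries only the columns that changed. A minimal Python sketch of how a consumer might rebuild the row as it looked before the update is shown here. The function name before_image is a hypothetical helper for illustration, not part of Maxwell.

```python
import json

# The two Maxwell update events, copied from the sample output above.
EVENTS = [
    '{"database":"gmall-2020-04","table":"z_user_info","type":"update",'
    '"ts":1589385508,"xid":83206,"xoffset":0,'
    '"data":{"id":30,"user_name":"wang55","tel":"13810001010"},'
    '"old":{"user_name":"zhang3"}}',
    '{"database":"gmall-2020-04","table":"z_user_info","type":"update",'
    '"ts":1589385508,"xid":83206,"commit":true,'
    '"data":{"id":31,"user_name":"wang55","tel":"1389999999"},'
    '"old":{"user_name":"li4"}}',
]


def before_image(event: dict) -> dict:
    """Hypothetical helper: return the row as it was before the update.

    Overlaying the changed columns in "old" on top of the new row in
    "data" restores the previous values and leaves the rest untouched.
    """
    if event.get("type") != "update":
        raise ValueError("before_image expects an update event")
    return {**event["data"], **event.get("old", {})}


rows_before = [before_image(json.loads(line)) for line in EVENTS]
for row in rows_before:
    print(row)
```

Only user_name differs between the rebuilt rows and the "data" rows, which matches the UPDATE statement that was executed.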

    Run a delete:

    DELETE  FROM z_user_info   WHERE id IN(30,31);

    Again, the first record is the Canal output and the next two are the Maxwell output.

    {"data":[{"id":"30","user_name":"wang55","tel":"13810001010"},{"id":"31","user_name":"wang55","tel":"1389999999"}],"database":"gmall-2020-04","es":1589385644000,"id":4,"isDdl":false,"mysqlType":{"id":"bigint(20)","user_name":"varchar(20)","tel":"varchar(20)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"user_name":12,"tel":12},"table":"z_user_info","ts":1589385644829,"type":"DELETE"}

    {"database":"gmall-2020-04","table":"z_user_info","type":"delete","ts":1589385644,"xid":83367,"xoffset":0,"data":{"id":30,"user_name":"wang55","tel":"13810001010"}}

    {"database":"gmall-2020-04","table":"z_user_info","type":"delete","ts":1589385644,"xid":83367,"commit":true,"data":{"id":31,"user_name":"wang55","tel":"1389999999"}}
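Because each Maxwell event describes exactly one row, a consumer can replay the insert, update and delete events in order to keep a copy of the table. The Python sketch below does this with an in-memory dict. The events are trimmed versions of the samples above, the helper name apply_event is hypothetical, and the sketch assumes the key column is id and that updates never change it.

```python
def apply_event(table: dict, event: dict, key: str = "id") -> None:
    """Hypothetical helper: apply one Maxwell row event to an in-memory table.

    `table` maps the key column value to the latest version of the row.
    Assumption: updates do not modify the key column itself.
    """
    row = event["data"]
    kind = event["type"]
    if kind in ("insert", "update"):
        table[row[key]] = row          # upsert the new row image
    elif kind == "delete":
        table.pop(row[key], None)      # drop the row if it is present
    # any other event type is ignored in this sketch


# Trimmed events in the order they appear in the samples above,
# except that only row 30 is deleted here so one row remains.
events = [
    {"type": "insert", "data": {"id": 30, "user_name": "zhang3", "tel": "13810001010"}},
    {"type": "insert", "data": {"id": 31, "user_name": "li4", "tel": "1389999999"}},
    {"type": "update", "data": {"id": 30, "user_name": "wang55", "tel": "13810001010"}},
    {"type": "update", "data": {"id": 31, "user_name": "wang55", "tel": "1389999999"}},
    {"type": "delete", "data": {"id": 30, "user_name": "wang55", "tel": "13810001010"}},
]

state: dict = {}
for event in events:
    apply_event(state, event)

print(state)
```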

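    Summary of the data characteristics:

    Log structure

    Canal produces one log record per SQL statement. If the statement affects multiple rows, they are collected into an array inside that single record (the data is an array even when only one row is affected).

    Maxwell produces one log record per affected row. To tell whether several records were produced together, compare their xid: records with the same xid come from the same transaction, which in the examples here is a single autocommitted SQL statement.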
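The two log structures can be brought to a common shape in a consumer. The Python sketch below, written under the assumption that the "data" and "old" arrays in a Canal record line up by position, splits one Canal record into per-row events and groups Maxwell events by xid. The records are trimmed versions of the update samples, and canal_rows and group_by_xid are hypothetical helper names.

```python
from collections import defaultdict

# Trimmed Canal record for the UPDATE statement (one record, two rows).
canal_record = {
    "table": "z_user_info",
    "type": "UPDATE",
    "data": [
        {"id": "30", "user_name": "wang55", "tel": "13810001010"},
        {"id": "31", "user_name": "wang55", "tel": "1389999999"},
    ],
    "old": [{"user_name": "zhang3"}, {"user_name": "li4"}],
}

# Trimmed Maxwell events for the same statement (one event per row).
maxwell_events = [
    {"table": "z_user_info", "type": "update", "xid": 83206, "xoffset": 0,
     "data": {"id": 30, "user_name": "wang55", "tel": "13810001010"},
     "old": {"user_name": "zhang3"}},
    {"table": "z_user_info", "type": "update", "xid": 83206, "commit": True,
     "data": {"id": 31, "user_name": "wang55", "tel": "1389999999"},
     "old": {"user_name": "li4"}},
]


def canal_rows(record: dict) -> list:
    """Split one Canal record into one event per affected row."""
    data = record.get("data") or []
    # "old" is null for inserts and deletes, so pad it with None.
    old = record.get("old") or [None] * len(data)
    return [
        {"table": record["table"], "type": record["type"].lower(),
         "data": new_row, "old": old_row}
        for new_row, old_row in zip(data, old)
    ]


def group_by_xid(events: list) -> dict:
    """Collect Maxwell row events that share the same xid."""
    groups = defaultdict(list)
    for event in events:
        groups[event["xid"]].append(event)
    return dict(groups)


per_row = canal_rows(canal_record)
groups = group_by_xid(maxwell_events)
print(len(per_row), {xid: len(rows) for xid, rows in groups.items()})
```

Both views end up describing the same two rows: Canal needs to be split, while Maxwell needs to be grouped.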

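    Numeric types

       When the source column is numeric, Maxwell preserves the original type and emits a JSON number instead of wrapping the value in double quotes as a string.

       Canal converts every value to a string.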
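If a consumer needs numbers from Canal records, the "sqlType" map in each record can guide the conversion. The Python sketch below assumes those codes follow the java.sql.Types constants, which is consistent with the samples (-5 for bigint, 12 for varchar) but is not confirmed by this article. It handles integer columns only, and coerce_canal_row is a hypothetical helper name.

```python
# java.sql.Types codes for integer columns:
# TINYINT = -6, SMALLINT = 5, INTEGER = 4, BIGINT = -5.
INTEGER_SQL_TYPES = {-6, 5, 4, -5}


def coerce_canal_row(row: dict, sql_types: dict) -> dict:
    """Convert integer-typed columns of a Canal row from str to int.

    Columns with other type codes, and null values, are left unchanged.
    """
    result = {}
    for column, value in row.items():
        if value is not None and sql_types.get(column) in INTEGER_SQL_TYPES:
            result[column] = int(value)
        else:
            result[column] = value
    return result


# Row and sqlType map taken from the Canal insert sample.
canal_row = {"id": "30", "user_name": "zhang3", "tel": "13810001010"}
sql_types = {"id": -5, "user_name": 12, "tel": 12}

converted = coerce_canal_row(canal_row, sql_types)
print(converted)
```

After conversion the id is an integer, as in the Maxwell output, while tel stays a string because its column is varchar.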

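    Column definitions of the source table

    Canal records carry the table structure (mysqlType, sqlType, pkNames). Maxwell records are more concise.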

    This article is from cnblogs (博客园), author: 秋华. Please cite the original link when reposting: https://www.cnblogs.com/qiu-hua/p/13658815.html
