  • ClickHouse Database Engines

    The CREATE DATABASE statement in ClickHouse is:

    CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine]

    There are five main database engines (a short usage sketch follows the list):

    Ordinary: the default engine; it does not need to be declared explicitly when creating a database, and tables in such a database may use any table engine.

    Dictionary: the dictionary engine; a database of this type automatically creates a table for every data dictionary (loading the field definitions and data configured in the configuration files).

    Memory: the memory engine, used for temporary data; the data lives only in memory, no disk operations are involved, and it is cleared when the server restarts.

    MySQL: the MySQL engine; it automatically pulls data from a remote MySQL server and creates MySQL table engine tables in the database.

    Lazy: the log engine; only tables of the Log family of engines can be created in this database.
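    As a quick sketch of the engines above (the database names ordinary_db, memory_db and lazy_db are placeholders made up for this illustration; the Lazy engine's numeric argument is the number of seconds an idle table is kept in memory):

    -- Default engine: no ENGINE clause required
    CREATE DATABASE IF NOT EXISTS ordinary_db;
    -- Memory engine (per the list above): data lives only in RAM and is cleared on restart
    CREATE DATABASE IF NOT EXISTS memory_db ENGINE = Memory;
    -- Lazy engine: only Log-family tables can be created here
    CREATE DATABASE IF NOT EXISTS lazy_db ENGINE = Lazy(3600);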

     

    MySQL engine:
    CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] ENGINE = MySQL('host:port', ['database' | database], 'user', 'password')

    MySQL database engine parameters:

    host:port — address of the MySQL server to connect to.

    database — name of the MySQL database to connect to.

    user — MySQL user to connect as.

    password — password of the MySQL user.
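    Putting the parameters together, a minimal sketch of the statement with placeholder values (mysql-host:3306, test_db, app_user and app_password are made up for illustration; the example in step 1 below uses the real cluster values):

    CREATE DATABASE IF NOT EXISTS mysql_db
    ENGINE = MySQL('mysql-host:3306', 'test_db', 'app_user', 'app_password');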

    MySQL engine example:

    -- 1 Create a MySQL-engine database in ClickHouse that exchanges data with the MySQL server:
    cdh3 :) CREATE DATABASE IF NOT EXISTS flink_test
    :-] ENGINE = MySQL('cdh1:3306', 'flink_test', 'scm', 'scm');
    CREATE DATABASE IF NOT EXISTS flink_test
    ENGINE = MySQL('cdh1:3306', 'flink_test', 'scm', 'scm')
    Ok.
    0 rows in set. Elapsed: 0.004 sec.
    
    -- 2 List the databases
    cdh3 :) SHOW DATABASES;
    SHOW DATABASES
    ┌─name───────┐
    │ default    │
    │ flink_test │
    │ system     │
    └────────────┘
    3 rows in set. Elapsed: 0.006 sec.
    
    -- 3 List the tables in the flink_test database; the MySQL tables are now visible in ClickHouse. Tables not used here are omitted
    cdh3 :) SHOW TABLES FROM flink_test;
    SHOW TABLES FROM flink_test
    ┌─name───────────────────┐
    │ vote_recordss_memory   │
    │ w3                     │
    └────────────────────────┘
    17 rows in set. Elapsed: 0.009 sec.
    
    -- 4 Switch to the database
    cdh3 :) USE flink_test;
    USE flink_test
    Ok.
    0 rows in set. Elapsed: 0.005 sec.
    
    -- 5 Insert data (table names are case-sensitive)
    cdh3 :) INSERT INTO w3 VALUES(3, 'Mercury');
    INSERT INTO w3 VALUES
    Ok.
    1 rows in set. Elapsed: 0.022 sec.
    
    -- 6 Query the data. Rows inserted here cannot be deleted or updated afterwards.
    cdh3 :) SELECT * FROM w3;
    SELECT *
    FROM w3
    ┌─id─┬─f1──────┐
    │  3 │ Mercury │
    │  5 │ success │
    └────┴─────────┘
    2 rows in set. Elapsed: 0.010 sec.
    
    -- 7 Compare the aggregation performance of MySQL and ClickHouse
    --  7.1 MySQL: counting a table of nearly 20 million rows takes 29.54 seconds
    mysql> SELECT COUNT(*) FROM vote_recordss_memory;
    +----------+
    | COUNT(*) |
    +----------+
    | 19999998 |
    +----------+
    1 row in set (29.54 sec)
    
    --  7.2 The same COUNT in ClickHouse takes 9.713 seconds
    cdh3 :) SELECT COUNT(*) FROM vote_recordss_memory;
    SELECT COUNT(*)
    FROM vote_recordss_memory
    ┌──COUNT()─┐
    │ 19999998 │
    └──────────┘
    1 rows in set. Elapsed: 9.713 sec. Processed 20.00 million rows, 20.00 MB (2.06 million rows/s., 2.06 MB/s.)
    
    -- 7.3 Specify the MySQL connection, database, table and credentials directly in the query; equivalent to the SQL above.
    cdh3 :) SELECT COUNT(*) FROM  mysql('cdh1:3306', 'flink_test', 'vote_recordss_memory', 'root', '123456');
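    --     Note: the mysql() table function takes its arguments in the order
    --     mysql('host:port', 'database', 'table', 'user', 'password'), so the
    --     ad-hoc query above works even without a MySQL-engine database.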
    
    -- 8 Use the ClickHouse MergeTree table engine
    --  8.1 Switch to the ClickHouse default database
    cdh1 :) USE default;
    USE default
    Ok.
    0 rows in set. Elapsed: 0.007 sec.
    
    --  8.2 Create a table with the MergeTree engine, load the MySQL data into it, and use the id column as the sort key
    cdh1 :) CREATE TABLE vote_recordss
    :-] ENGINE = MergeTree--(id, create_time)
    :-] ORDER BY id AS
    :-] SELECT * FROM mysql('cdh1:3306', 'flink_test', 'vote_recordss_memory', 'root', '123456');
    CREATE TABLE vote_recordss
    ENGINE = MergeTree
    ORDER BY id AS
    SELECT *
    FROM mysql('cdh1:3306', 'flink_test', 'vote_recordss_memory', 'root', '123456')
    Ok.
    0 rows in set. Elapsed: 27.917 sec. Processed 20.00 million rows, 920.00 MB (716.40 thousand rows/s., 32.95 MB/s.)
    
    --  8.3 Query. The COUNT here is roughly 2950x faster than in MySQL
    cdh1 :) SELECT COUNT(*) FROM vote_recordss;
    SELECT COUNT(*)
    FROM vote_recordss
    ┌──COUNT()─┐
    │ 19999998 │
    └──────────┘
    1 rows in set. Elapsed: 0.009 sec. Processed 20.00 million rows, 80.00 MB (2.20 billion rows/s., 8.79 GB/s.)
    
    --  8.4 DISTINCT. ClickHouse is roughly 94x faster than MySQL
    mysql> SELECT DISTINCT group_id from vote_recordss_memory ;
    +----------+
    | group_id |
    +----------+
    |        1 |
    |        2 |
    |        0 |
    +----------+
    3 rows in set (12.79 sec)
    --  Run in ClickHouse
    cdh1 :) SELECT DISTINCT group_id from vote_recordss;
    SELECT DISTINCT group_id
    FROM vote_recordss
    ┌─group_id─┐
    │        0 │
    │        2 │
    │        1 │
    └──────────┘
    3 rows in set. Elapsed: 0.136 sec. Processed 20.00 million rows, 80.00 MB (147.44 million rows/s., 589.76 MB/s.)
    
    --  8.5 GROUP BY aggregation. ClickHouse is roughly 94x faster than MySQL
    mysql> SELECT SUM(vote_num),group_id from vote_recordss_memory GROUP BY group_id;
    +---------------+----------+
    | SUM(vote_num) | group_id |
    +---------------+----------+
    |   33344339689 |        0 |
    |   33315889226 |        1 |
    |   33351509121 |        2 |
    +---------------+----------+
    3 rows in set (16.26 sec)
    --  Run in ClickHouse
    cdh1 :)  SELECT SUM(vote_num),group_id from vote_recordss GROUP BY group_id;
    SELECT
        SUM(vote_num),
        group_id
    FROM vote_recordss
    GROUP BY group_id
    ┌─SUM(vote_num)─┬─group_id─┐
    │   33344339689 │        0 │
    │   33351509121 │        2 │
    │   33315889226 │        1 │
    └───────────────┴──────────┘
    3 rows in set. Elapsed: 0.173 sec. Processed 20.00 million rows, 160.00 MB (115.56 million rows/s., 924.45 MB/s.)
    
    --  8.6 Sort and take the top 10. ClickHouse is roughly 25x faster than MySQL
    mysql> SELECT * FROM vote_recordss_memory ORDER BY create_time DESC,vote_num LIMIT 10;
    +----------+----------------------+----------+----------+--------+---------------------+
    | id       | user_id              | vote_num | group_id | status | create_time         |
    +----------+----------------------+----------+----------+--------+---------------------+
    | 19999993 | 4u6PJYvsDD4khghreFvm |     2388 |        0 |      1 | 2019-10-15 01:00:20 |
    | 19999998 | shTrosZpT5zux3wiKH5a |     4991 |        2 |      1 | 2019-10-15 01:00:20 |
    | 19999995 | xRwQuMgQeuBoXvsBusFO |     6737 |        2 |      1 | 2019-10-15 01:00:20 |
    | 19999996 | 5QNgMYoQUSsuX7Aqarw8 |     7490 |        2 |      2 | 2019-10-15 01:00:20 |
    | 19999997 | eY12Wq9iSm0MH1PUTChk |     7953 |        0 |      2 | 2019-10-15 01:00:20 |
    | 19999994 | ZpS0dWRm1TdhzTxTHCSj |     9714 |        0 |      1 | 2019-10-15 01:00:20 |
    | 19999946 | kf7FOTUHAICP5Mv2xodI |       32 |        2 |      2 | 2019-10-15 01:00:19 |
    | 19999738 | ER90qVc4CJCKH5bxXYTo |       57 |        1 |      2 | 2019-10-15 01:00:19 |
    | 19999810 | gJHbBkGf0bJViwy5BB2d |      190 |        1 |      2 | 2019-10-15 01:00:19 |
    | 19999977 | Wq7bogXRiHubhFlAHBJH |      208 |        0 |      2 | 2019-10-15 01:00:19 |
    +----------+----------------------+----------+----------+--------+---------------------+
    10 rows in set (15.31 sec)
    --  Run in ClickHouse
    cdh1 :)  SELECT * FROM vote_recordss ORDER BY create_time DESC,vote_num LIMIT 10;
    SELECT *
    FROM vote_recordss
    ORDER BY
        create_time DESC,
        vote_num ASC
    LIMIT 10
    ┌───────id─┬─user_id──────────────┬─vote_num─┬─group_id─┬─status─┬─────────create_time─┐
    │ 19999993 │ 4u6PJYvsDD4khghreFvm │     2388 │        0 │      1 │ 2019-10-15 01:00:20 │
    │ 19999998 │ shTrosZpT5zux3wiKH5a │     4991 │        2 │      1 │ 2019-10-15 01:00:20 │
    │ 19999995 │ xRwQuMgQeuBoXvsBusFO │     6737 │        2 │      1 │ 2019-10-15 01:00:20 │
    │ 19999996 │ 5QNgMYoQUSsuX7Aqarw8 │     7490 │        2 │      2 │ 2019-10-15 01:00:20 │
    │ 19999997 │ eY12Wq9iSm0MH1PUTChk │     7953 │        0 │      2 │ 2019-10-15 01:00:20 │
    │ 19999994 │ ZpS0dWRm1TdhzTxTHCSj │     9714 │        0 │      1 │ 2019-10-15 01:00:20 │
    │ 19999946 │ kf7FOTUHAICP5Mv2xodI │       32 │        2 │      2 │ 2019-10-15 01:00:19 │
    │ 19999738 │ ER90qVc4CJCKH5bxXYTo │       57 │        1 │      2 │ 2019-10-15 01:00:19 │
    │ 19999810 │ gJHbBkGf0bJViwy5BB2d │      190 │        1 │      2 │ 2019-10-15 01:00:19 │
    │ 19999977 │ Wq7bogXRiHubhFlAHBJH │      208 │        0 │      2 │ 2019-10-15 01:00:19 │
    └──────────┴──────────────────────┴──────────┴──────────┴────────┴─────────────────────┘
    10 rows in set. Elapsed: 0.607 sec. Processed 20.00 million rows, 920.00 MB (32.93 million rows/s., 1.51 GB/s.)
  • Original post: https://www.cnblogs.com/uestc2007/p/13937088.html