  • ClickHouse two-shard, two-replica cluster deployment

    Personal study notes. Please do not repost!!!
    Original: https://www.cnblogs.com/wshenjin/p/13143929.html


    ZooKeeper + ReplicatedMergeTree (replicated tables) + Distributed (distributed tables)

    Node IPs

    • 192.168.31.101
    • 192.168.31.102

    Since there aren't enough machines on hand, two machines each run two instances to form a two-shard, two-replica cluster.

    Basic layout

                Replica 01             Replica 02
    Shard 01    192.168.31.101:9100    192.168.31.102:9200
    Shard 02    192.168.31.102:9100    192.168.31.101:9200

    ZooKeeper cluster deployment

    Skipped here.

    Configuration:

    Per-instance settings that differ in each config0*.xml (the * stands for the instance number, 1 or 2):

        <log>/var/log/clickhouse-server/clickhouse-server0*.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server0*.err.log</errorlog>
        <http_port>8*23</http_port>
        <tcp_port>9*00</tcp_port>
        <mysql_port>9*04</mysql_port>
        <interserver_http_port>9*09</interserver_http_port>
        <path>/data/database/clickhouse0*/</path>
        <tmp_path>/data/database/clickhouse0*/tmp/</tmp_path>
        <user_files_path>/data/database/clickhouse0*/user_files/</user_files_path>
        <format_schema_path>/data/database/clickhouse0*/format_schemas/</format_schema_path>
        <include_from>/etc/clickhouse-server/metrika0*.xml</include_from>
    

    Settings that are identical in every instance's metrika0*.xml:

    <!-- Cluster settings -->
        <clickhouse_remote_servers>
            <!-- Custom cluster name: ckcluster_2shards_2replicas -->
            <ckcluster_2shards_2replicas>
                <!-- Shard 1 -->
                <shard>
                    <internal_replication>true</internal_replication>
                    <!-- Replica 1 -->
                    <replica>
                        <host>192.168.31.101</host>
                        <port>9100</port>
                    </replica>
                    <!-- Replica 2 -->
                    <replica>
                        <host>192.168.31.102</host>
                        <port>9200</port>
                    </replica>
                </shard>
                <!-- Shard 2 -->
                <shard>
                    <internal_replication>true</internal_replication>
                    <!-- Replica 1 -->
                    <replica>
                        <host>192.168.31.102</host>
                        <port>9100</port>
                    </replica>
                    <!-- Replica 2 -->
                    <replica>
                        <host>192.168.31.101</host>
                        <port>9200</port>
                    </replica>
                </shard>
            </ckcluster_2shards_2replicas>
        </clickhouse_remote_servers>
    <!-- ZooKeeper settings -->
        <zookeeper-servers>
            <node index="1">
                <host>192.168.31.101</host>
                <port>2181</port>
            </node>
            <node index="2">
                <host>192.168.31.102</host>
                <port>2181</port>
            </node>
        </zookeeper-servers>
    <!-- Compression -->
        <clickhouse_compression>
            <case>
                <min_part_size>10000000000</min_part_size>
                <min_part_size_ratio>0.01</min_part_size_ratio>
                <method>lz4</method>
            </case>
        </clickhouse_compression>
    
    

    The replication-identity settings in each instance's metrika0*.xml (these differ per instance):

    # 192.168.31.101 9100 metrika01.xml
        <macros>
           <shard>01</shard>
           <replica>ckcluster-01-01</replica>
        </macros>
    
    # 192.168.31.101 9200 metrika02.xml
        <macros>
           <shard>02</shard>
           <replica>ckcluster-02-02</replica>
        </macros>
    
    # 192.168.31.102 9100 metrika01.xml
        <macros>
           <shard>02</shard>
           <replica>ckcluster-02-01</replica>
        </macros>
    
    # 192.168.31.102 9200 metrika02.xml
        <macros>
           <shard>01</shard>
           <replica>ckcluster-01-02</replica>
        </macros>
    

    The replication identity, also called the macros configuration, uniquely names a replica here; every instance must define it, and every value must be unique.

    • shard is the shard number
    • replica is the replica identifier (a quick verification query is sketched right below)
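
    To verify that each instance has picked up its own macro values, they can be read back from the system.macros table. A minimal check, run on each instance (for example, 192.168.31.101:9100 should report shard=01 and replica=ckcluster-01-01):

     :) SELECT macro, substitution FROM system.macros;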

    Start the instances

    192.168.31.101:

    [root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon \
                                                --pid-file=/var/run/clickhouse-server/clickhouse-server01.pid \
                                                --config-file=/etc/clickhouse-server/config01.xml"
    [root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon \
                                                --pid-file=/var/run/clickhouse-server/clickhouse-server02.pid \
                                                --config-file=/etc/clickhouse-server/config02.xml"
    

    192.168.31.102:

    [root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon \
                                                --pid-file=/var/run/clickhouse-server/clickhouse-server01.pid \
                                                --config-file=/etc/clickhouse-server/config01.xml"
    [root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon \
                                                --pid-file=/var/run/clickhouse-server/clickhouse-server02.pid \
                                                --config-file=/etc/clickhouse-server/config02.xml"
    

    Check the status on each node:

     :) SELECT * FROM system.clusters;
    ┌─cluster─────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
    │ ckcluster_2shards_2replicas │         1 │            1 │           1 │ 192.168.31.101 │ 192.168.31.101 │ 9100 │        0 │ default │                  │            0 │                       0 │
    │ ckcluster_2shards_2replicas │         1 │            1 │           2 │ 192.168.31.102 │ 192.168.31.102 │ 9200 │        0 │ default │                  │            0 │                       0 │
    │ ckcluster_2shards_2replicas │         2 │            1 │           1 │ 192.168.31.102 │ 192.168.31.102 │ 9100 │        0 │ default │                  │            0 │                       0 │
    │ ckcluster_2shards_2replicas │         2 │            1 │           2 │ 192.168.31.101 │ 192.168.31.101 │ 9200 │        1 │ default │                  │            0 │                       0 │
    └─────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘
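
    It is also worth confirming that each instance can actually reach the ZooKeeper ensemble configured in metrika0*.xml. A minimal sketch is to list the root znodes through the system.zookeeper table; an error here usually points at a problem in the <zookeeper-servers> section:

     :) SELECT name FROM system.zookeeper WHERE path = '/';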
    

    Creating the database and tables

    Create the database on every instance:

     :) create database testdb ;
    

    192.168.31.101 9100, create the local table and the distributed table:

     :) create table person_local(ID Int8, Name String, BirthDate Date) 
               ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/person_local','ckcluster-01-01',BirthDate, (Name, BirthDate), 8192);
     :) create table person_all as person_local 
               ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
    

    192.168.31.101 9200, create the local table and the distributed table:

     :) create table person_local(ID Int8, Name String, BirthDate Date) 
               ENGINE = ReplicatedMergeTree('/clickhouse/tables/02/person_local','ckcluster-02-02',BirthDate, (Name, BirthDate), 8192);
     :) create table person_all as person_local 
               ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
    

    192.168.31.102 9100, create the local table and the distributed table:

     :) create table person_local(ID Int8, Name String, BirthDate Date) 
               ENGINE = ReplicatedMergeTree('/clickhouse/tables/02/person_local','ckcluster-02-01',BirthDate, (Name, BirthDate), 8192);
     :) create table person_all as person_local 
               ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
    

    192.168.31.102 9200, create the local table and the distributed table:

     :) create table person_local(ID Int8, Name String, BirthDate Date) 
               ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/person_local','ckcluster-01-02',BirthDate, (Name, BirthDate), 8192);
     :) create table person_all as person_local 
               ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
    

    Syntax for creating the local table:

    create table person_local(ID Int8, Name String, BirthDate Date) 
              ENGINE = ReplicatedMergeTree('/clickhouse/tables/${shard}/person_local','${replica}',BirthDate, (Name, BirthDate), 8192);
    
    • /clickhouse/tables/${shard}/person_local is this table's path in ZooKeeper: replicas within the same shard must be configured with the same path, while different shards use different paths. shard corresponds to the value in that instance's metrika0*.xml.
    • ${replica} is the replica name; it must differ on every instance and corresponds to that instance's metrika0*.xml (a query to read both values back is sketched below)
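
    Once the tables exist, both values can be read back from system.replicas on each instance to confirm that the ZooKeeper path and replica name were filled in as intended (a quick sketch):

     :) SELECT database, table, zookeeper_path, replica_name FROM system.replicas WHERE database = 'testdb';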

    Distributed table syntax:

     :) create table person_all as person_local 
              ENGINE = Distributed(${cluster_name}, ${db_name}, ${local_table_name}, rand());
    
    • ${cluster_name} is the cluster name
    • ${db_name} is the database name
    • ${local_table_name} is the local table name
    • rand() is the sharding function
      A distributed table is only a query engine; it stores no data itself. At query time the SQL is sent to every shard of the cluster, the partial results are processed and aggregated, and the final result is returned to the client. ClickHouse therefore requires the aggregated result to fit in the memory of the node hosting the distributed table, which is rarely exceeded under normal conditions (see the query sketch after this list).
      The distributed table can be created on all instances or only on some of them, matching the instances that the application code queries. Creating it on several instances is recommended, so that if one node goes down the table can still be queried on another.
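
    As an illustration of the fan-out, once data has been written (see the data test below) a count over the distributed table should query one replica of each shard and return the combined total, no matter which instance it is run on (a sketch):

     :) SELECT count() FROM testdb.person_all;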

    Newer syntax for creating the local table

     :) create table people_local ON CLUSTER ckcluster_2shards_2replicas (ID Int8, Name String, BirthDate Date) 
                ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/people_local','{replica}') PARTITION BY toYYYYMMDD(BirthDate) ORDER BY (Name, BirthDate) SETTINGS index_granularity = 8192;
    

    With this syntax there is no need to create the table on every node; the DDL is propagated to all nodes automatically. You only need to create the database on all nodes beforehand.
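
    The distributed table can be created the same way with ON CLUSTER, so it too only needs to be issued once. A minimal sketch, assuming a table named people_all to pair with people_local above (the name is hypothetical, not from the original setup):

    -- people_all is a hypothetical name chosen to pair with people_local
    create table people_all ON CLUSTER ckcluster_2shards_2replicas as people_local 
              ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, people_local, rand());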

    Data test

    Import a CSV with 10,000 rows:

    [root@ ~]# clickhouse-client  --host 127.0.0.1 --port 9200 --database testdb  --query="insert into person_all FORMAT CSV"  < /tmp/a.csv 
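
    If no CSV file is at hand, roughly the same amount of test data can also be generated directly in SQL. A sketch with made-up column values (this is not the CSV that produced the counts below):

    -- synthetic rows for illustration only; ID is wrapped into the Int8 range
    INSERT INTO testdb.person_all
    SELECT toInt8(number % 100), concat('name_', toString(number)), toDate('2020-01-01') + (number % 365)
    FROM numbers(10000);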
    

    Check the row count on each instance:

    192.168.31.101:9100 :) select count(*) from testdb.person_local ;
    ┌─count()─┐
    │    4932 │
    └─────────┘
    
    192.168.31.101:9200 :) select count(*) from testdb.person_local ;
    ┌─count()─┐
    │    5068 │
    └─────────┘
    
    192.168.31.102:9100 :) select count(*) from testdb.person_local ;
    ┌─count()─┐
    │    5068 │
    └─────────┘
    
    192.168.31.102:9200 :) select count(*) from testdb.person_local ;
    ┌─count()─┐
    │    4932 │
    └─────────┘
    

    As you can see, the row counts of the two replicas of each shard match up correctly.
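
    To compare the replicas at a finer granularity than a plain row count, the active parts of the local table can be listed on each instance; after replication and merges settle, the two replicas of the same shard should report the same parts and row counts (a sketch):

     :) SELECT partition, name, rows FROM system.parts WHERE database = 'testdb' AND table = 'person_local' AND active;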

    Taking a node down

    Here the 192.168.31.102:9100 instance is killed:

    [root@ ~]# ps -ef | grep click 
    clickho+  3485     1  2 17:33 ?        00:01:25 /usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server02.pid --config-file=/etc/clickhouse-server/config02.xml
    clickho+  3547     1  2 17:34 ?        00:01:34 /usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server01.pid --config-file=/etc/clickhouse-server/config01.xml
    root     12650 12503  0 18:42 pts/0    00:00:00 grep --col click
    [root@ ~]# kill -SIGTERM 3547 
    

    Data import and read/write tests continue to work, and the cluster keeps functioning normally.
    Take a look at the data on the surviving replica that corresponds to the downed node:

    192.168.31.101:9200 :) select * from person_local where BirthDate='2020-01-01' ;
    ┌─ID─┬─Name─────────────────────────────┬──BirthDate─┐
    │  2 │ 26ab0db90d72e28ad0ba1e22ee510510 │ 2020-01-01 │
    │ 10 │ 31d30eea8d0968d6458e0ad0027c9f80 │ 2020-01-01 │
    │  4 │ 48a24b70a0b376535542b996af517398 │ 2020-01-01 │
    │  9 │ 7c5aba41f53293b712fd86d08ed5b36e │ 2020-01-01 │
    │  7 │ 84bc3da1b3e33a18e8d5e1bdd7a18d7a │ 2020-01-01 │
    │  6 │ 9ae0ea9e3c9c6e1b9b6252c8395efdc1 │ 2020-01-01 │
    │  1 │ b026324c6904b2a9cb4b88d6d61c81d1 │ 2020-01-01 │
    └────┴──────────────────────────────────┴────────────┘
    

    Now restart the 192.168.31.102:9100 instance and check its data:

    [root@ ~]#  su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server01.pid --config-file=/etc/clickhouse-server/config01.xml"
    [root@ ~]#  clickhouse-client  --port 9100
    192.168.31.102:9100 :) use testdb
    192.168.31.102:9100 :) select * from person_local where BirthDate='2020-01-01' ;
    ┌─ID─┬─Name─────────────────────────────┬──BirthDate─┐
    │  2 │ 26ab0db90d72e28ad0ba1e22ee510510 │ 2020-01-01 │
    │ 10 │ 31d30eea8d0968d6458e0ad0027c9f80 │ 2020-01-01 │
    │  4 │ 48a24b70a0b376535542b996af517398 │ 2020-01-01 │
    │  9 │ 7c5aba41f53293b712fd86d08ed5b36e │ 2020-01-01 │
    │  7 │ 84bc3da1b3e33a18e8d5e1bdd7a18d7a │ 2020-01-01 │
    │  6 │ 9ae0ea9e3c9c6e1b9b6252c8395efdc1 │ 2020-01-01 │
    │  1 │ b026324c6904b2a9cb4b88d6d61c81d1 │ 2020-01-01 │
    └────┴──────────────────────────────────┴────────────┘
    

    After the killed instance is restarted, its data catches back up to match its replica, and the cluster keeps working normally.
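
    Rather than eyeballing the rows, one way to confirm that the restarted replica has fully caught up is to check its replication status in system.replicas: is_readonly should be 0 and queue_size should drop back to 0 once recovery finishes (a sketch):

     :) SELECT table, is_readonly, queue_size, absolute_delay FROM system.replicas WHERE database = 'testdb';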
