I. Cluster Overview
ZooKeeper should be deployed on separate machines, so that a failing ClickHouse node cannot take ZooKeeper down with it.
0. High-availability principle: ZooKeeper + ReplicatedMergeTree (replicated tables) + Distributed (distributed tables).
1. Prerequisites: disable the firewall or open the required ports on all nodes; establish mutual trust (passwordless SSH login) between all nodes; the hosts file and hostnames must be configured correctly and consistently across the cluster, because ZooKeeper returns hostnames, and replicated tables will fail if they are misconfigured or missing.
ClickHouse test nodes (4): 10.0.0.236 cdhserver1 (clickhouse1), 10.0.0.237 cdhserver2 (clickhouse2), 10.0.0.238 cdhserver3 (clickhouse3), 10.0.0.239 cdhserver4 (clickhouse4)
ZooKeeper test nodes (3): 10.0.0.237 cdhserver2 (zookeeper), 10.0.0.238 cdhserver3 (zookeeper), 10.0.0.239 cdhserver4 (zookeeper)
Layout: one ClickHouse instance per node, with the shards backing each other up through replication (a second instance per node can be added later; see the config1.xml note in the startup section). A verification query is sketched after this list.
cdhserver1: instance 1, ports: tcp 9006, http 8123, replication port 9006, role: entry node
cdhserver2: instance 2, ports: tcp 9006, http 8123, replication port 9006, role: shard 1, replica 1
cdhserver3: instance 3, ports: tcp 9006, http 8123, replication port 9006, role: shard 2, replica 1
cdhserver4: instance 4, ports: tcp 9006, http 8123, replication port 9006, role: shard 3, replica 1
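Once the servers are running (the configuration follows in the sections below), the topology can be checked from any node via the built-in system.clusters table; a minimal sanity query:
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters;
-- expect one row per replica of idc_cluster and idc_replicate_cluster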
II. Environment Preparation
1) Host configuration (adjust to your own environment)
10.0.0.236 cdhserver1 CentOS 7.1 32 GB RAM 200 GB disk
10.0.0.237 cdhserver2 CentOS 7.1 32 GB RAM 200 GB disk
10.0.0.238 cdhserver3 CentOS 7.1 32 GB RAM 200 GB disk
10.0.0.239 cdhserver4 CentOS 7.1 32 GB RAM 200 GB disk
2) hosts file and hostnames
Make sure /etc/hosts and the hostnames match the list above on every node.
3) Disable the firewall, or open the required ports, on all nodes.
# 1. Disable the firewall
# CentOS 7 (as used here):
systemctl stop firewalld
systemctl disable firewalld
# CentOS 6 equivalent:
service iptables stop
chkconfig iptables off
chkconfig ip6tables off
# 2. Disable SELinux
Set SELINUX=disabled in /etc/selinux/config, then reboot:
[root@cdhserver1 ~]# vim /etc/selinux/config
SELINUX=disabled
4) Raise the open-file limits on all nodes
Append the following to the end of both /etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf:
[root@cdhserver1 software]# vim /etc/security/limits.conf
Append at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
[root@cdhserver1 software]# vim /etc/security/limits.d/90-nproc.conf
Append at the end of the file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
Reboot, then check the result with ulimit -n or ulimit -a:
[root@cdhserver1 ~]# ulimit -n
65536
5) Establish mutual trust (passwordless SSH login) between all nodes.
1. Generate an RSA key pair on each node with ssh-keygen:
ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
2. Collect all public keys into one authorized_keys file. Run the following on cdhserver1, one line at a time:
ssh cdhserver1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh cdhserver4 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Set the permissions of the authorized_keys file, on cdhserver1:
chmod 600 ~/.ssh/authorized_keys
4. Distribute the authorized_keys file to all servers, one line at a time:
scp ~/.ssh/authorized_keys cdhserver1:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver2:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver3:~/.ssh/
scp ~/.ssh/authorized_keys cdhserver4:~/.ssh/
III. Distributed Cluster Installation
1. Package preparation
Notes:
1) Installation packages: https://packagecloud.io/Altinity/clickhouse.
2) They can also be downloaded from CSDN: https://download.csdn.net/download/u013205003/10968848
2. Install ClickHouse on all nodes (CentOS 7.1 as the example)
1) Perform the following installation and configuration on each server.
2) Install ClickHouse.
Install libicu first (place the rpm in /usr/local/icu/):
mkdir -p /usr/local/icu/
cd /usr/local/icu/
rpm -ivh libicu-4.2.1-14.el6.x86_64.rpm
Install the ClickHouse packages:
rpm -ivh clickhouse-server-common-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-compressor-1.1.54336-3.el6.x86_64.rpm
rpm -ivh clickhouse-common-static-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-server-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-client-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-test-18.14.12-1.el6.x86_64.rpm
rpm -ivh clickhouse-debuginfo-18.14.12-1.el6.x86_64.rpm
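After installation, start the server (see the startup section) and confirm from clickhouse-client that the expected build is running; a minimal check:
SELECT version();  -- should report 18.14.12 for the packages above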
3. Configuration changes
1) Edit config.xml on all four nodes
[root@cdhserver1 ~]# vim /etc/clickhouse-server/config.xml
<http_port>8123</http_port>
<tcp_port>9006</tcp_port>
<listen_host>::</listen_host>
<!-- <listen_host>::1</listen_host> -->
<!-- <listen_host>127.0.0.1</listen_host> -->
<!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
<!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->
2) Edit users.xml on all four nodes to add a user
[root@cdhserver1 ~]# vim /etc/clickhouse-server/users.xml
Add the following above the <!-- Example of user with readonly access. --> comment (the sha256 hash below is the hash of the plaintext password 123456 that metrika.xml uses):
<ck>
    <password_sha256_hex>8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92</password_sha256_hex>
    <networks incl="networks" replace="replace">
        <ip>::/0</ip>
    </networks>
    <profile>default</profile>
    <quota>default</quota>
</ck>
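Once the configuration below (including metrika.xml) is in place and the server has been restarted, it is worth verifying that the new ck account can actually log in. A minimal check: connect with clickhouse-client --port 9006 -u ck --password 123456 and run:
SELECT currentUser();  -- should return: ck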
3) On all four nodes, create a metrika.xml file under /etc/clickhouse-server with the cluster sharding configuration
[root@cdhserver1 ~]# vim /etc/clickhouse-server/metrika.xml
Add the following content:
<?xml version="1.0"?>
<yandex>
    <!-- ClickHouse cluster nodes -->
    <clickhouse_remote_servers>
        <idc_cluster>
            <!-- shard 1 -->
            <shard>
                <weight>1</weight>
                <replica>
                    <host>cdhserver2</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
            <!-- shard 2 -->
            <shard>
                <weight>1</weight>
                <replica>
                    <host>cdhserver3</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
            <!-- shard 3 -->
            <shard>
                <weight>1</weight>
                <replica>
                    <host>cdhserver4</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                    <compression>true</compression>
                </replica>
            </shard>
        </idc_cluster>
        <idc_replicate_cluster>
            <!-- shard 1 -->
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>cdhserver2</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <!-- second replica of shard 1; replace hadoop2 with the host that actually carries this replica in your environment -->
                <replica>
                    <host>hadoop2</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
            <!-- shard 2 -->
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>cdhserver3</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <!-- second replica of shard 2; replace hadoop3 accordingly -->
                <replica>
                    <host>hadoop3</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
            <!-- shard 3 -->
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>cdhserver4</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
                <!-- second replica of shard 3; replace hadoop4 accordingly -->
                <replica>
                    <host>hadoop4</host>
                    <port>9006</port>
                    <user>ck</user>
                    <password>123456</password>
                </replica>
            </shard>
        </idc_replicate_cluster>
    </clickhouse_remote_servers>
    <!-- ZooKeeper configuration -->
    <zookeeper-servers>
        <node index="1">
            <host>cdhserver2</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>cdhserver3</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>cdhserver4</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- shard/replica macros; this block is node-specific, see the per-node values below -->
    <macros>
        <replica>cdhserver1</replica>
    </macros>
    <!-- allow remote access -->
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- compression settings -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method> <!-- lz4 compresses faster than zstd but uses more disk -->
        </case>
    </clickhouse_compression>
</yandex>
Note: the host-specific values above (highlighted in the original source) must be adjusted per node. In particular, each node needs its own <macros> section:
cdhserver2, shard 1, replica 1:
<macros> <layer>01</layer> <shard>01</shard> <replica>cdhserver2</replica> </macros>
cdhserver3, shard 2, replica 1:
<macros> <layer>01</layer> <shard>02</shard> <replica>cdhserver3</replica> </macros>
cdhserver4, shard 3, replica 1:
<macros> <layer>01</layer> <shard>03</shard> <replica>cdhserver4</replica> </macros>
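After restarting each node with this configuration, two quick checks confirm that the node picked up its macros and can reach ZooKeeper (a sketch; both use built-in system tables, though system.macros may be absent on very old builds):
SELECT * FROM system.macros;                        -- should list this node's layer/shard/replica
SELECT name FROM system.zookeeper WHERE path = '/'; -- errors out if ZooKeeper is unreachable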
4. Startup script
Init script path: /etc/init.d/clickhouse-server
Start in the foreground:
[root@cdhserver1 software]# clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Start in the background:
[root@cdhserver1 software]# nohup clickhouse-server --config-file=/etc/clickhouse-server/config.xml >/dev/null 2>&1 &
Running a second instance per node
To run a second instance, go to the configuration directory and copy the existing configuration file to a new one, here config1.xml, then edit it: change the ports and data paths, and point the pid file at clickhouse-server-1.pid. Start the second instance with --config-file=/etc/clickhouse-server/config1.xml.
IV. Testing: Create Local Replicated Tables and a Distributed Table
1. After all instances are configured as above, start each of them, then create the database on every instance:
CREATE DATABASE monchickey;
2. Create the corresponding replicated table on each instance. A simple test table:
drop table if exists monchickey.t_co_dwfxdata_0809_30000000_new;
CREATE TABLE monchickey.t_co_dwfxdata_0809_30000000_new
(
    CRETIME date,
    YEAR String,
    MONTH String,
    COMPNAME String,
    CBYS String,
    YWHD String,
    DYDJ String,
    ZCLX String,
    AMOUNT Float64,
    YHLB String,
    YSLX String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/t_co_dwfxdata_0809_30000000_new', '{replica}', CRETIME, (COMPNAME), 8192);
drop table if exists monchickey.t_co_dwfxdata_0809_30000000_all;
-- The first argument of Distributed must be a cluster name defined in metrika.xml; here that is idc_replicate_cluster (the original text had "distable", which matches no configured cluster).
CREATE TABLE monchickey.t_co_dwfxdata_0809_30000000_all AS monchickey.t_co_dwfxdata_0809_30000000_new
ENGINE = Distributed(idc_replicate_cluster, monchickey, t_co_dwfxdata_0809_30000000_new, rand());
select count(*) from monchickey.t_co_dwfxdata_0809_30000000_new;
select count(*) from monchickey.t_co_dwfxdata_0809_30000000_all;
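Instead of running the CREATE TABLE by hand on every instance, the same DDL can be pushed to all nodes in one statement with ON CLUSTER. A sketch, assuming distributed DDL is available and the per-node macros above are in place (monchickey.t_test is a made-up example table):
CREATE TABLE monchickey.t_test ON CLUSTER idc_replicate_cluster
(
    CRETIME date,
    COMPNAME String,
    AMOUNT Float64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/t_test', '{replica}', CRETIME, (COMPNAME), 8192);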
3. Simple data sample
Insert the following rows (7 in total) from three of the machines:
cdhserver1 / clickhouse1
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他长期职工福利','检修-输电运检','500kV及以上','架空线路',9.4441398224E8,'','工');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他短期薪酬','检修-输电运检','500kV及以上','架空线路',2.0894114651E8,'','工');
cdhserver2 / clickhouse2
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','其他长期职工福利','检修-输电运检','500kV及以上','架空线路',9.4441398224E8,'','工');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','临时用工薪酬','检修-输电运检','500kV及以上','架空线路',8.5247987777E8,'','工');
cdhserver3 / clickhouse3
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','劳务派遣费','检修-输电运检','500kV及以上','架空线路',3.59378772E8,'','工');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','委托服务费','检修-输电运检','500kV及以上','架空线路',4.7638581405E8,'','费');
insert into monchickey.t_co_dwfxdata_0809_30000000_all VALUES ('2016-01-01','2016年','1月','单位8','财产保险费','检修-输电运检','500kV及以上','架空线路',9.527716281E8,'','费');
Query the data from any machine:
a. the complete set of 7 rows can be retrieved;
b. the total count is 7.
select * from monchickey.t_co_dwfxdata_0809_30000000_all;        -- returns the complete 7 rows
select count(*) from monchickey.t_co_dwfxdata_0809_30000000_all; -- total is 7
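To confirm that replication itself is healthy, the built-in system.replicas table can be inspected on any node; a minimal check:
SELECT database, table, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE database = 'monchickey';
-- active_replicas should equal total_replicas for each shard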
V. Cluster Table Operations
1. Add a column
alter TABLE idc.web_initial ON CLUSTER idc_cluster add COLUMN tracerouteip String AFTER jitter;
2. Change a column's type
alter TABLE idc.web_initial ON CLUSTER idc_cluster modify column tracerouteip UInt16;
3. Drop a column
alter TABLE idc.web_initial ON CLUSTER idc_cluster drop column tracerouteip;
4. Drop the same table on every node of the cluster
-- syntax: drop table <table> on cluster <cluster>;
drop TABLE if exists idc.web_initial on CLUSTER idc_cluster;
5. Truncate a table across the cluster
truncate table lmmbase.user_label_uid on cluster crm_4shards_1replicas;
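Because ON CLUSTER statements run on every node, a simple way to confirm that a schema change propagated is to compare the table definition on each node:
SHOW CREATE TABLE idc.web_initial;  -- run on each node; the definitions should match
DESCRIBE TABLE idc.web_initial;     -- or compare just the column list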
VI. Tuning
1. max_table_size_to_drop
By default, ClickHouse refuses to drop a table or partition larger than 50 GB. This can be changed permanently in the server configuration file, or temporarily (without restarting the service) via a flag file.
Permanent configuration:
sudo vim /etc/clickhouse-server/config.xml
Uncomment the following two lines (0 means no limit; alternatively, set the maximum size you want to allow):
<max_table_size_to_drop>0</max_table_size_to_drop>
<max_partition_size_to_drop>0</max_partition_size_to_drop>
Temporary override:
Create a flag file under the flags directory of the ClickHouse data path (here /home/username/clickhouse; the default data path is /var/lib/clickhouse):
sudo touch '/home/username/clickhouse/flags/force_drop_table' && sudo chmod 666 '/home/username/clickhouse/flags/force_drop_table'
Once the file exists, the DROP of the oversized partition or table will go through.
2. max_memory_usage
This parameter lives in /etc/clickhouse-server/users.xml and sets the maximum amount of memory a single query may use; a query that exceeds it fails. If resources allow, set it as high as you can:
<max_memory_usage>25000000000</max_memory_usage>
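The limit can also be raised for a single session without editing users.xml, which is handy for one-off heavy queries (the value below is only an example):
SET max_memory_usage = 50000000000;  -- about 50 GB, effective for the current session only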