  • ClickHouse installation and usage guide

    ClickHouse overview

    What is ClickHouse

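    1. An open-source, column-oriented database management system

    2. Scales linearly

    3. Simple and convenient to use

    4. Highly reliable

    5. Fault-tolerant (supports asynchronous multi-master replication and can be deployed across multiple data centers; the failure of a single node or of an entire data center does not affect the system's read/write availability)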

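    ClickHouse architecture and storage

    The internal architecture is not described in this guide (ClickHouse itself is open source, as noted above).

    ClickHouse characteristics

    ClickHouse is designed for analytics on clean, well-structured, immutable events or logs. The recommended layout is to put each such stream into a single wide fact table with pre-joined dimensions.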
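    The wide-fact-table advice above can be illustrated with a small Python sketch. It flattens each event together with its dimension attributes into one row before loading, so queries do not need joins. The `denormalize` helper, the column names and the sample data are all hypothetical and are used only for illustration.

```python
from typing import Any, Dict, List


def denormalize(
    events: List[Dict[str, Any]],
    dims: Dict[str, Dict[Any, Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Flatten events into wide rows by copying in their dimension attributes.

    dims maps a foreign-key column (e.g. "user_id") to a lookup table
    {key: {attribute: value}}; copied attributes are prefixed with that column name.
    """
    wide_rows = []
    for event in events:
        row = dict(event)  # keep the original event columns
        for fk_col, lookup in dims.items():
            # Unknown keys simply contribute no extra columns.
            for attr, value in lookup.get(event.get(fk_col), {}).items():
                row[f"{fk_col}_{attr}"] = value
        wide_rows.append(row)
    return wide_rows


# Hypothetical sample data: one page-view event and a user dimension table.
events = [{"ts": "2017-11-09 10:00:00", "user_id": 1, "page": "/home"}]
dims = {"user_id": {1: {"country": "CN", "plan": "free"}}}
rows = denormalize(events, dims)
```

    Each resulting row already carries user_id_country and user_id_plan, so it can be inserted into one wide table as is.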

    ClickHouse use cases

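    Some examples of applications it suits well:

    • Web and app analytics
    • Advertising networks and RTB
    • Telecommunications
    • E-commerce and finance
    • Information security
    • Monitoring and telemetry
    • Time series
    • Business intelligence
    • Online games
    • Internet of Things

    Unsuitable scenarios

    • Transactional workloads (OLTP)
    • Key-value access with a high request rate
    • Blob or document storage
    • Over-normalized data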

    ClickHouse installation

    Single-node installation

    Check whether the system supports ClickHouse

    Run the following command:

    grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

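    If the output is "SSE 4.2 supported", you can continue with the installation. If it is "SSE 4.2 not supported":

    your CPU does not support the SSE 4.2 instruction set, which the prebuilt ClickHouse packages require. You will have to find another option, for example a different machine.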
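    For scripted provisioning, the same check can be written in Python. The sketch below is an equivalent of the grep above under one assumption: it looks for the `sse4_2` token on the `flags` lines of `/proc/cpuinfo`, so it applies to Linux on x86 only. The function names are hypothetical.

```python
def supports_sse42(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of /proc/cpuinfo lists the sse4_2 flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "sse4_2" in value.split():
            return True
    return False


def check_local_cpu(path: str = "/proc/cpuinfo") -> str:
    """Mimic the shell one-liner's output for the local machine (Linux only)."""
    with open(path) as f:
        ok = supports_sse42(f.read())
    return "SSE 4.2 supported" if ok else "SSE 4.2 not supported"
```

    Calling check_local_cpu() on the target host prints the same two messages as the shell command. Parsing the text separately keeps the check testable without a real /proc.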

    Fetch the repository configuration:

    curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash

    Alternatively, create the repository file manually:

    /etc/yum.repos.d/altinity_clickhouse.repo

    For CentOS 6, insert the following content:

    [altinity_clickhouse]

    name=altinity_clickhouse

    baseurl=https://packagecloud.io/altinity/clickhouse/el/6/$basearch

    repo_gpgcheck=1

    gpgcheck=0

    enabled=1

    gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey

    sslverify=1

    sslcacert=/etc/pki/tls/certs/ca-bundle.crt

    metadata_expire=300

     

    [altinity_clickhouse-source]

    name=altinity_clickhouse-source

    baseurl=https://packagecloud.io/altinity/clickhouse/el/6/SRPMS

    repo_gpgcheck=1

    gpgcheck=0

    enabled=1

    gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey

    sslverify=1

    sslcacert=/etc/pki/tls/certs/ca-bundle.crt

    metadata_expire=300

    For CentOS 7:

    [altinity_clickhouse]

    name=altinity_clickhouse

    baseurl=https://packagecloud.io/altinity/clickhouse/el/7/$basearch

    repo_gpgcheck=1

    gpgcheck=0

    enabled=1

    gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey

    sslverify=1

    sslcacert=/etc/pki/tls/certs/ca-bundle.crt

    metadata_expire=300

     

    [altinity_clickhouse-source]

    name=altinity_clickhouse-source

    baseurl=https://packagecloud.io/altinity/clickhouse/el/7/SRPMS

    repo_gpgcheck=1

    gpgcheck=0

    enabled=1

    gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey

    sslverify=1

    sslcacert=/etc/pki/tls/certs/ca-bundle.crt

    metadata_expire=300
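    The CentOS 6 and CentOS 7 files above differ only in the `el/6` / `el/7` path segment, so the file can be rendered from one template. The following is a minimal Python sketch that uses the standard `configparser` module. The `make_repo_file` name is hypothetical. The output still has to be saved with root privileges, for example as /etc/yum.repos.d/altinity_clickhouse.repo.

```python
import configparser
import io

BASE = "https://packagecloud.io/altinity/clickhouse"


def make_repo_file(el_version: int) -> str:
    """Render the two-section .repo file shown above for an EL major version."""
    common = {
        "repo_gpgcheck": "1",
        "gpgcheck": "0",
        "enabled": "1",
        "gpgkey": f"{BASE}/gpgkey",
        "sslverify": "1",
        "sslcacert": "/etc/pki/tls/certs/ca-bundle.crt",
        "metadata_expire": "300",
    }
    # interpolation=None so the literal $basearch is written untouched.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["altinity_clickhouse"] = {
        "name": "altinity_clickhouse",
        "baseurl": f"{BASE}/el/{el_version}/$basearch",
        **common,
    }
    cfg["altinity_clickhouse-source"] = {
        "name": "altinity_clickhouse-source",
        "baseurl": f"{BASE}/el/{el_version}/SRPMS",
        **common,
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)  # key=value, as in the file above
    return buf.getvalue()
```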

    Then list and install the packages:

    yum list 'clickhouse*'

    yum -y install 'clickhouse*'

    ClickHouse multi-node installation

    Install ClickHouse on every machine, then make the following changes on each of them.

    Edit the hosts file (/etc/hosts):

    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

    192.168.3.251 host1

    192.168.3.252 host2

    192.168.3.247 host3


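    Create the file metrika.xml

    Create it under /etc (by default ClickHouse reads substitutions from /etc/metrika.xml):

    cd /etc

    vi metrika.xml

    Adapt the following content to your environment and paste it into metrika.xml: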

    <yandex>

    <clickhouse_remote_servers>

        <perftest_3shards_1replicas>

            <shard>

                 <internal_replication>true</internal_replication>

                <replica>

                    <host>192.168.3.247</host>

                    <port>9000</port>

                </replica>

            </shard>

            <shard>

                <internal_replication>true</internal_replication>

                <replica>

                    <host>192.168.3.252</host>

                    <port>9000</port>

                </replica>

            </shard>

            <shard>

                <internal_replication>true</internal_replication>

                <replica>

                    <host>192.168.3.251</host>

                    <port>9000</port>

                </replica>

            </shard>

        </perftest_3shards_1replicas>

    </clickhouse_remote_servers>

    <zookeeper-servers>

      <node index="1">

        <host>192.168.3.251</host>

        <port>2181</port>

      </node>

    </zookeeper-servers>

    <macros>

        <replica>192.168.3.252</replica>  <!-- per node: set to this node's own address -->

    </macros>

    <networks>

       <ip>::/0</ip>

    </networks>

    <clickhouse_compression>

    <case>

      <min_part_size>10000000000</min_part_size>            

      <min_part_size_ratio>0.01</min_part_size_ratio>

      <method>lz4</method>

    </case>

    </clickhouse_compression>

    </yandex>
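    Every node carries the same cluster definition, so the `<clickhouse_remote_servers>` section can be generated instead of edited by hand. The Python sketch below uses the standard `xml.etree.ElementTree`; the `build_remote_servers` helper is hypothetical. It emits one single-replica shard per host and places `<internal_replication>` directly under `<shard>`, which is where ClickHouse expects it.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_remote_servers(cluster: str, hosts: List[str], port: int = 9000) -> str:
    """Build <clickhouse_remote_servers> with one single-replica shard per host."""
    root = ET.Element("clickhouse_remote_servers")
    cluster_el = ET.SubElement(root, cluster)
    for host in hosts:
        shard = ET.SubElement(cluster_el, "shard")
        # internal_replication belongs to <shard>, not to <replica>.
        ET.SubElement(shard, "internal_replication").text = "true"
        replica = ET.SubElement(shard, "replica")
        ET.SubElement(replica, "host").text = host
        ET.SubElement(replica, "port").text = str(port)
    return ET.tostring(root, encoding="unicode")


xml_text = build_remote_servers(
    "perftest_3shards_1replicas",
    ["192.168.3.247", "192.168.3.252", "192.168.3.251"],
)
```

    The returned string can replace the `<clickhouse_remote_servers>` element inside `<yandex>` in metrika.xml. The per-node `<macros>` value still has to be set by hand on each host.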

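    Edit config.xml under /etc/clickhouse-server so that the server also listens on the node's own address (replace 192.168.3.252 with each node's IP):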

      <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->

        <!-- <listen_host>::</listen_host> -->

        <listen_host>::1</listen_host>

        <listen_host>192.168.3.252</listen_host>
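    After restarting the servers, you can confirm that each node accepts connections on the native TCP port with a plain socket connect. That port is 9000 in the cluster definition above. The following is a minimal Python sketch; `port_open` is a hypothetical helper. A successful connect only shows that something is listening. It does not prove that authentication or queries will work.

```python
import socket


def port_open(host: str, port: int = 9000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

    For example, loop over 192.168.3.251, 192.168.3.252 and 192.168.3.247 and call port_open(host) for each. A False result usually points at listen_host or a firewall.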

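    Using ClickHouse

    Basic usage

    Start the server

    /etc/init.d/clickhouse-server start

    Connect from the command line: clickhouse-client -h <host> -u <user> --password <password>

    The defaults are usually enough: just run clickhouse-client to open the client.

    DML (data manipulation language); these statements use the funtest table created in the DDL section below.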

    insert into funtest values(3,'xiaoming',22,'2017-11-09')

    insert into funtest values(32,'xiaolan',33,'2017-11-08')

    insert into funtest values(35,'xiaotong',33,'2017-11-07')

    insert into funtest values(4,'xiaohuang',33,'2017-11-08')

    insert into funtest values(44,'xiaolvas',34,'2017-11-05')

    insert into funtest values(6,'xiaohuanasg',32,'2017-11-28')

    select *  from funtest

    select *  from funtest order by id

    select * from funtest order by  id desc

    select avg(age)  from funtest

    select count(name) from funtest

    select age from funtest group by age

    select round(age/3) FROM funtest

    select cast('2015-12-22' as date) from funtest

    select cast('2015-12-22' as date)+30 from funtest

    select stddev_samp(age) FROM funtest

    select upper('hhh') from funtest

    select upper(name) from funtest

    select abs(-1) from funtest

    select * FROM funtest where times =cast('2015-12-22' as date)

    select max(age) from funtest

    select case when name ='xiaoming' then concat(name,'dddd') else 'ddddfdfdfdf' end  from funtest

    select substring(name,1,3) from funtest

    select rand() from funtest
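    To sanity-check the results of the queries above, the same computations can be mirrored in plain Python over the six inserted rows. This is only a cross-check of expected values. It assumes those six INSERTs were the only writes to funtest, and it does not talk to ClickHouse.

```python
from statistics import mean, stdev

# The six rows inserted above: (id, name, age, times).
ROWS = [
    (3, "xiaoming", 22, "2017-11-09"),
    (32, "xiaolan", 33, "2017-11-08"),
    (35, "xiaotong", 33, "2017-11-07"),
    (4, "xiaohuang", 33, "2017-11-08"),
    (44, "xiaolvas", 34, "2017-11-05"),
    (6, "xiaohuanasg", 32, "2017-11-28"),
]

ages = [age for _, _, age, _ in ROWS]
avg_age = mean(ages)                 # select avg(age) from funtest
name_count = sum(1 for _, name, _, _ in ROWS if name is not None)  # count(name) skips NULLs
max_age = max(ages)                  # select max(age) from funtest
distinct_ages = sorted(set(ages))    # group by age (sorted here only for comparison)
sample_stddev = stdev(ages)          # stddev_samp(age): divides by n - 1
ids_desc = [r[0] for r in sorted(ROWS, key=lambda r: r[0], reverse=True)]  # order by id desc
first3 = [name[:3] for _, name, _, _ in ROWS]  # substring(name, 1, 3): offsets are 1-based
```

    For instance, avg(age) should come out at 187/6 ≈ 31.17, and the distinct ages are 22, 32, 33 and 34. Without ORDER BY, ClickHouse does not guarantee the order of result rows.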

    DDL (data definition language)

    create table funtest (id UInt32, name String, age UInt32, times Date) ENGINE = Log

    drop table funtest

    alter table ontime_all add COLUMN name String;

    Performance testing

    The performance test steps are as follows.

    Download the data

    for s in `seq 1987 2017`
    do
        for m in `seq 1 12`
        do
            echo http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_${s}_${m}.zip >> a.lst
        done
    done
    # download every URL collected in a.lst
    wget -i a.lst
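    The loop above writes one URL per month, from 1987 through 2017, into a.lst. Below is an equivalent Python sketch. The `ontime_urls` and `write_url_list` names are hypothetical, and the URL pattern is copied from the shell loop. The remote files may have moved since this guide was written.

```python
from typing import List

URL_TEMPLATE = "http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_{year}_{month}.zip"


def ontime_urls(first_year: int = 1987, last_year: int = 2017) -> List[str]:
    """Return one download URL per month, in the same order as the nested shell loops."""
    return [
        URL_TEMPLATE.format(year=year, month=month)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


def write_url_list(path: str = "a.lst") -> int:
    """Write the URLs one per line and return the count.

    Unlike the shell's `>>`, this overwrites an existing file instead of appending.
    """
    urls = ontime_urls()
    with open(path, "w") as f:
        f.write("\n".join(urls) + "\n")
    return len(urls)
```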

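    Unzip the files and load them into ClickHouse (the target table must already exist; the table definitions follow below):

    for i in *.zip; do echo $i; unzip -cq $i '*.csv' | sed 's/\.00//g' | clickhouse-client --query="INSERT INTO ontimetest FORMAT CSVWithNames"; done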
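    The sed step is meant to drop the literal `.00` suffix from numeric fields. The dot must be escaped, because an unescaped `.` matches any character, so `s/.00//g` would also mangle values such as years and times. A small Python sketch using the standard `re` module shows the difference. The function names are hypothetical.

```python
import re


def strip_zero_decimals(csv_text: str) -> str:
    """Equivalent of sed 's/\\.00//g': delete every literal '.00' sequence."""
    return re.sub(r"\.00", "", csv_text)


def strip_unescaped(csv_text: str) -> str:
    """Equivalent of the unescaped sed 's/.00//g', where '.' matches any character."""
    return re.sub(r".00", "", csv_text)


sample = "2000-01-05,-3.00,1000"
fixed = strip_zero_decimals(sample)   # keeps the date and 1000 intact
broken = strip_unescaped(sample)      # also eats "200" from 2000 and "100" from 1000
```

    With the escaped pattern the sample line becomes 2000-01-05,-3,1000, while the unescaped one yields 0-01-05,-3,0. Even the escaped pattern would alter a value such as 1.005, so it assumes the numeric fields carry at most two decimal places.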

    Create the Hive table

    CREATE TABLE ontime

    (

        Year int,

        Quarter int,

        Month int,

        DayofMonth int,

        DayOfWeek int,

        FlightDate Date,

        UniqueCarrier String,

        AirlineID int,

        Carrier String,

        TailNum String,

        FlightNum String,

        OriginAirportID int,

        OriginAirportSeqID int,

        OriginCityMarketID int,

        Origin String,

        OriginCityName String,

        OriginState String,

        OriginStateFips String,

        OriginStateName String,

        OriginWac int,

        DestAirportID int,

        DestAirportSeqID int,

        DestCityMarketID int,

        Dest String,

        DestCityName String,

        DestState String,

        DestStateFips String,

        DestStateName String,

        DestWac int,

        CRSDepTime int,

        DepTime int,

        DepDelay int,

        DepDelayMinutes int,

        DepDel15 int,

        DepartureDelayGroups String,

        DepTimeBlk String,

        TaxiOut int,

        WheelsOff int,

        WheelsOn int,

        TaxiIn int,

        CRSArrTime int,

        ArrTime int,

        ArrDelay int,

        ArrDelayMinutes int,

        ArrDel15 int,

        ArrivalDelayGroups int,

        ArrTimeBlk String,

        Cancelled int,

        CancellationCode String,

        Diverted int,

        CRSElapsedTime int,

        ActualElapsedTime int,

        AirTime int,

        Flights int,

        Distance int,

        DistanceGroup int,

        CarrierDelay int,

        WeatherDelay int,

        NASDelay int,

        SecurityDelay int,

        LateAircraftDelay int,

        FirstDepTime String,

        TotalAddGTime String,

        LongestAddGTime String,

        DivAirportLandings String,

        DivReachedDest String,

        DivActualElapsedTime String,

        DivArrDelay String,

        DivDistance String,

        Div1Airport String,

        Div1AirportID int,

        Div1AirportSeqID int,

        Div1WheelsOn String,

        Div1TotalGTime String,

        Div1LongestGTime String,

        Div1WheelsOff String,

        Div1TailNum String,

        Div2Airport String,

        Div2AirportID int,

        Div2AirportSeqID int,

        Div2WheelsOn String,

        Div2TotalGTime String,

        Div2LongestGTime String,

        Div2WheelsOff String,

        Div2TailNum String,

        Div3Airport String,

        Div3AirportID int,

        Div3AirportSeqID int,

        Div3WheelsOn String,

        Div3TotalGTime String,

        Div3LongestGTime String,

        Div3WheelsOff String,

        Div3TailNum String,

        Div4Airport String,

        Div4AirportID int,

        Div4AirportSeqID int,

        Div4WheelsOn String,

        Div4TotalGTime String,

        Div4LongestGTime String,

        Div4WheelsOff String,

        Div4TailNum String,

        Div5Airport String,

        Div5AirportID int,

        Div5AirportSeqID int,

        Div5WheelsOn String,

        Div5TotalGTime String,

        Div5LongestGTime String,

        Div5WheelsOff String,

        Div5TailNum String

    )row format delimited

    fields terminated by ','

    stored as textfile;

    load data inpath '/data' into table ontime;

    Then change the Hive storage format to ORC.

    Comparison test against Spark

    Create the ClickHouse local table

    CREATE TABLE ontime

    (

        Year UInt16,

        Quarter UInt8,

        Month UInt8,

        DayofMonth UInt8,

        DayOfWeek UInt8,

        FlightDate Date,

        UniqueCarrier FixedString(7),

        AirlineID Int32,

        Carrier FixedString(2),

        TailNum String,

        FlightNum String,

        OriginAirportID Int32,

        OriginAirportSeqID Int32,

        OriginCityMarketID Int32,

        Origin FixedString(5),

        OriginCityName String,

        OriginState FixedString(2),

        OriginStateFips String,

        OriginStateName String,

        OriginWac Int32,

        DestAirportID Int32,

        DestAirportSeqID Int32,

        DestCityMarketID Int32,

        Dest FixedString(5),

        DestCityName String,

        DestState FixedString(2),

        DestStateFips String,

        DestStateName String,

        DestWac Int32,

        CRSDepTime Int32,

        DepTime Int32,

        DepDelay Int32,

        DepDelayMinutes Int32,

        DepDel15 Int32,

        DepartureDelayGroups String,

        DepTimeBlk String,

        TaxiOut Int32,

        WheelsOff Int32,

        WheelsOn Int32,

        TaxiIn Int32,

        CRSArrTime Int32,

        ArrTime Int32,

        ArrDelay Int32,

        ArrDelayMinutes Int32,

        ArrDel15 Int32,

        ArrivalDelayGroups Int32,

        ArrTimeBlk String,

        Cancelled UInt8,

        CancellationCode FixedString(1),

        Diverted UInt8,

        CRSElapsedTime Int32,

        ActualElapsedTime Int32,

        AirTime Int32,

        Flights Int32,

        Distance Int32,

        DistanceGroup UInt8,

        CarrierDelay Int32,

        WeatherDelay Int32,

        NASDelay Int32,

        SecurityDelay Int32,

        LateAircraftDelay Int32,

        FirstDepTime String,

        TotalAddGTime String,

        LongestAddGTime String,

        DivAirportLandings String,

        DivReachedDest String,

        DivActualElapsedTime String,

        DivArrDelay String,

        DivDistance String,

        Div1Airport String,

        Div1AirportID Int32,

        Div1AirportSeqID Int32,

        Div1WheelsOn String,

        Div1TotalGTime String,

        Div1LongestGTime String,

        Div1WheelsOff String,

        Div1TailNum String,

        Div2Airport String,

        Div2AirportID Int32,

        Div2AirportSeqID Int32,

        Div2WheelsOn String,

        Div2TotalGTime String,

        Div2LongestGTime String,

        Div2WheelsOff String,

        Div2TailNum String,

        Div3Airport String,

        Div3AirportID Int32,

        Div3AirportSeqID Int32,

        Div3WheelsOn String,

        Div3TotalGTime String,

        Div3LongestGTime String,

        Div3WheelsOff String,

        Div3TailNum String,

        Div4Airport String,

        Div4AirportID Int32,

        Div4AirportSeqID Int32,

        Div4WheelsOn String,

        Div4TotalGTime String,

        Div4LongestGTime String,

        Div4WheelsOff String,

        Div4TailNum String,

        Div5Airport String,

        Div5AirportID Int32,

        Div5AirportSeqID Int32,

        Div5WheelsOn String,

        Div5TotalGTime String,

        Div5LongestGTime String,

        Div5WheelsOff String,

        Div5TailNum String

    ) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)

    Create the distributed table

    CREATE TABLE ontimetest AS ontime ENGINE = Distributed(perftest_3shards_1replicas, default, ontime, rand())

    Note:

    Create both the local table and the distributed table on every node.
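    The Distributed table routes each inserted row by its sharding expression, here rand(). According to the ClickHouse documentation, the value is divided by the total shard weight, and the remainder selects the shard whose weight range contains it. Every shard defaults to weight 1, which applies to the cluster above because it has no `<weight>` elements. The following is a minimal Python sketch of that rule; `pick_shard` is a hypothetical helper, not a ClickHouse API.

```python
from typing import List


def pick_shard(sharding_key: int, weights: List[int]) -> int:
    """Return the 0-based index of the shard a row is routed to.

    The key's remainder modulo the total weight is matched against consecutive
    half-open ranges [0, w0), [w0, w0 + w1), ..., one range per shard.
    """
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("shard weights must be non-negative with a positive total")
    remainder = sharding_key % total
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is always below the total weight")
```

    With three equally weighted shards, consecutive keys land on shards 0, 1, 2, 0, 1, 2, so rand() spreads rows roughly evenly across the nodes.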

  • Original article: https://www.cnblogs.com/tsxylhs/p/7837707.html