  • ClickHouse study notes

    introduction 

    https://www.youtube.com/watch?v=fGG9dApIhDU

    glance of features

    • shared nothing architecture
    • column storage with vectorized query execution
    • built-in sharding and replication

    Further reading:

    Replicas help with concurrency; shards add IOPS.

    Shard a table across different nodes, then replicate each shard's data.

    ZooKeeper is used to maintain the shared state and to handle leader election.
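
    The relationship between shards and replicas can be sketched with a toy model. This is only an illustration, not ClickHouse's actual routing logic: the cluster layout, the node names, and the helper functions below are all hypothetical.

```python
import zlib
from typing import List

# Hypothetical cluster layout: 2 shards, each with 2 replicas.
CLUSTER: List[List[str]] = [
    ["node-1a", "node-1b"],  # shard 0
    ["node-2a", "node-2b"],  # shard 1
]


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the sharding key (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def nodes_for_write(key: str) -> List[str]:
    """A row belongs to exactly one shard and ends up on every replica of it."""
    return CLUSTER[shard_for(key, len(CLUSTER))]


def node_for_read(key: str, request_id: int) -> str:
    """Any replica of the shard can serve a read; rotate to spread the load."""
    replicas = nodes_for_write(key)
    return replicas[request_id % len(replicas)]
```

    In this model, adding a shard splits the data (and the disk work) across more machines, while adding a replica gives each shard one more node that can answer concurrent reads.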

    clickhouse code is optimized for speed

    bottom-up design: algorithms determine interface

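    ClickHouse's design is somewhat unusual: interfaces are defined by how the algorithms are implemented, rather than by how they will be used (or by usage habits), which is the more common approach.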

    specialized algorithms for common operations, selected by:

    The following four factors determine which algorithm is used to execute a given operation.

    • Data type: 14 GROUP BY algorithms
    • Data size: whether data fits in memory
    • Ordering: whether data is already [partly] sorted or not
    • Data distribution: e.g. using multi-armed bandits to optimize LZ4 decompression
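
    The idea of choosing a specialized routine from what is known about the data can be sketched as follows. This is a simplified illustration and not ClickHouse's implementation: the `key_type` tag, the function names, and the two strategies are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def count_by_small_int(keys: Sequence[int]) -> Dict[int, int]:
    """Keys known to fit in 0..255: index into a fixed array, no hashing."""
    counts = [0] * 256
    for k in keys:
        counts[k] += 1
    return {k: n for k, n in enumerate(counts) if n}


def count_by_hash(keys: Sequence[Hashable]) -> Dict[Hashable, int]:
    """General case: fall back to a hash table."""
    return dict(Counter(keys))


def group_count(keys: Sequence, key_type: str) -> Dict:
    """SELECT key, count() GROUP BY key, dispatching on the declared key type."""
    if key_type == "uint8":  # hypothetical type tag taken from the schema
        return count_by_small_int(keys)
    return count_by_hash(keys)
```

    Both paths return the same answer; only the data structure behind the grouping changes. A real engine would dispatch on more than the key type, as the list above indicates.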

    Further reading:

    Introduction to Multi-Armed Bandits (PDF)

    Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty.
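
    A minimal sketch of the bandit idea, applied to picking the fastest of several interchangeable routines. It uses epsilon-greedy, one of the simplest bandit strategies, which is not necessarily what ClickHouse uses. The three "variants" and their costs are simulated and purely hypothetical.

```python
import random
from typing import List


class EpsilonGreedyBandit:
    """Pick among candidate routines, favouring the cheapest seen so far."""

    def __init__(self, num_arms: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.counts = [0] * num_arms
        self.mean_cost = [0.0] * num_arms
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Try every arm once before trusting the averages.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        # Exploit: the arm with the lowest average cost so far.
        return min(range(len(self.counts)), key=lambda a: self.mean_cost[a])

    def update(self, arm: int, cost: float) -> None:
        # Incremental running mean of the observed cost.
        self.counts[arm] += 1
        self.mean_cost[arm] += (cost - self.mean_cost[arm]) / self.counts[arm]


# Simulated per-block costs for three hypothetical decompression variants.
TRUE_COST = [1.0, 0.6, 0.9]


def simulate(rounds: int = 500, seed: int = 0) -> List[int]:
    """Return how often each variant was chosen over the given rounds."""
    bandit = EpsilonGreedyBandit(len(TRUE_COST), epsilon=0.1, seed=seed)
    noise = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.choose()
        bandit.update(arm, TRUE_COST[arm] + noise.uniform(-0.05, 0.05))
    return bandit.counts
```

    After a few hundred rounds most of the calls go to the cheapest variant, while the occasional exploration step keeps checking whether another variant has become faster for the current data.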

    LZ4 (an extremely fast compression/decompression algorithm, but with a relatively poor compression ratio)

    LZ4 is a lossless compression algorithm, providing compression speed > 500 MB/s per core, scalable with multi-core CPUs.

    vectorized query execution

    • SIMD (SSE 4.2+)
    • efficient dispatch on all available cores

    Further reading:

    CMU course: Vectorized Query Execution

    Vectorized query execution batches multiple rows together in a columnar format, and each operator uses simple loops to iterate over the data within a batch. This greatly reduces CPU usage for reading, writing, and query operations such as scanning and filtering.
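
    The batch-at-a-time control flow described above can be sketched as follows. Pure Python gains nothing from SIMD, so this only shows the shape of the execution model: each operator is a tight loop over one column of a batch. The column names, the batch size, and the query are made up for the example.

```python
from array import array

BATCH_SIZE = 4  # tiny on purpose; real engines use much larger batches


def filter_gt(column, threshold):
    """Operator 1: tight loop over one column, producing a selection mask."""
    return [v > threshold for v in column]


def sum_selected(column, mask):
    """Operator 2: tight loop over another column, guided by the mask."""
    total = 0
    for v, keep in zip(column, mask):
        if keep:
            total += v
    return total


def vectorized_sum(prices, quantities, min_quantity):
    """SELECT sum(price) WHERE quantity > min_quantity, one batch at a time."""
    total = 0
    for start in range(0, len(prices), BATCH_SIZE):
        q_batch = quantities[start:start + BATCH_SIZE]
        p_batch = prices[start:start + BATCH_SIZE]
        mask = filter_gt(q_batch, min_quantity)
        total += sum_selected(p_batch, mask)
    return total


# Columnar storage: one contiguous array per column.
prices = array("q", [10, 20, 30, 40, 50, 60])
quantities = array("q", [1, 5, 2, 7, 9, 0])
result = vectorized_sum(prices, quantities, 3)
```

    The per-row interpretation overhead is paid once per batch instead of once per row, and loops of this shape over contiguous arrays are what a compiled engine can turn into SIMD instructions.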

    how do distributed queries work?

    The application connects to a single ClickHouse node (the initiator). That node dispatches subqueries to the other nodes, each of which computes partial aggregate states locally. The initiator then merges these partial states into the final aggregation and returns the result to the application.
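
    The two-phase flow above can be sketched for an average, whose partial state is a (sum, count) pair. This is a simplified model: the function names and the sample rows are hypothetical, and no networking is involved.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Runs on each shard: build a partial state [sum, count] per group."""
    state = defaultdict(lambda: [0, 0])
    for key, value in rows:
        state[key][0] += value
        state[key][1] += 1
    return dict(state)


def merge_states(partials):
    """Runs on the initiator: combine the partial states from all shards."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return dict(merged)


def finalize(merged):
    """Turn the merged states into the final avg per group."""
    return {key: s / c for key, (s, c) in merged.items()}


shard_rows = [
    [("a", 10), ("b", 4)],   # hypothetical rows on shard 1
    [("a", 20), ("a", 30)],  # hypothetical rows on shard 2
    [("b", 6)],              # hypothetical rows on shard 3
]
result = finalize(merge_states(local_aggregate(rows) for rows in shard_rows))
```

    Shipping states rather than finished averages is what keeps the result correct: averaging the per-shard averages would weight every shard equally regardless of how many rows it holds. ClickHouse exposes the same idea in SQL through the -State and -Merge aggregate function combinators (for example avgState and avgMerge).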

    Other

    TPC-DS is an enterprise-class benchmark, published and maintained by the Transaction Processing Performance Council (TPC), to measure the performance of decision support systems running on SQL-based big data systems.

  • Original article: https://www.cnblogs.com/elaron/p/14732145.html