  • [Spark] Evaluating the best way to connect Spark to ClickHouse

    The Spark JDBC approach

    Predicate pushdown investigation:

    Spark JDBC connecting to MySQL:

    context.sparkSession.read
      .format("jdbc")
      .options(config.toSparkJDBCMap)
      .load()
      .selectExpr("title")
      .filter("phone='13725848961'")                               // expression filter: eligible for pushdown
      .filter(row => row.getAs[String]("phone").startsWith("123")) // lambda filter: evaluated inside Spark

    SQL log recorded on the MySQL server:

    2021-03-11T03:34:52.019572Z    234927 Connect    root@180.167.157.90 on aut_house using SSL/TLS
    2021-03-11T03:34:52.030608Z    234927 Query    /* mysql-connector-java-8.0.17 (Revision: 16a712ddb3f826a1933ab42b0039f7fb9eebc6ec) */SELECT  @@session.auto_increment_increment AS auto_increment_increment, @@character_set_client AS character_set_client, @@character_set_connection AS character_set_connection, @@character_set_results AS character_set_results, @@character_set_server AS character_set_server, @@collation_server AS collation_server, @@collation_connection AS collation_connection, @@init_connect AS init_connect, @@interactive_timeout AS interactive_timeout, @@license AS license, @@lower_case_table_names AS lower_case_table_names, @@max_allowed_packet AS max_allowed_packet, @@net_write_timeout AS net_write_timeout, @@performance_schema AS performance_schema, @@query_cache_size AS query_cache_size, @@query_cache_type AS query_cache_type, @@sql_mode AS sql_mode, @@system_time_zone AS system_time_zone, @@time_zone AS time_zone, @@tx_isolation AS transaction_isolation, @@wait_timeout AS wait_timeout
    2021-03-11T03:34:52.047211Z    234927 Query    SET NAMES utf8mb4
    2021-03-11T03:34:52.057985Z    234927 Query    SET character_set_results = NULL
    2021-03-11T03:34:52.068195Z    234927 Query    SET autocommit=1
    2021-03-11T03:34:52.079011Z    234927 Query    SELECT `title` FROM house WHERE (`phone` IS NOT NULL) AND (`phone` = '13725848961')

    Spark's physical plan:

    == Physical Plan ==
    *(1) Filter <function1>.apply
    +- *(1) Scan JDBCRelation(house) [numPartitions=1] [title#1,phone#4] PushedFilters: [*IsNotNull(phone), *EqualTo(phone,13725848961)], ReadSchema: struct<title:string,phone:string>
    root
     |-- title: string (nullable = true)
     |-- phone: string (nullable = true)

    Preliminary conclusion: Spark JDBC does support predicate pushdown. Filters written as SQL expressions and the columns named in selectExpr are pushed down to the database (see PushedFilters in the scan node), while the lambda filter is not pushed down: it stays in Spark as the "Filter <function1>.apply" node above the scan.
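The contrast can be reproduced directly by printing the plan for each filter style. A minimal sketch, assuming a reachable MySQL instance with the house table from above; the URL, user, and password are placeholders for your own environment:

```scala
import org.apache.spark.sql.SparkSession

object PushdownCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pushdown-check")
      .master("local[*]")
      .getOrCreate()

    // Placeholder connection details -- adjust for your environment.
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/aut_house")
      .option("dbtable", "house")
      .option("user", "root")
      .option("password", "***")
      .load()

    // Expression filter: appears under PushedFilters in the JDBC scan node.
    df.selectExpr("title").filter("phone = '13725848961'").explain()

    // Lambda filter: remains in Spark as a Filter <function1>.apply node,
    // so every row must be transferred before it is evaluated.
    df.filter(row => row.getAs[String]("phone").startsWith("123")).explain()

    spark.stop()
  }
}
```

Comparing the two explain() outputs shows whether a given predicate reached the database or was applied after the scan.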

    • ClickHouse
    == Physical Plan ==
    *(1) Project [service#0]
    +- *(1) Scan JDBCRelation(tbtest) [numPartitions=1] [service#0] PushedFilters: [*IsNotNull(metric), *EqualTo(metric,CPU_Idle_Time_alan)], ReadSchema: struct<service:string>

    Experiments confirm that Spark JDBC also pushes queries down when connected to ClickHouse: the metric predicate appears in PushedFilters, and only the service column is read.
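For reference, the ClickHouse side can be wired up the same way as MySQL. A sketch assuming the official clickhouse-jdbc driver (class com.clickhouse.jdbc.ClickHouseDriver in current releases; older versions used ru.yandex.clickhouse.ClickHouseDriver) and a server on the default HTTP port:

```scala
// Assumes an existing SparkSession `spark` and a local ClickHouse server.
val ch = spark.read.format("jdbc")
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
  .option("url", "jdbc:clickhouse://localhost:8123/default")
  .option("dbtable", "tbtest")
  .load()

// Pushed down exactly as with MySQL: the predicate lands in PushedFilters
// and only `service` appears in the read schema.
ch.select("service").filter("metric = 'CPU_Idle_Time_alan'").explain()
```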

    Future optimization direction:

    A ClickHouse distributed query goes through a Distributed table: the initiating node fans the query out to every shard, pulls the data back, and merges the results. That role is quite similar to Spark's driver side. So a future idea is to write a custom Spark data source in which each partition reads directly from one shard node and the merge happens in Spark, skipping the Distributed table entirely.
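A rough sketch of that idea, short of a full custom data source: issue one JDBC read per shard against its local table and union the results, so each shard contributes its own Spark partitions and the merge happens in Spark rather than in the Distributed table. The shard host list and the local table name tbtest_local are hypothetical; in practice the hosts could be discovered from system.clusters:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical shard hosts for the cluster backing the Distributed table.
val shards = Seq("ch-node1", "ch-node2", "ch-node3")

// One JDBC URL per shard (8123 is ClickHouse's HTTP interface port).
def shardUrl(host: String): String = s"jdbc:clickhouse://$host:8123/default"

def readShard(spark: SparkSession, host: String): DataFrame =
  spark.read.format("jdbc")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .option("url", shardUrl(host))
    .option("dbtable", "tbtest_local") // read the shard-local table, not the Distributed one
    .load()

// Each per-shard DataFrame keeps its own partitions; the union lets Spark
// perform the merge that the Distributed table would otherwise do.
def readCluster(spark: SparkSession): DataFrame =
  shards.map(readShard(spark, _)).reduce(_ union _)
```

Pushdown still applies per shard, since filters on the unioned DataFrame are pushed into each underlying JDBC scan.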

  • Original post: https://www.cnblogs.com/zhouwenyang/p/14516954.html