Importing data into ClickHouse with Waterdrop

Importing Hive table data with Waterdrop

Copy batch.conf.template to batch.conf and edit it. The configuration below is my example; adjust it to your own needs.

######
###### This config file is a demonstration of batch processing in waterdrop config
######

spark {
  # You can set spark configuration here
  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.app.name = "Waterdrop"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}

input {
  # Read from Hive: pre_sql is run against Hive and its result is registered as the temporary table "XX"
  hive {
    pre_sql = "select * from terminal.XX"
    result_table_name = "XX"
  }

  # You can also use other input plugins, such as hdfs
  # hdfs {
  #   result_table_name = "accesslog"
  #   path = "hdfs://hadoop-cluster-01/nginx/accesslog"
  #   format = "json"
  # }

  # If you would like to get more information about how to configure waterdrop and see full list of input plugins,
  # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
}

filter {
#  # split data by specific delimiter
#  split {
#    fields = ["msg", "name"]
#    delimiter = " "
#    result_table_name = "accesslog"
#  }
#
#  # drop fields that are not needed downstream
#  remove {
#    source_field = ["imei1", "imei2"]
#  }

  # you can also use other filter plugins, such as sql
  # sql {
  #   sql = "select * from accesslog where request_time > 1000"
  # }

  # If you would like to get more information about how to configure waterdrop and see full list of filter plugins,
  # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
}

output {
  # choose stdout output plugin to output data to console
  #stdout {
  #}

  clickhouse {
    host = "127.0.0.1:8123"
    database = "waterdrop"
    table = "access_log"
    fields = ["XX", "day"]
    username = "user_richdm"
    password = "richdm"
  }

  # you can also use other output plugins, such as hdfs
  # hdfs {
  #   path = "hdfs://hadoop-cluster-01/nginx/accesslog_processed"
  #   save_mode = "append"
  # }

  # If you would like to get more information about how to configure waterdrop and see full list of output plugins,
  # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
}
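The clickhouse output writes the columns named in `fields`, so each name presumably has to exist both in the Hive result (the temporary table `XX`) and in the ClickHouse table. A mismatch is cheaper to catch before the job is submitted to YARN. The Python sketch below checks the ClickHouse side only, assuming the server's HTTP interface is reachable at the address in `host` (8123 is ClickHouse's default HTTP port) and accepts the configured credentials. `describe_url`, `parse_describe`, `fetch_columns` and `missing_fields` are hypothetical helpers, not part of Waterdrop.

```python
import urllib.parse
import urllib.request


def describe_url(host: str, database: str, table: str) -> str:
    """Build an HTTP-interface URL that runs DESCRIBE TABLE on ClickHouse."""
    query = f"DESCRIBE TABLE {database}.{table}"
    return f"http://{host}/?" + urllib.parse.urlencode({"query": query})


def parse_describe(tsv: str) -> list[str]:
    """Column names from DESCRIBE output (TabSeparated, name in column 1)."""
    return [line.split("\t")[0] for line in tsv.splitlines() if line.strip()]


def fetch_columns(host: str, database: str, table: str,
                  user: str, password: str) -> list[str]:
    """Ask a running ClickHouse server for the table's column names."""
    req = urllib.request.Request(
        describe_url(host, database, table),
        # ClickHouse's HTTP interface reads credentials from these headers.
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_describe(resp.read().decode("utf-8"))


def missing_fields(fields: list[str], columns: list[str]) -> list[str]:
    """Entries of `fields` that the ClickHouse table does not define."""
    known = set(columns)
    return [name for name in fields if name not in known]
```

With the values from the example config, `missing_fields(["XX", "day"], fetch_columns("127.0.0.1:8123", "waterdrop", "access_log", "user_richdm", "richdm"))` should return an empty list; a non-empty result names the columns to add to the table or drop from `fields`. If the table does not exist yet, `urlopen` raises `urllib.error.HTTPError`.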

Run the command:

./start-waterdrop.sh --master yarn --deploy-mode client --config ../config/batch.conf

The ClickHouse database and table must be created in advance; Waterdrop does not create them for you.
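Since the DDL has to be run by hand before the job, it can be convenient to generate it from the same names used in the config. The Python sketch below only prints the statements for the database and table of the example (`waterdrop.access_log`); the column types (`String` for `XX`, `Date` for `day`), the `MergeTree()` engine and the `ORDER BY day` key are assumptions to replace with whatever matches your Hive schema, and `create_statements` is a hypothetical helper.

```python
def create_statements(database: str, table: str,
                      columns: dict[str, str], order_by: str) -> list[str]:
    """DDL to run in ClickHouse before the Waterdrop job (it creates nothing)."""
    # One "`name` Type" line per column, comma-separated and indented.
    body = ",\n    ".join(f"`{name}` {ctype}" for name, ctype in columns.items())
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"CREATE TABLE IF NOT EXISTS {database}.{table}\n(\n    {body}\n)\n"
        f"ENGINE = MergeTree()\nORDER BY {order_by}",
    ]


# Assumed types for the two columns in `fields`; adjust them to the Hive
# schema (the engine and ORDER BY key are assumptions as well).
for stmt in create_statements("waterdrop", "access_log",
                              {"XX": "String", "day": "Date"}, "day"):
    print(stmt + ";")
```

The printed statements can be saved to a file and run with `clickhouse-client --multiquery < ddl.sql`, adding `--user` and `--password` if the server requires them.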

Original article: https://www.cnblogs.com/yaohaitao/p/15612377.html