  • Importing data into ClickHouse with Waterdrop

    Importing a Hive table with Waterdrop

    Copy batch.conf.template to batch.conf and edit the copy (my example configuration is below; adjust it to your own needs)

    ######
    ###### This config file is a demonstration of batch processing in waterdrop config
    ######
    
    spark {
      # You can set spark configuration here
      # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
      spark.app.name = "Waterdrop"
      spark.executor.instances = 2
      spark.executor.cores = 1
      spark.executor.memory = "1g"
    }
    
    input {
      # Read from Hive: pre_sql selects the source rows, result_table_name registers them as a temporary table
      hive {
        pre_sql = "select * from terminal.XX"
        result_table_name = "XX"
      }
    
    
    
      # You can also use other input plugins, such as hdfs
      # hdfs {
      #   result_table_name = "accesslog"
      #   path = "hdfs://hadoop-cluster-01/nginx/accesslog"
      #   format = "json"
      # }
    
      # If you would like to get more information about how to configure waterdrop and see full list of input plugins,
      # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
    }
    
    filter {
    #  # split data by specific delimiter
    #  split {
    #    fields = ["msg", "name"]
    #    delimiter = " "
    #    result_table_name = "accesslog"
    #    remove {
    #        source_field = ["imei1", "imei2"]
    #    }
    #  }
    
    
    
      # You can also use other filter plugins, such as sql
      # sql {
      #   sql = "select * from accesslog where request_time > 1000"
      # }
    
      # If you would like to get more information about how to configure waterdrop and see full list of filter plugins,
      # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
    }
    
    output {
      # choose stdout output plugin to output data to console
      #stdout {
      #}
    
      clickhouse {
        host = "127.0.0.1:8123"
        database = "waterdrop"
        table = "access_log"
        fields = ["XX","day"]
        username = "user_richdm"
        password = "richdm"
      }
    
      # You can also use other output plugins, such as hdfs
      # hdfs {
      #   path = "hdfs://hadoop-cluster-01/nginx/accesslog_processed"
      #   save_mode = "append"
      # }
    
      # If you would like to get more information about how to configure waterdrop and see full list of output plugins,
      # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
    }

    Run the command

    ./start-waterdrop.sh --master yarn --deploy-mode client --config ../config/batch.conf

    The ClickHouse database and table must both be created beforehand; Waterdrop will not create them for you.
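
    Since the target database and table have to exist before the job runs, here is a minimal Python sketch that builds the two DDL statements and can send them to the ClickHouse HTTP interface (the same 127.0.0.1:8123 endpoint used in the config above). The column types (String, Date), the MergeTree engine, the partition and sort keys, and the helper names create_statements and run are all assumptions for illustration; the original post does not show the table schema.

```python
from urllib.request import Request, urlopen

# Assumed column types; the original post does not show the real schema.
COLUMNS = {"XX": "String", "day": "Date"}


def create_statements(database, table, columns):
    """Build the DDL that must exist before Waterdrop writes to ClickHouse."""
    cols = ", ".join(f"`{name}` {ctype}" for name, ctype in columns.items())
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        # Engine, partition key and sort key are illustrative assumptions.
        f"CREATE TABLE IF NOT EXISTS {database}.{table} ({cols}) "
        f"ENGINE = MergeTree() PARTITION BY day ORDER BY day",
    ]


def run(host, user, password, statements):
    """POST each statement to the ClickHouse HTTP interface at host (host:port)."""
    for sql in statements:
        req = Request(
            f"http://{host}/",
            data=sql.encode("utf-8"),  # a request with a body is sent as POST
            headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        )
        with urlopen(req) as resp:
            resp.read()


# Print the statements for review; nothing is sent to the server here.
for stmt in create_statements("waterdrop", "access_log", COLUMNS):
    print(stmt)
```

    To actually create the objects, one could call run("127.0.0.1:8123", "user_richdm", "richdm", create_statements("waterdrop", "access_log", COLUMNS)) before starting the Waterdrop job, after replacing the assumed column types with the real ones.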

  • Original article: https://www.cnblogs.com/yaohaitao/p/15612377.html