zoukankan      html  css  js  c++  java
  • Druid学习之路 (四)Druid的数据采集格式

    作者:Syn良子 出处:https://www.cnblogs.com/cssdongl/p/9715735.html 转载请注明出处

    Druid的数据采集格式


    Druid可以采集非标准化的数据诸如JSON,CSV或者以某种分隔符隔开的TSV格式,当然还支持自定义格式.虽然大部分的文档使用JSON格式,但是通过druid来配置支持其他的限定格式也不是很难.

    当前支持的格式化数据


    1. 列表项

    JSON

    {"timestamp": "2013-08-31T01:02:33Z", "page": "Gypsy Danger", "language" : "en", "user" : "nuclear", "unpatrolled" : "true", "newPage" : "true", "robot": "false", "anonymous": "false", "namespace":"article", "continent":"North America", "country":"United States", "region":"Bay Area", "city":"San Francisco", "added": 57, "deleted": 200, "delta": -143}
    

    CSV

    2013-08-31T01:02:33Z,"Gypsy Danger","en","nuclear","true","true","false","false","article","North America","United States","Bay Area","San Francisco",57,200,-143
    

    TSV

    2013-08-31T01:02:33Z    "Gypsy Danger"  "en"    "nuclear"   "true"  "true"  "false" "false" "article"   "North America" "United States" "Bay Area"  "San Francisco" 57  200 -143
    

    需要注意的是CSV,TSV不能包含列头,这点在数据采集的时候一定要注意

    自定义格式


    Druid支持使用正则解析和JavaScript来自定义数据格式.但是这种方式并没有自己实现的Java解析器或者额外的流式处理工具效率更高.

    配置数据采集的schema


    什么是data schema?其实就是Druid的index数据摄取任务需要的数据源的描述的元数据.它主要描述要采集的数据类型,数据由哪些列构成,哪些是指标列,哪些是维度列,时间的粒度等.

    以CSV格式举例

    "parseSpec": {
    "format" : "csv",
    "timestampSpec" : {
      "column" : "timestamp"
    },
    "columns" : ["timestamp","page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city","added","deleted","delta"],
    "dimensionsSpec" : {
      "dimensions" : ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"]
    }}
    

    parseSpec指明了数据源格式,这里是format中表明是CSV格式,然后说明时间戳字段名是timestamp,数据字段名是columns里面那一堆,dimensionsSpec则代表哪些字段可以作为维度.

    参考资料:Druid的数据格式

  • 相关阅读:
    其实 Linux IO 模型没那么难
    七年三次大重构,聊聊我的重构成长史
    听说 JVM 性能优化很难?今天我小试了一把!
    盘点三年来写过的原创文章
    如何快速实现一个连接池?
    树结构系列(四):MongoDb 使用的到底是 B 树,还是 B+ 树?
    树结构系列(三):B树、B+树
    树结构系列(二):平衡二叉树、AVL树、红黑树
    树结构系列(一):从普通树到二叉查找树
    静态代码分析工具清单
  • 原文地址:https://www.cnblogs.com/cssdongl/p/9715735.html
Copyright © 2011-2022 走看看