  • Installing Apache Druid 0.15.0

    Druid 0.15.0 overview

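    Druid is a fault-tolerant, high-performance, open-source distributed system for real-time querying and analysis of big data. It is designed to process large volumes of data quickly and to answer queries over them with low latency. Its architecture is meant to keep the cluster available during code deployments, machine failures, and outages of other production systems. Druid was originally created to solve the query-latency problem: it provides interactive access to data and uses a specialized storage format that trades some query flexibility for performance. Notably, Druid 0.15 supports SQL queries alongside the native JSON query language; Druid SQL first appeared as an experimental feature in earlier releases, where JSON was the main way to query.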

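    Features

    • Columnar storage format for partially nested data structures;
    • Indexes for fast filtering;
    • Real-time ingestion and querying;
    • Fault-tolerant distributed architecture.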

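    Use cases

    1. When you need interactive aggregation and fast exploration of large volumes of data;
    2. When you need real-time query and analysis;
    3. When you need real-time analysis of data, big data in particular. In Yimi's big data scenarios, the three points above closely match the requirements of Tianyan phase 5, and Druid can be combined with Wukong for real-time ingestion. The current Spark + CarbonData approach gets slower as the data volume grows, so Druid is a good alternative;
    4. When you need a highly available, fault-tolerant, high-performance database.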

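    1 Cluster planning

    • Master nodes run the Coordinator and Overlord: 4 cores / 16 GB RAM × 2 nodes;
    • Data nodes run the Historical and MiddleManager: 16 cores / 64 GB RAM × 3 nodes;
    • The query node runs the Broker and Router: 4 cores / 16 GB RAM × 1 node.

    1.1 Hadoop configuration files

    This installation uses HDFS as deep storage. On each of the 3 data nodes, go to the /data1/druid/druid-0.15.0/conf/druid/cluster/_common directory and create a symlink to the Hadoop configuration directory. This step lets Druid resolve the Hadoop HA nameservice; without it, the HDFS deep storage paths cannot be resolved.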

    ln -s /usr/hdp/2.6.5.0-292/hadoop/conf hadoop-xml
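
As a quick sanity check on the symlink above, the following Python sketch reports which Hadoop client files are missing from the linked directory. The function name missing_hadoop_configs and the list of required files (core-site.xml, hdfs-site.xml) are assumptions for illustration; an HA nameservice is normally defined in hdfs-site.xml, but adjust the list to your distribution.

```python
import os

# Files assumed necessary for resolving an HA nameservice; adjust as needed.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml")


def missing_hadoop_configs(conf_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from conf_dir.

    os.path.isfile follows symlinks, so conf_dir may be the hadoop-xml link.
    """
    return [name for name in required
            if not os.path.isfile(os.path.join(conf_dir, name))]


if __name__ == "__main__":
    link = "/data1/druid/druid-0.15.0/conf/druid/cluster/_common/hadoop-xml"
    # An empty list means every assumed-required file was found.
    print(missing_hadoop_configs(link))
```

An empty result on all three data nodes suggests the link points at a usable Hadoop configuration directory.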
    
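    1.2 Install JDK 1.8 (omitted here).
    1.3 The data nodes also serve as HDFS DataNodes (setup omitted here).
    1.4 Common configuration

    The logging configuration below writes Druid's runtime logs to files, which makes later troubleshooting easier. The file path and file names can be changed.

    1. log4j2.xml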
    <Configuration status="WARN">
        <Properties>
            <Property name="log.path">/data1/druid/log</Property>
        </Properties>
        <Appenders>
            <Console name="Console" target="SYSTEM_OUT">
                <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
            <File name="log" fileName="${log.path}/one.log" append="false">
                <PatternLayout pattern="[%d{yyyy-MM-dd HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            </File>
            <RollingFile name="RollingFileInfo" fileName="${log.path}/druid-data.log"
                         filePattern="${log.path}/druid-data-%d{yyyy-MM-dd}-%i.out">
                <ThresholdFilter level="info" onMatch="ACCEPT" onMismatch="DENY"/>
                <PatternLayout pattern="[%d{yyyy-MM-dd HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
                <Policies>
                    <TimeBasedTriggeringPolicy modulate="true" interval="1"/>
                    <SizeBasedTriggeringPolicy size="100 MB"/>
                </Policies>
    
            </RollingFile>
        </Appenders>
        <Loggers>
            <Root level="info">
                <AppenderRef ref="Console"/>
                <appender-ref ref="RollingFileInfo"/>
                <appender-ref ref="log"/>
            </Root>
        </Loggers>
    </Configuration>
    2. common.runtime.properties: change druid.host to the hostname of the machine running Druid. This is the global configuration file; the comments explain the relevant parameters.
    druid.extensions.loadList=["druid-kafka-eight", "druid-histogram", "druid-datasketches", "mysql-metadata-storage","druid-hdfs-storage","druid-kafka-extraction-namespace","druid-kafka-indexing-service"]
    druid.extensions.directory=/data1/druid/druid-0.15.0/extensions
    # If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
    # and uncomment the line below to point to your directory.
    druid.extensions.hadoopDependenciesDir=/data1/druid/druid-0.15.0/hadoop-dependencies
    
    
    #
    # Hostname
    #
    druid.host=bd-prod-slave06
    #
    # Logging
    # Log all runtime properties on startup. Disable to avoid logging properties on startup:
    druid.startup.logging.logProperties=true
    
    #
    # Zookeeper
    #
    
    druid.zk.service.host=bd-prod-master01:2181,bd-prod-master02:2181,bd-prod-slave01:2181
    druid.zk.paths.base=/druid
    
    #
    # Metadata storage
    #
    
    # For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
    # druid.metadata.storage.type=derby
    # druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
    # druid.metadata.storage.connector.host=localhost
    # druid.metadata.storage.connector.port=1527
    
    # For MySQL (make sure to include the MySQL JDBC driver on the classpath):
    druid.metadata.storage.type=mysql
    druid.metadata.storage.connector.connectURI=jdbc:mysql://bd-prod-master01:3306/druid?useSSL=false&useUnicode=true&characterEncoding=UTF-8
    druid.metadata.storage.connector.user=user
    druid.metadata.storage.connector.password=password
    
    # For PostgreSQL:
    #druid.metadata.storage.type=postgresql
    #druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
    #druid.metadata.storage.connector.user=...
    #druid.metadata.storage.connector.password=...
    
    #
    # Deep storage
    #
    
    # For local disk (only viable in a cluster if this is a network mount):
    # druid.storage.type=local
    # druid.storage.storageDirectory=var/druid/segments
    
    # For HDFS:
    druid.storage.type=hdfs
    druid.storage.storageDirectory=hdfs://bd-prod/druid/segments
    
    # For S3:
    #druid.storage.type=s3
    #druid.storage.bucket=your-bucket
    #druid.storage.baseKey=druid/segments
    #druid.s3.accessKey=...
    #druid.s3.secretKey=...
    
    #
    # Indexing service logs
    #
    
    # For local disk (only viable in a cluster if this is a network mount):
    # druid.indexer.logs.type=file
    # druid.indexer.logs.directory=var/druid/indexing-logs
    
    # For HDFS:
    druid.indexer.logs.type=hdfs
    druid.indexer.logs.directory=hdfs://bd-prod/druid/indexing-logs
    
    # For S3:
    #druid.indexer.logs.type=s3
    #druid.indexer.logs.s3Bucket=your-bucket
    #druid.indexer.logs.s3Prefix=druid/indexing-logs
    
    #
    # Service discovery
    #
    
    druid.selectors.indexing.serviceName=druid/overlord
    druid.selectors.coordinator.serviceName=druid/coordinator
    
    #
    # Monitoring
    #
    
    druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
    druid.emitter=noop
    druid.emitter.logging.logLevel=info
    
    # Storage type of double columns
    # omitting this will lead to index double as float at the storage layer
    
    druid.indexing.doubleStorage=double
    
    #
    # Security
    #
    druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]
    
    
    #
    # SQL
    #
    druid.sql.enable=true
    
    #
    # Lookups
    #
    druid.lookup.enableLookupSyncOnStartup=false
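
Several of the settings above only take effect if the matching extension is present in druid.extensions.loadList: HDFS deep storage and HDFS task logs rely on druid-hdfs-storage, and the MySQL metadata store relies on mysql-metadata-storage. A minimal Python sketch of that cross-check follows; the helper names and the small NEEDS table are illustrative assumptions, not part of Druid.

```python
import json

# Hypothetical mapping from (property, value) to the extension it relies on.
NEEDS = {
    ("druid.storage.type", "hdfs"): "druid-hdfs-storage",
    ("druid.indexer.logs.type", "hdfs"): "druid-hdfs-storage",
    ("druid.metadata.storage.type", "mysql"): "mysql-metadata-storage",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so JDBC URLs keep their parameters.
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_extensions(props):
    """Return extensions required by the settings but absent from loadList."""
    loaded = set(json.loads(props.get("druid.extensions.loadList", "[]")))
    required = {ext for (key, value), ext in NEEDS.items()
                if props.get(key) == value}
    return sorted(required - loaded)
```

Running missing_extensions(parse_properties(text)) on the contents of common.runtime.properties should return an empty list for the configuration shown in this post.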
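    2 Data nodes

    On each data node, set druid.host in the common configuration to that node's hostname.

    2.1 Historical

    The Historical process loads segment files that have already been built and serves queries on them.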

    1. /data1/druid/druid-0.15.0/conf/druid/cluster/data/historical/jvm.config
    -server
    -Xms8g
    -Xmx8g
    -XX:MaxDirectMemorySize=12g
    -XX:+ExitOnOutOfMemoryError
    -Duser.timezone=UTC+0800
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    2. /data1/druid/druid-0.15.0/conf/druid/cluster/data/historical/runtime.properties
    druid.service=druid/historical
    druid.plaintextPort=9088
    druid.segmentCache.numLoadingThreads=16
    # HTTP server threads
    druid.server.http.numThreads=60
    
    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=500000000
    druid.processing.numMergeBuffers=4
    druid.processing.numThreads=16
    druid.processing.tmpDir=/data1/druid/processing
    
    # Segment storage
    druid.segmentCache.locations=[{"path":"/data1/druid/segment-cache","maxSize":300000000000}]
    druid.server.maxSize=300000000000
    
    # Query cache
    druid.historical.cache.useCache=true
    druid.historical.cache.populateCache=true
    druid.cache.type=caffeine
    druid.cache.sizeInBytes=256000000
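
The -XX:MaxDirectMemorySize value in jvm.config has to cover the processing buffers configured here. The Druid documentation gives the rule of thumb (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes; the Python sketch below applies it to the Historical settings above. The function names are illustrative, and 12g is read as 12 GiB.

```python
def required_direct_memory(num_threads, num_merge_buffers, buffer_size_bytes):
    """Direct memory needed for processing and merge buffers, in bytes."""
    return (num_threads + num_merge_buffers + 1) * buffer_size_bytes


def fits(max_direct_bytes, num_threads, num_merge_buffers, buffer_size_bytes):
    """True if the JVM direct-memory limit covers the required buffers."""
    needed = required_direct_memory(num_threads, num_merge_buffers,
                                    buffer_size_bytes)
    return needed <= max_direct_bytes


GIB = 1024 ** 3

# Historical: numThreads=16, numMergeBuffers=4, buffer.sizeBytes=500000000
historical_needed = required_direct_memory(16, 4, 500_000_000)
print(historical_needed)                   # 10500000000 bytes, about 9.8 GiB
print(fits(12 * GIB, 16, 4, 500_000_000))  # True
```

The same formula gives (1 + 6 + 1) × 500000000 = 4000000000 bytes for the Broker settings later in this post, which fits under its 6g limit.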
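
    2.2 MiddleManager

    The MiddleManager is the worker node of the indexing service. It receives tasks assigned by the Overlord (which runs inside the Coordinator process in this setup) and launches Peon processes, each in its own JVM, to carry them out.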

    1. /data1/druid/druid-0.15.0/conf/druid/cluster/data/middleManager/jvm.config
    -server
    -Xms128m
    -Xmx128m
    -XX:+ExitOnOutOfMemoryError
    -Duser.timezone=UTC+0800
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    2. /data1/druid/druid-0.15.0/conf/druid/cluster/data/middleManager/runtime.properties
    druid.service=druid/middleManager
    druid.plaintextPort=8091
    
    # Number of tasks per middleManager
    druid.worker.capacity=4
    
    # Task launch parameters
    druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC+0800 -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    druid.indexer.task.baseTaskDir=/data1/druid/task
    
    # HTTP server threads
    druid.server.http.numThreads=60
    
    # Processing threads and buffers on Peons
    druid.indexer.fork.property.druid.processing.numMergeBuffers=2
    druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
    druid.indexer.fork.property.druid.processing.numThreads=4
    
    # Hadoop indexing
    druid.indexer.task.hadoopWorkingPath=/data1/druid/hadoop-tmp
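
Each data node runs a Historical, a MiddleManager, and up to druid.worker.capacity Peon tasks, so it is worth adding up their JVM limits against the 64 GB of RAM per data node. The Python sketch below does this arithmetic with the values from this post. The function name is illustrative, the g/m suffixes are read as GiB/MiB, and the sum ignores the OS page cache that the Historical relies on for memory-mapped segments.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def data_node_jvm_budget(capacity):
    """Sum heap plus direct-memory limits for all JVMs on one data node."""
    historical = 8 * GIB + 12 * GIB      # -Xmx8g + MaxDirectMemorySize=12g
    middle_manager = 128 * MIB           # -Xmx128m
    peon = 1 * GIB + 1 * GIB             # -Xmx1g + MaxDirectMemorySize=1g
    return historical + middle_manager + capacity * peon


total = data_node_jvm_budget(capacity=4)
print(round(total / GIB, 3))  # 28.125
```

Under these settings the JVMs can claim about 28 GiB, which leaves a bit over half of a 64 GB node for the operating system and page cache.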

    2.3 Start command
     nohup ./bin/start-cluster-data-server >/dev/null 2>&1 &
    

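    3 Master nodes

    On each master node, set druid.host in the common configuration to that node's hostname.

    3.1 coordinator-overlord

    This combined process balances segments across the Historical nodes and manages the data lifecycle through rules (the Coordinator role). It is also the master of the indexing service (the Overlord role): externally it accepts task requests, and internally it splits tasks and dispatches them to its workers, the MiddleManagers.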

    1. /data1/druid/druid-0.15.0/conf/druid/cluster/master/coordinator-overlord/jvm.config
    -server
    -Xms12g
    -Xmx12g
    -XX:+ExitOnOutOfMemoryError
    -XX:+UseG1GC
    -Duser.timezone=UTC+0800
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    -Dderby.stream.error.file=/data1/druid/derby.log
    2. /data1/druid/druid-0.15.0/conf/druid/cluster/master/coordinator-overlord/runtime.properties
    druid.service=druid/coordinator
    druid.plaintextPort=9181
    
    druid.coordinator.startDelay=PT10S
    druid.coordinator.period=PT5S
    
    # Run the overlord service in the coordinator process
    druid.coordinator.asOverlord.enabled=true
    druid.coordinator.asOverlord.overlordService=druid/overlord
    
    druid.indexer.queue.startDelay=PT5S
    
    druid.indexer.runner.type=remote
    druid.indexer.storage.type=metadata

    3.2 Start command
     nohup ./bin/start-cluster-master-no-zk-server >/dev/null 2>&1 &
    

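    4 Query node

    On the query node, set druid.host in the common configuration to that node's hostname.

    4.1 Broker

    The Broker serves queries to external clients. For each query it reads segment metadata from ZooKeeper, routes the query to the data processes that hold the relevant segments, and merges the partial results.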

    1. /data1/druid/druid-0.15.0/conf/druid/cluster/query/broker/jvm.config
    -server
    -Xms12g
    -Xmx12g
    -XX:MaxDirectMemorySize=6g
    -XX:+ExitOnOutOfMemoryError
    -Duser.timezone=UTC+0800
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    2. /data1/druid/druid-0.15.0/conf/druid/cluster/query/broker/runtime.properties
    druid.service=druid/broker
    druid.plaintextPort=8182
    
    # HTTP server settings
    druid.server.http.numThreads=60
    
    # HTTP client settings
    druid.broker.http.numConnections=50
    druid.broker.http.maxQueuedBytes=10000000
    
    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=500000000
    druid.processing.numMergeBuffers=6
    druid.processing.numThreads=1
    druid.processing.tmpDir=/data1/druid/processing
    
    # Query cache enabled on the Broker
    druid.broker.cache.useCache=true
    druid.broker.cache.populateCache=true
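
With druid.sql.enable=true in the common configuration, the Broker accepts SQL over HTTP at /druid/v2/sql/. The following Python sketch builds such a request with the standard library. The host name query-host and the helper names are placeholders; the port matches druid.plaintextPort=8182 above.

```python
import json
import urllib.request


def build_sql_request(host, port, sql):
    """Build a POST request for Druid's SQL endpoint without sending it."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/druid/v2/sql/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(host, port, sql, timeout=30):
    """Send the query and return the decoded JSON rows."""
    with urllib.request.urlopen(build_sql_request(host, port, sql),
                                timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # "query-host" is a placeholder for the query node's hostname.
    req = build_sql_request("query-host", 8182, "SELECT 1")
    print(req.full_url)
```

Calling run_sql against a live Broker returns the result rows as a list of JSON objects; building the request separately makes it easy to inspect before anything is sent.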

    4.2 Router

    As the name suggests, the Router forwards queries to the Brokers according to routing rules.

    1. /data1/druid/druid-0.15.0/conf/druid/cluster/query/router/jvm.config
    -server
    -Xms1g
    -Xmx1g
    -XX:+UseG1GC
    -XX:MaxDirectMemorySize=256m
    -XX:+ExitOnOutOfMemoryError
    -Duser.timezone=UTC+0800
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=/tmp
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    2. /data1/druid/druid-0.15.0/conf/druid/cluster/query/router/runtime.properties
    druid.service=druid/router
    druid.plaintextPort=8888
    
    # HTTP proxy
    druid.router.http.numConnections=50
    druid.router.http.readTimeout=PT5M
    druid.router.http.numMaxThreads=100
    druid.server.http.numThreads=100
    
    # Service discovery
    druid.router.defaultBrokerServiceName=druid/broker
    druid.router.coordinatorServiceName=druid/coordinator
    
    # Management proxy to coordinator / overlord: required for unified web console.
    druid.router.managementProxy.enabled=true

    4.3 Start command
    nohup ./bin/start-cluster-query-server >/dev/null 2>&1 &
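
Once all three node types are started, each Druid process can be probed through its /status/health endpoint, which answers true when the process is up. The Python sketch below lists the ports configured in this post and checks each one. The host names are placeholders (only the ports come from the configuration above), and the helper names are illustrative.

```python
import urllib.error
import urllib.request

# Ports from the runtime.properties files in this post; hosts are placeholders.
SERVICES = {
    "coordinator-overlord": ("master-host", 9181),
    "historical": ("data-host", 9088),
    "middleManager": ("data-host", 8091),
    "broker": ("query-host", 8182),
    "router": ("query-host", 8888),
}


def health_url(host, port):
    """URL of the health endpoint for one Druid process."""
    return f"http://{host}:{port}/status/health"


def is_healthy(host, port, timeout=5):
    """Return True if the process answers its health check with 'true'."""
    try:
        with urllib.request.urlopen(health_url(host, port),
                                    timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip() == "true"
    except (urllib.error.URLError, OSError):
        # Unreachable host, refused connection, or timeout.
        return False


def check_all(services=SERVICES):
    """Map each service name to its health status."""
    return {name: is_healthy(host, port)
            for name, (host, port) in services.items()}
```

Replace the placeholder hosts with the real hostnames before calling check_all(); a False entry means the process is not reachable or has not finished starting.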

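    5 Summary

    As a newcomer among OLAP systems, Druid performs very well at real-time ingestion and pre-aggregation. It can also be paired with Flink as a downstream data store, which makes it a good choice. With SQL support in the new release, I expect it to see much wider adoption. The next post will cover real-time ingestion with Druid.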



  • Original article: https://www.cnblogs.com/ChouYarn/p/11282909.html