  • Learning Storm: Basic Concepts and a Getting-Started Example

    Components of a Storm cluster

    Storm cluster

    Nimbus:

    A daemon that runs on the master node. It is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.

    ZooKeeper:

    All coordination between Nimbus and the Supervisors is done through a ZooKeeper cluster.

    The Nimbus daemon and the Supervisor daemons are fail-fast and stateless; all state is kept in ZooKeeper or on local disk. This means you can kill -9 Nimbus or the Supervisors and they will start back up as if nothing had happened, which makes Storm clusters very stable.

    Supervisor:

    A daemon that runs on each worker node. The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it.

    Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.

    Streams:

    A stream is an unbounded sequence of tuples. Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way.

    Spouts: a spout is a source of streams.

    Bolts: a bolt consumes any number of input streams, does some processing, and possibly emits new streams. Complex stream transformations, like computing a stream of trending topics from a stream of tweets, require multiple steps and thus multiple bolts. Bolts can do anything: run functions, filter tuples, do streaming aggregations, do streaming joins, talk to databases, and more.
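
    The idea of chaining bolts can be sketched in plain Java without any Storm dependency. This is a toy model only: ToyBolt and BoltChainSketch are hypothetical names, the "stream" is a finite list for the sake of the demo, and appending "!!!" is an assumed transformation rather than anything Storm prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: hypothetical interface, not Storm's bolt API.
interface ToyBolt {
    // Consume one input value and return zero or more output values.
    List<String> process(String input);
}

class BoltChainSketch {
    // Filtering bolt: drops short words by emitting nothing for them.
    static final ToyBolt LONG_WORDS_ONLY =
            w -> w.length() > 3 ? Arrays.asList(w) : Collections.<String>emptyList();

    // Transforming bolt: emits exactly one modified value per input.
    static final ToyBolt EXCLAIM = w -> Arrays.asList(w + "!!!");

    // Feed every value of a (finite, for the demo) stream through one bolt.
    static List<String> run(List<String> stream, ToyBolt bolt) {
        List<String> out = new ArrayList<>();
        for (String value : stream) {
            out.addAll(bolt.process(value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("storm", "is", "fast");
        // Two steps, two bolts: filter first, then transform.
        System.out.println(run(run(words, LONG_WORDS_ONLY), EXCLAIM));
        // prints [storm!!!, fast!!!]
    }
}
```

    Each bolt stays small and does one thing; the multi-step transformation comes from wiring the output of one bolt into the input of the next.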

    Topology: a graph of computation.

    Each node in a topology contains processing logic, and links between nodes indicate how data should be passed around between nodes.

    A topology is the top-level abstraction that you submit to Storm clusters for execution.

    A topology is a graph of stream transformations where each node is a spout or bolt. 

    Edges in the graph indicate which bolts are subscribing to which streams. When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.
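
    The subscription rule described above can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not Storm code: StreamFanOut is a hypothetical class, tuples are reduced to strings, and the bolt name "counter" is made up for the demo.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, not part of Storm's API.
class StreamFanOut {
    // Emitting component -> names of the bolts subscribed to its stream.
    private final Map<String, List<String>> subscribers = new HashMap<>();
    // Bolt name -> tuples delivered to that bolt so far.
    private final Map<String, List<String>> inbox = new HashMap<>();

    // Record an edge in the graph: `bolt` reads the stream of `source`.
    void subscribe(String bolt, String source) {
        subscribers.computeIfAbsent(source, k -> new ArrayList<>()).add(bolt);
    }

    // Deliver one tuple emitted by `source` to every bolt subscribed to it.
    void emit(String source, String tuple) {
        List<String> targets = subscribers.getOrDefault(source, Collections.<String>emptyList());
        for (String bolt : targets) {
            inbox.computeIfAbsent(bolt, k -> new ArrayList<>()).add(tuple);
        }
    }

    List<String> received(String bolt) {
        return inbox.getOrDefault(bolt, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        StreamFanOut topology = new StreamFanOut();
        topology.subscribe("exclaim1", "words");
        topology.subscribe("counter", "words"); // hypothetical second subscriber
        topology.emit("words", "nathan");
        System.out.println(topology.received("exclaim1")); // prints [nathan]
        System.out.println(topology.received("counter"));  // prints [nathan]
    }
}
```

    Both subscribers see the same tuple: subscribing does not split a stream between bolts, it copies the stream to each of them.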

    Each node in a Storm topology executes in parallel. In your topology, you can specify how much parallelism you want for each node, and then Storm will spawn that number of threads across the cluster to do the execution.

    A topology runs forever, or until you kill it. Storm will automatically reassign any failed tasks. Additionally, Storm guarantees that there will be no data loss, even if machines go down and messages are dropped.

    Tuple:

    The tuple is Storm's data model. A tuple is a named list of values, and a field in a tuple can be an object of any type.

    Storm supports all the primitive types, strings, and byte arrays as tuple field values. 

    To use an object of another type, you just need to implement a serializer for the type.

    Every node in a topology must declare the output fields for the tuples it emits.
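
    To make "a named list of values" concrete, here is a minimal plain-Java sketch. NamedTuple is a hypothetical class written for this illustration and is not Storm's own tuple type; the field names "word" and "count" are likewise assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical class, not Storm's Tuple or Fields types.
class NamedTuple {
    private final List<String> fields; // declared output field names, in order
    private final List<Object> values; // one value per declared field

    NamedTuple(List<String> fields, List<Object> values) {
        // The declaration fixes how many values every emitted tuple must carry.
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException(
                    "expected " + fields.size() + " values, got " + values.size());
        }
        this.fields = fields;
        this.values = values;
    }

    // Look a value up by the name of its field.
    Object getValueByField(String name) {
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("undeclared field: " + name);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("word", "count");
        NamedTuple t = new NamedTuple(declared, Arrays.<Object>asList("storm", 3));
        System.out.println(t.getValueByField("word") + " -> " + t.getValueByField("count"));
        // prints storm -> 3
    }
}
```

    Declaring the field names up front is what lets a downstream consumer read values by name instead of by position.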

    A simple topology

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("words", new TestWordSpout(), 10);
    builder.setBolt("exclaim1", new ExclamationBolt(), 3).shuffleGrouping("words");
    builder.setBolt("exclaim2", new ExclamationBolt(), 2).shuffleGrouping("exclaim1");

    If you wanted component "exclaim2" to read all the tuples emitted by both component "words" and component "exclaim1", you would write component "exclaim2"'s definition like this:

    builder.setBolt("exclaim2", new ExclamationBolt(), 5).shuffleGrouping("words").shuffleGrouping("exclaim1");
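
    A shuffle grouping distributes tuples randomly across the tasks of the subscribing bolt, so that each task receives roughly the same share. The sketch below models one simple way to get that effect: shuffle the task indices once, then cycle through them. ShuffleSketch is a hypothetical class, the fixed seed is only there to make the demo reproducible, and none of this is meant as a description of Storm's internal implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy model only: hypothetical class, not Storm's grouping implementation.
class ShuffleSketch {
    private final List<Integer> order = new ArrayList<>(); // task indices in random order
    private int next = 0;                                   // position of the next task to use

    ShuffleSketch(int numTasks, long seed) {
        for (int task = 0; task < numTasks; task++) {
            order.add(task);
        }
        Collections.shuffle(order, new Random(seed));
    }

    // Pick the task that receives the next tuple.
    int chooseTask() {
        int task = order.get(next);
        next = (next + 1) % order.size();
        return task;
    }

    public static void main(String[] args) {
        // Three tasks, matching the parallelism of "exclaim1" in the example.
        ShuffleSketch grouping = new ShuffleSketch(3, 42L);
        int[] perTask = new int[3];
        for (int i = 0; i < 9; i++) {
            perTask[grouping.chooseTask()]++;
        }
        System.out.println(Arrays.toString(perTask)); // prints [3, 3, 3]
    }
}
```

    When a bolt has several shuffle groupings, as "exclaim2" does above, each input stream is spread over the bolt's tasks in the same way.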

    Running in local mode

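    Config conf = new Config();
    conf.setDebug(true);      // log every tuple emitted by the topology
    conf.setNumWorkers(2);    // request two worker processes
    // LocalCluster simulates a Storm cluster inside the current process
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("test", conf, builder.createTopology());
    Utils.sleep(10000);       // let the topology run for 10 seconds
    cluster.killTopology("test");
    cluster.shutdown();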

    Running in production mode

    Storm in practice: WordCount (see the reference link)

  • Original article: https://www.cnblogs.com/coding-now/p/14660614.html