zoukankan      html  css  js  c++  java
  • Storm 多语言支持

    Using non JVM languages with Storm

    https://github.com/nathanmarz/storm/wiki/Using-non-JVM-languages-with-Storm

    Multilang protocol

    https://github.com/nathanmarz/storm/wiki/Multilang-protocol

    Using non JVM languages with Storm

    对于JVM语言比较简单, 直接提高DSL封装Java即可
    对于非JVM语言就稍微复杂一些, Storm分为两部分, topology和component(blot和spout)

    对于topology用其他语言实现比较easy, 因为nimbus是thrift server, 所以什么语言最终都是都是转化为thrift结构. 而且其实topology本身逻辑就比较简单, 直接用java写也行, 没有太多的必要一定要使用其他的语言

    对于component, 采用的方案和Hadoop的一样, 使用shell process来执行component, 并使用stdin, stdout作为component之间的通信 (json messages over stdin/stdout)
    通信就涉及通信协议, 即每个component怎样产生别的component可以理解json message, storm的通信协议比较简单, 参考
    Multilang protocol
    当前storm, 实现python, ruby, 和fancy的版本, 如果需要支持其他的语言, 自己实现一下这个协议也应该很容易.
    其实component支持多语言比较必要, 因为很多分析或统计模块, 不一定是使用java, 如果porting比较麻烦, 不象topology那么简单.

    two pieces: creating topologies and implementing spouts and bolts in other languages

    • creating topologies in another language is easy since topologies are just thrift structures (link to storm.thrift)
    • implementing spouts and bolts in another language is called a "multilang components" or "shelling"
      • Here's a specification of the protocol: Multilang protocol
      • the thrift structure lets you define multilang components explicitly as a program and a script (e.g., python and the file implementing your bolt)
      • In Java, you override ShellBolt or ShellSpout to create multilang components
        • note that output fields declarations happens in the thrift structure, so in Java you create multilang components like the following:
          • declare fields in java, processing code in the other language by specifying it in constructor of shellbolt
      • multilang uses json messages over stdin/stdout to communicate with the subprocess
      • storm comes with ruby, python, and fancy adapters that implement the protocol. show an example of python
        • python supports emitting, anchoring, acking, and logging
    • "storm shell" command makes constructing jar and uploading to nimbus easy
      • makes jar and uploads it
      • calls your program with host/port of nimbus and the jarfile id

    Bolt可以使用任何语言来定义. 用其它语言定义的bolt会被当作子进程(subprocess)来执行, storm使用JSON消息通过stdin/stdout来和这些subprocess通信.
    这个通信协议是一个只有100行的库, storm团队给这些库开发了对应的Ruby, Python和Fancy版本.

    Python版本的Bolt的定义, 和java版不同的是继承ShellBolt类

    public static class SplitSentence extends ShellBolt implements IRichBolt {
        public SplitSentence() {
            super("python", "splitsentence.py");
        }
     
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }
    下面是splitsentence.py的定义:
    import storm
    class SplitSentenceBolt(storm.BasicBolt):
        def process(self, tup):
            words = tup.values[0].split(" ")
            for word in words:
              storm.emit([word])
    SplitSentenceBolt().run()

    上面是使用python component的例子, 首先继承ShellBolt, 表示输入输出是通过shell stdin/stdout来完成的
    然后, 下面直接将python splitsentence.py作为子进程来调用

    在python中, 首先import storm, 其中封装了通信协议, 很简单的100行, 可以看看

    DSLs and multilang adapters

    https://github.com/nathanmarz/storm/wiki/DSLs-and-multilang-adapters

    前面说了, 对于JVM的语言, 很简单只是封装一下java, 然后提供DSL即可, 上面列出所有官方提供的DSL
    可以简单以Clojure为例子, 了解一下

    Clojure DSL

    Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The Clojure DSL has access to everything the Java API exposes, so if you're a Clojure user you can code Storm topologies without touching Java at all.

    https://github.com/nathanmarz/storm/wiki/Clojure-DSL

    Defining a non-JVM language dsl for storm

    https://github.com/nathanmarz/storm/wiki/Defining-a-non-jvm-language-dsl-for-storm

    对于non-JVM语言, 通过storm shell命令也可以实现类似dsl

    There's a "storm shell" command that will help with submitting a topology. Its usage is like this:

    storm shell resources/ python topology.py arg1 arg2

    storm shell will then package resources/ into a jar, upload the jar to Nimbus, and call your topology.py script like this:

    python topology.py arg1 arg2 {nimbus-host} {nimbus-port} {uploaded-jar-location}

    Then you can connect to Nimbus using the Thrift API and submit the topology, passing {uploaded-jar-location} into the submitTopology method. For reference, here's the submitTopology definition:

    void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 4: StormTopology topology) throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);
  • 相关阅读:
    LeetCode
    算法
    GitHub
    GitHub
    git
    将博客搬家至CSDN
    base64与图片互转
    windows下mongodb数据库搭建过程遇到问题
    mongodb数据插入语句与navicat导入mongodb的json结构
    Visual C++安装失败解决:Error 0x80240017: Failed to execute MSU package.
  • 原文地址:https://www.cnblogs.com/fxjwind/p/3071621.html
Copyright © 2011-2022 走看看