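  • Setting Up a Twitter Storm Cluster on Arch Linux

    This article describes in detail how to set up a Twitter Storm cluster on Arch Linux. Please credit the source when reposting. Thank you.

    For installing the base Arch Linux system, see the article "A Concise Arch Linux Installation Guide" (archlinux简明安装指南). Building on that, the steps below explain how to install a Twitter Storm cluster.

    The main installation steps are:

    1. Install Oracle JDK
    2. Install the required build tools: gcc, g++, make
    3. Install python2.7 and unzip
    4. Build and install zeromq
    5. Build and install jzmq
    6. Download lein
    7. Download storm-starter
    8. Download the storm release
    9. Install zookeeper
    10. Install supervisord so that the storm cluster starts automatically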

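    Install Oracle JDK

    The standard Java on Linux is OpenJDK. To install Oracle's JDK instead, you have to download the installer package from the official site. One of the joys of Arch Linux is yaourt, which makes all of this very simple :).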

    #yaourt -S jdk

    Note the path where Java is installed; it should be /opt/java. It will be needed later.

    Edit /etc/profile: add the JAVA_HOME environment variable and append /opt/java/bin to PATH.

    PATH="/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/java/bin"
    export PATH
    export JAVA_HOME="/opt/java"
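
    After logging in again (or sourcing /etc/profile), a small check can confirm that the new settings took effect. The following is only a sketch; the helper name path_has_dir is made up for illustration and is not part of any tool mentioned here.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (for example the value of PATH).
path_has_dir() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use after `source /etc/profile`:
#   path_has_dir /opt/java/bin "$PATH" && echo "java bin dir is on PATH"
#   [ "$JAVA_HOME" = /opt/java ] && echo "JAVA_HOME is set"
```

    Wrapping the list in colons lets the match work for the first and last entries as well as the middle ones.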

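    Install the build tools

    Twitter Storm uses zeromq, which is written in C and C++, so the corresponding build tools are required. Do not use the zeromq packaged for Arch Linux: the versions currently in pacman and the AUR are 3.x, whereas Twitter Storm needs zeromq 2.1.7. On Arch Linux, g++ ships as part of the gcc package, so there is no separate g++ package to install.

    #pacman -S gcc libtool pkg-config make autoconf git util-linux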

    Install python2.7 and unzip

    #pacman -S python2 unzip 

    Build and install zeromq and jzmq

    Download zeromq 2.1.7 from http://download.zeromq.org/zeromq-2.1.7.tar.gz

    #tar zvxf zeromq-2.1.7.tar.gz
    #cd zeromq-2.1.7
    #./configure
    #make
    #make install

    The libraries are installed under /usr/local/lib.

    Build and install jzmq

    #git clone https://github.com/nathanmarz/jzmq.git
    #cd jzmq
    #./autogen.sh
    #./configure --with-zeromq=/usr/local
    #make    (note: this step may fail; the fix is to edit jzmq/src/Makefile.am and change classdist_noinst.stamp to classnoinst.stamp)
    #make install

    After installing zeromq and jzmq, edit /etc/ld.so.conf and add the following line:

    /usr/local/lib

    Then run

    #ldconfig 

    To verify that libjzmq really links against the self-compiled zeromq, check it with the following command.

    #ldd /usr/local/lib/libjzmq.so
    linux-gate.so.1 (0xb779e000)
        libzmq.so.1 => /usr/local/lib/libzmq.so.1 (0xb7749000)
        libuuid.so.1 => /usr/lib/libuuid.so.1 (0xb7743000)
        librt.so.1 => /usr/lib/librt.so.1 (0xb773a000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0xb771e000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7635000)
        libm.so.6 => /usr/lib/libm.so.6 (0xb75ee000)
        libc.so.6 => /usr/lib/libc.so.6 (0xb743e000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0xb7422000)
        /usr/lib/ld-linux.so.2 (0xb779f000)

    If libzmq.so.1 does resolve to the copy in /usr/local/lib, the correct version is in use.
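
    The same check can be scripted instead of read by eye. This is a minimal sketch that assumes the ldd output format shown above; the helper name zmq_is_local is hypothetical.

```shell
# Hypothetical helper: read `ldd` output on stdin and succeed only if
# libzmq.so.1 resolves to a file under /usr/local/lib.
zmq_is_local() {
    grep -q 'libzmq\.so\.1 => /usr/local/lib/'
}

# Example use:
#   ldd /usr/local/lib/libjzmq.so | zmq_is_local && echo "using self-built zeromq"
```

    If the check fails, the usual suspects are a missing /usr/local/lib entry in /etc/ld.so.conf or a forgotten ldconfig run.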

    Install storm-starter

    storm-starter is a GitHub project created by Storm's author to help newcomers get started quickly.

    #git clone https://github.com/nathanmarz/storm-starter.git

    Build and run. Note that this runs in local mode, not cluster mode.

    #cd storm-starter
    #lein deps
    #lein compile
    #java -cp $(lein classpath) storm.starter.ExclamationTopology

    Note:

        Download the lein script directly from http://leiningen.org/ rather than installing it with pacman or yaourt.

    #chmod +x ./lein
    #cp ./lein /usr/local/bin
    #export LEIN_ROOT=1    (set this variable if you want to run lein as root)

    Install zookeeper

    #yaourt -S zookeeper

    Do some basic configuration: edit /etc/zookeeper/zoo.cfg so that it reads as follows

    #The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial 
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between 
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just 
    # example sakes.
    dataDir=/var/lib/zookeeper
    # the port at which the clients will connect
    clientPort=2181
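
    As the comments in the file say, initLimit and syncLimit are counted in ticks, so their real durations depend on tickTime. The sketch below turns them into milliseconds; the helper names cfg_get and zk_limits_ms are hypothetical, and the code assumes plain key=value lines like the ones above.

```shell
# Hypothetical helper: print the value of key $1 from key=value file $2.
cfg_get() {
    sed -n "s/^$1=//p" "$2"
}

# Hypothetical helper: print "<initLimit in ms> <syncLimit in ms>" for
# the zookeeper config file $1.
zk_limits_ms() {
    tick=$(cfg_get tickTime "$1")
    init=$(cfg_get initLimit "$1")
    sync=$(cfg_get syncLimit "$1")
    echo "$((tick * init)) $((tick * sync))"
}

# Example use:
#   zk_limits_ms /etc/zookeeper/zoo.cfg
```

    With the values above (tickTime=2000, initLimit=10, syncLimit=5) this gives 20000 ms for the initial synchronization phase and 10000 ms for request acknowledgement.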

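    Because zookeeper listens only on IPv6 addresses here, force it to listen on IPv4 only by editing /opt/zookeeper-3.4.5/bin/zkServer.sh and adding "-Djava.net.preferIPv4Stack=true" to the start) section. The whole section then looks like this: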

    case $1 in
    start)
        echo  -n "Starting zookeeper ... "
        if [ -f $ZOOPIDFILE ]; then
          if kill -0 `cat $ZOOPIDFILE` > /dev/null 2>&1; then
             echo $command already running as process `cat $ZOOPIDFILE`. 
             exit 0
          fi  
        fi  
        nohup $JAVA "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
          "-Djava.net.preferIPv4Stack=true" \
        -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &
        if [ $? -eq 0 ] 
        then
          if /bin/echo -n $! > "$ZOOPIDFILE"
          then
            sleep 1
            echo STARTED
          else
            echo FAILED TO WRITE PID
            exit 1
          fi  
        else
          echo SERVER DID NOT START
          exit 1
        fi  
        ;;  

    Note the added "-Djava.net.preferIPv4Stack=true" line (highlighted in the original post).

    Start zookeeper

    #/opt/zookeeper-3.4.5/bin/zkServer.sh start 

    Download and install storm

    Download storm-0.8.2 from storm-project.net and unzip it under /opt

    #unzip storm-0.8.2.zip

    Edit /opt/storm-0.8.2/conf/storm.yaml so that its contents are as follows

    ########### These MUST be filled in for a storm configuration
     storm.zookeeper.servers:
         - "localhost"
    #     - "server2"
    # 
     nimbus.host: "localhost"
    # 
    # 
    # ##### These may optionally be filled in:
    #    
    ## List of custom serializations
    # topology.kryo.register:
    #     - org.mycompany.MyType
    #     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
    #
    ## List of custom kryo decorators
    # topology.kryo.decorators:
    #     - org.mycompany.MyDecorator
    #
    ## Locations of the drpc servers
    # drpc.servers:
    #     - "server1"
    #     - "server2"
    
    ## Metrics Consumers
    # topology.metrics.consumer.register:
    #   - class: "backtype.storm.metrics.LoggingMetricsConsumer"
    #     parallelism.hint: 1
    #   - class: "org.mycompany.MyMetricsConsumer"
    #     parallelism.hint: 1
    #     argument:
    #       - endpoint: "metrics-collector.mycompany.org"
     java.library.path: "/usr/local/lib:/usr/local/share/java"
     supervisor.slots.ports:
       - 6700
       - 6701

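    Note:

      YAML is indentation-sensitive: indent with spaces rather than tabs, and keep all top-level entries at the same indentation. In the file above each active entry begins with a single space.

    Edit the storm script and change #!/usr/bin/python to #!/usr/bin/python2. On Arch Linux /usr/bin/python points to python3, so the script must be switched to python2 explicitly.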
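
    The shebang change can also be made non-interactively. This is a sketch that assumes GNU sed (for the -i option) and that the first line of the script is exactly #!/usr/bin/python; the helper name use_python2 is hypothetical.

```shell
# Hypothetical helper: rewrite the first line of script $1 to use python2,
# but only if that line is exactly "#!/usr/bin/python".
use_python2() {
    sed -i '1s|^#!/usr/bin/python$|#!/usr/bin/python2|' "$1"
}

# Example use:
#   use_python2 /opt/storm-0.8.2/bin/storm
```

    Because the pattern is anchored at both ends, running the helper a second time leaves an already converted script unchanged.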

    Ready to run in cluster mode

    #/opt/storm-0.8.2/bin/storm nimbus
    #/opt/storm-0.8.2/bin/storm supervisor
    #/opt/storm-0.8.2/bin/storm ui

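    Each of the commands above needs to run in its own terminal. If the ui starts successfully, you can open localhost:8080 in a browser to view the state of the whole cluster.

    Deploy a topology to the cluster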

    #./storm jar $HOME/working/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-standalone.jar storm.starter.ExclamationTopology exclamationTopology

    If all goes well, you should see output similar to this

    0    [main] INFO  backtype.storm.StormSubmitter  - Jar not uploaded to master yet. Submitting jar...
    91   [main] INFO  backtype.storm.StormSubmitter  - Uploading topology jar /root/working/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-standalone.jar to assigned location: storm-local/nimbus/inbox/stormjar-c73d28f0-68fc-4e6e-98b5-c4d1355aa94f.jar
    667  [main] INFO  backtype.storm.StormSubmitter  - Successfully uploaded topology jar to assigned location: storm-local/nimbus/inbox/stormjar-c73d28f0-68fc-4e6e-98b5-c4d1355aa94f.jar
    670  [main] INFO  backtype.storm.StormSubmitter  - Submitting topology exclamationTopology in distributed mode with conf {"topology.workers":3,"topology.debug":true}
    2449 [main] INFO  backtype.storm.StormSubmitter  - Finished submitting topology: exclamationTopology
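
    When the submission is scripted, the last log line is the one worth checking. A rough sketch, assuming the submitter prints the "Finished submitting topology" line shown above; the helper name topology_submitted and the file name submit.log are made up for illustration.

```shell
# Hypothetical helper: read submitter output on stdin and succeed only if it
# reports that topology $1 finished submitting.
topology_submitted() {
    grep -qF -- "Finished submitting topology: $1"
}

# Example use (save the output first, then check it):
#   ./storm jar <jar> storm.starter.ExclamationTopology exclamationTopology > submit.log 2>&1
#   topology_submitted exclamationTopology < submit.log && echo "submitted"
```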

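    Run the storm cluster automatically

    Starting the storm cluster by hand every time is not much fun; it is better to have it start automatically. There is always a solution: use supervisor, a process manager written in Python.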

    #pacman -S supervisor
    #mkdir -p /var/log/storm

    Edit the supervisor configuration file and append the following at the end

    [program:storm-nimbus]
    environment=JAVA_HOME=/opt/java, PATH="/usr/sbin:/usr/bin:/usr/local/bin:/opt/java/bin"
    command=/opt/storm-0.8.2/bin/storm nimbus
    ;;user=storm
    autostart=true
    autorestart=true
    startsecs=10
    startretries=999
    log_stdout=true
    log_stderr=true
    logfile=/var/log/storm/nimbus.out
    logfile_maxbytes=20MB
    logfile_backups=10
    
    [program:storm-supervisor]
    environment=JAVA_HOME=/opt/java, PATH="/usr/sbin:/usr/bin:/usr/local/bin:/opt/java/bin"
    command=/opt/storm-0.8.2/bin/storm supervisor
    ;;user=storm
    autostart=true
    autorestart=true
    startsecs=10
    startretries=999
    log_stdout=true
    log_stderr=true
    logfile=/var/log/storm/supervisor.out
    logfile_maxbytes=20MB
    logfile_backups=10

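    Note:

        The environment line is added explicitly in the configuration above, mainly to fix the executable search path. Without it, an error is reported that the java executable cannot be found, because it is not in any of the standard paths /usr/bin, /usr/sbin, /usr/local/bin, /usr/local/sbin.

    Start supervisord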

    #systemctl start supervisord

    To have supervisord start at boot, run the following command

    #systemctl enable supervisord

    References

    1. Running a Multi-Node Storm Cluster http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/

  • Original article: https://www.cnblogs.com/hseagle/p/3373701.html