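Twitter Storm is billed as "the real-time version of Hadoop". Our team needs it in a product, so I spent some time wrestling with it, and yes, it really was wrestling. I have no Java background, so part of that time was wasted on Java-related environment setup and project maintenance.
Introduction to Storm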
Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Storm project website: http://storm-project.net/
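Goal
The short book "Getting Started with Storm" (August 2012, 105 pages) arrived at exactly the right time. My goal was to set up a cluster and convert the real-life-app from "Getting Started with Storm" to cluster mode, that is, Local Mode --> Remote Mode.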
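Troubleshooting
Follow the deployment guide on the official website step by step. Otherwise it is easy to end up with configuration files that cannot be found, and an error like the following: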
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
    at backtype.storm.StormSubmitter.submitJar(StormSubmitter.java:98)
    at backtype.storm.StormSubmitter.submitJar(StormSubmitter.java:77)
    at backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:56)
    at storm.analytics.TopologyStarter.main(TopologyStarter.java:47)
Caused by: java.lang.NullPointerException
    at java.io.FileInputStream.<init>(FileInputStream.java:134)
    at java.io.FileInputStream.<init>(FileInputStream.java:97)
    at backtype.storm.utils.BufferFileInputStream.<init>(BufferFileInputStream.java:14)
    at backtype.storm.utils.BufferFileInputStream.<init>(BufferFileInputStream.java:19)
    at backtype.storm.StormSubmitter.submitJar(StormSubmitter.java:88)
    ... 3 more
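One thing to watch when writing the configuration: the Storm configuration file is sensitive to whitespace. With the script above, installation and deployment become much easier.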
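As an illustration of the kind of whitespace problem meant here, the following is a minimal sketch of a pre-flight check for a YAML configuration file. It assumes the usual YAML pitfalls (tabs used for indentation, a missing space after a colon or a list dash). The class name `YamlWhitespaceCheck` and its heuristics are hypothetical and not part of Storm. It is a rough line-based check, not a YAML parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class YamlWhitespaceCheck {

    /** Returns one warning per suspicious line, using 1-based line numbers. */
    static List<String> check(List<String> lines) {
        List<String> warnings = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // blank lines and comments are ignored
            }
            String indent = line.substring(0, line.indexOf(trimmed));
            // YAML does not allow tab characters for indentation.
            if (indent.contains("\t")) {
                warnings.add("line " + (i + 1) + ": tab used for indentation");
            }
            // A list item needs a space after the dash: "- value", not "-value".
            if (trimmed.startsWith("-") && trimmed.length() > 1 && trimmed.charAt(1) != ' ') {
                warnings.add("line " + (i + 1) + ": missing space after '-'");
            }
            // A mapping needs a space after the colon: "key: value", not "key:value".
            if (trimmed.matches("[A-Za-z0-9_.]+:\\S.*")) {
                warnings.add("line " + (i + 1) + ": missing space after ':'");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Sample lines only; a real check would read the actual configuration file.
        List<String> sample = Arrays.asList(
                "storm.zookeeper.servers:",
                "\t- \"192.168.10.101\"",
                "nimbus.host:\"192.168.10.101\"");
        for (String warning : check(sample)) {
            System.out.println(warning);
        }
    }
}
```

Running a check like this before starting the daemons can catch a malformed line earlier than a failed startup would.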
Exception in thread "main" java.lang.IllegalArgumentException: Nimbus host is not set
I suspect few Storm newcomers manage to avoid this error.
I ran into it with the real-life example: I had modified the code to switch from Local Mode to Remote Mode, and then ran it like this:
java -jar target/storm-analytics-0.0.1-jar-with-dependencies.jar
Submitting a topology to a Storm cluster has to go through the storm client tool: https://github.com/nathanmarz/storm/wiki/Command-line-client
Without thinking much, I changed the command to:
storm jar storm-analytics-0.0.1-jar-with-dependencies.jar
Traceback (most recent call last):
  File "/data/storm_nimbus_1/storm/storm-0.8.1/bin/storm", line 402, in <module>
    main()
  File "/data/storm_nimbus_1/storm/storm-0.8.1/bin/storm", line 399, in main
    (COMMANDS.get(COMMAND, "help"))(*ARGS)
TypeError: jar() takes at least 2 arguments (1 given)
That was humbling. The storm script needs the entry-point class as well:
storm jar storm-analytics-0.0.1-jar-with-dependencies.jar storm.analytics.TopologyStarter
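The successful run shown later in this post includes `-Dstorm.jar=...` on the java command line that the storm script builds, while a plain `java -jar` launch sets no such property. Based on that observation, here is a minimal sketch of how an entry class might decide between Local Mode and Remote Mode. Choosing the mode from the `storm.jar` property is an assumed convention, and `SubmitModeSketch` is a hypothetical class. The actual Storm calls are only named in comments so that the sketch compiles without Storm on the classpath.

```java
public class SubmitModeSketch {

    enum Mode { LOCAL, REMOTE }

    /**
     * Assumed convention: if the launcher set the "storm.jar" system property,
     * the program was started through "storm jar" and should submit remotely.
     */
    static Mode chooseMode(String stormJarProperty) {
        boolean missing = stormJarProperty == null || stormJarProperty.trim().isEmpty();
        return missing ? Mode.LOCAL : Mode.REMOTE;
    }

    public static void main(String[] args) {
        Mode mode = chooseMode(System.getProperty("storm.jar"));
        if (mode == Mode.REMOTE) {
            // A real topology would call StormSubmitter.submitTopology(...) here.
            System.out.println("storm.jar is set: submitting to the cluster");
        } else {
            // A real topology would create a LocalCluster and submit to it here.
            System.out.println("storm.jar is not set: running in Local Mode");
        }
    }
}
```

With a guard like this, starting the jar by hand falls back to Local Mode instead of failing inside the submitter.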
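Next came the "Multiple defaults.yaml found" error. The Troubleshooting page on the official site attributes it to storm.jar being bundled into the package, which I had not done, but it was clear that the configuration was somehow being loaded twice. After ruling out the environment configuration, I looked at the test code and found these two extra lines: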
conf.put(Config.NIMBUS_HOST, "192.168.10.101");
conf.put(Config.NIMBUS_THRIFT_PORT, 6627);
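When a "found more than once" error like this shows up, it can help to list every classpath entry that provides the resource. The following is a minimal diagnostic sketch using only the standard `ClassLoader.getResources` call. The class name `ClasspathResourceFinder` is hypothetical, and the sketch does not claim to reproduce how Storm itself performs the check.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathResourceFinder {

    /** Lists every location on the classpath that provides the named resource. */
    static List<URL> findAll(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the file named in the error message; pass another name to override.
        String name = args.length > 0 ? args[0] : "defaults.yaml";
        List<URL> hits = findAll(name);
        System.out.println(hits.size() + " location(s) provide " + name);
        for (URL url : hits) {
            System.out.println("  " + url);
        }
    }
}
```

Running it with the same classpath that the storm script prints in its "Running: java ..." line would show which jars each copy comes from.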
After that change, running it produced java.lang.NoSuchMethodError. The Troubleshooting page already has the answer: mismatched Storm versions. Why were the versions mismatched? Because I was using code cloned straight from GitHub, whose pom.xml still specified version 0.7.1. To avoid further complications I switched to a local reference. The pom.xml fragment is below:
<!--dependency>
  <groupId>storm</groupId>
  <artifactId>storm</artifactId>
  <version>0.8.1</version>
</dependency-->
<dependency>
  <groupId>storm</groupId>
  <artifactId>storm</artifactId>
  <version>0.8.1</version>
  <scope>system</scope>
  <systemPath>/data/storm_nimbus_1/storm/storm-0.8.1/storm-0.8.1.jar</systemPath>
</dependency>
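After changing the configuration above, I ran mvn package to rebuild and repackage, and could finally submit the topology to the Storm cluster. The storm script is written in Python and takes care of the tedious job of assembling the dependency classpath. Here is its output from a successful run: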
[root@localhost target]# storm jar storm-analytics-0.0.1-jar-with-dependencies.jar storm.analytics.TopologyStarter
Running: java -client -Dstorm.options= -Dstorm.home=/data/storm_nimbus_1/storm/storm-0.8.1 -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -cp /data/storm_nimbus_1/storm/storm-0.8.1/storm-0.8.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/slf4j-api-1.5.8.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/core.incubator-0.1.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/zookeeper-3.3.3.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/commons-fileupload-1.2.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/slf4j-log4j12-1.5.8.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/commons-io-1.4.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/commons-codec-1.4.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/joda-time-2.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/asm-4.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/tools.logging-0.2.3.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/jzmq-2.1.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/clj-time-0.4.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/snakeyaml-1.9.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/curator-framework-1.0.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/jetty-6.1.26.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/clout-0.4.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/commons-lang-2.5.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/junit-3.8.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/httpclient-4.1.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/httpcore-4.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/log4j-1.2.16.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/objenesis-1.2.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/jetty-util-6.1.26.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/disruptor-2.10.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/minlog-1.2.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/servlet-api-2.5-20081211.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/jgrapht-0.8.3.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/libthrift7-0.7.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/carbonite-1.5.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/clojure-1.4.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/json-simple-1.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/servlet-api-2.5.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/commons-logging-1.1.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/math.numeric-tower-0.0.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/jline-0.9.94.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/guava-13.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/ring-servlet-0.3.11.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/tools.cli-0.2.2.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/kryo-2.17.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/ring-core-0.3.10.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/hiccup-0.3.6.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/commons-exec-1.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/compojure-0.6.4.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/tools.macro-0.1.0.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/curator-client-1.0.1.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/reflectasm-1.07-shaded.jar:/data/storm_nimbus_1/storm/storm-0.8.1/lib/ring-jetty-adapter-0.3.11.jar:storm-analytics-0.0.1-jar-with-dependencies.jar:/root/.storm:/data/storm_nimbus_1/storm/storm-0.8.1/bin -Dstorm.jar=storm-analytics-0.0.1-jar-with-dependencies.jar storm.analytics.TopologyStarter
log4j:WARN No appenders could be found for logger (backtype.storm.StormSubmitter).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
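Even once it is running, you may find that things still do not behave correctly. There is a trap here: the code connects to Redis and to the node.js web server through localhost. Once the code is distributed across the cluster, it stops working on every machine other than the one that hosts those services. So these constants have to be changed to the IP of the server where Redis and node.js run: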
public final static String REDIS_HOST = "192.168.10.101";
public final static String WEBSERVER = "http://192.168.10.101:3000/news";
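Hard-coded addresses mean a rebuild whenever the servers move. One alternative is to pass the endpoints in at submit time and carry them in the topology configuration map, which Storm hands to each component when it is prepared. The sketch below shows only the plain-Java part of that idea, under stated assumptions: the class `EndpointSettings`, the key names, and the argument order are all hypothetical, and an ordinary `Map` stands in for the Storm configuration object.

```java
import java.util.HashMap;
import java.util.Map;

public class EndpointSettings {

    // Hypothetical keys; they are not defined by Storm or by the book's example.
    static final String REDIS_HOST_KEY = "analytics.redis.host";
    static final String WEBSERVER_KEY = "analytics.webserver.url";

    /** Submit side: args[0] = Redis host, args[1] = web server URL (both optional). */
    static Map<String, Object> fromArgs(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        if (args.length > 0) {
            conf.put(REDIS_HOST_KEY, args[0]);
        }
        if (args.length > 1) {
            conf.put(WEBSERVER_KEY, args[1]);
        }
        return conf;
    }

    /** Component side: read a setting, falling back to a default when it is absent. */
    static String get(Map<String, Object> conf, String key, String fallback) {
        Object value = conf.get(key);
        return value == null ? fallback : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> conf = fromArgs(args);
        // The localhost defaults keep Local Mode working with no arguments.
        System.out.println("Redis host: " + get(conf, REDIS_HOST_KEY, "localhost"));
        System.out.println("Web server: " + get(conf, WEBSERVER_KEY, "http://localhost:3000/news"));
    }
}
```

In a real topology the entries from `fromArgs` would be copied into the configuration passed at submission, and each component would call something like `get` from its setup method instead of reading static constants.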
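Troubleshooting at runtime
Once you can submit a topology to Storm, runtime errors show up not only in the logs but also under Last Error in the Storm UI web interface, which is a convenient shortcut.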
Below are the results after I reloaded the movie data:
Finally, a small picture to unwind with: