Stream groupings: strategies for partitioning streams
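Apache Storm is a free and open source distributed realtime computation system. It makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Apache Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. It is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees that your data will be processed, and is easy to set up and operate.
Apache Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes them in arbitrarily complex ways, repartitioning the streams between each stage of the computation as needed.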
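How a stream is repartitioned between two stages depends on the grouping. The following is a minimal Python simulation, not Storm's API: `fields_grouping` and `shuffle_grouping` are hypothetical helpers, the CRC32 hash is an assumption chosen only because it is stable across runs, and round-robin stands in for Storm's randomized but even shuffle.

```python
import itertools
import zlib
from collections import Counter


def fields_grouping(key: str, num_tasks: int) -> int:
    """Pick a task index from a stable hash of the grouping field."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def shuffle_grouping(num_tasks: int):
    """Yield task indices in round-robin order (stand-in for an even random spread)."""
    return itertools.cycle(range(num_tasks))


words = ["storm", "hadoop", "storm", "kafka", "storm", "hadoop"]
NUM_TASKS = 3  # assumed number of tasks of the downstream stage

# Fields grouping: the task depends only on the word itself.
fields_assignment = [(word, fields_grouping(word, NUM_TASKS)) for word in words]

# Shuffle grouping: the task depends only on arrival order.
picker = shuffle_grouping(NUM_TASKS)
shuffle_assignment = [(word, next(picker)) for word in words]
shuffle_load = Counter(task for _, task in shuffle_assignment)

print(fields_assignment)
print(dict(shuffle_load))
```

With the fields grouping, every occurrence of the same word lands on the same task, which is what a word count needs. With the shuffle grouping the load is spread evenly, but a given word may land on any task.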
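A Storm topology plays a role similar to a MapReduce job in Hadoop. One key difference: a MapReduce job eventually finishes, whereas a topology runs forever (unless you kill it manually).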
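A Storm cluster has two kinds of nodes: the master node and the worker nodes (by default each worker machine offers at most 4 worker slots). The master node runs a daemon called nimbus, whose role is similar to the JobTracker in Hadoop. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring their status.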
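To make the slot arithmetic concrete, here is a small sketch. It assumes the default `supervisor.slots.ports` setting of 6700 through 6703 (one worker slot per port) and that supervisors run on hadoop02 and hadoop03, as in the setup later in this post; `cluster_slots` is a hypothetical helper, not part of Storm.

```python
# Default value of supervisor.slots.ports: one worker slot per port.
DEFAULT_SLOT_PORTS = (6700, 6701, 6702, 6703)


def cluster_slots(supervisor_hosts, slot_ports=DEFAULT_SLOT_PORTS):
    """Return every (host, port) pair that can host one worker process."""
    return [(host, port) for host in supervisor_hosts for port in slot_ports]


slots = cluster_slots(["hadoop02", "hadoop03"])
print(len(slots))  # 2 supervisors x 4 ports each = 8 slots
```

Adding a supervisor machine, or listing more ports on an existing one, is what raises the number of workers that nimbus can schedule.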
- hadoop01
- hadoop02
- hadoop03
# Hostname of the machine running nimbus: "hadoop01"
nohup ./storm nimbus > /dev/null 2>&1 &
nohup ./storm ui > /dev/null 2>&1 &
nohup ./storm supervisor > /dev/null 2>&1 &
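The same background launch can be scripted. This is only a sketch under assumptions: `daemon_command` and `start_daemon` are hypothetical helpers, the `./storm` path mirrors the commands above, and the sketch deliberately accepts only the three daemons used in this post. Starting the process in a new session is roughly what `nohup ... &` achieves.

```python
import subprocess


def daemon_command(daemon, storm_bin="./storm"):
    """Build the argv for one Storm daemon: nimbus, ui, or supervisor."""
    if daemon not in ("nimbus", "ui", "supervisor"):
        raise ValueError(f"unsupported daemon: {daemon}")
    return [storm_bin, daemon]


def start_daemon(daemon, storm_bin="./storm"):
    """Launch the daemon detached from the terminal, discarding its output."""
    return subprocess.Popen(
        daemon_command(daemon, storm_bin),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # same effect as "> /dev/null"
        stderr=subprocess.STDOUT,   # same effect as "2>&1"
        start_new_session=True,     # keep running after the terminal closes
    )
```

Calling `start_daemon("nimbus")` from the `bin` directory on hadoop01 would then correspond to the first command above.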
[linyouyi@hadoop01 software]$ wget
[linyouyi@hadoop01 software]$ ll
total 739172
-rw-rw-r-- 1 linyouyi linyouyi 312465430 Apr 30 06:17 apache-storm-2.0.0.tar.gz
-rw-r--r-- 1 linyouyi linyouyi 218720521 Aug  3 17:56 hadoop-2.7.7.tar.gz
-rw-rw-r-- 1 linyouyi linyouyi 132569269 Mar 18 14:28 hbase-2.0.5-bin.tar.gz
-rw-r--r-- 1 linyouyi linyouyi  54701720 Aug  3 17:47 server-jre-8u144-linux-x64.tar.gz
-rw-r--r-- 1 linyouyi linyouyi  37676320 Aug  8 09:36 zookeeper-3.4.14.tar.gz
[linyouyi@hadoop01 software]$ tar -zxvf apache-storm-2.0.0.tar.gz -C /hadoop/module/
[linyouyi@hadoop01 software]$ cd /hadoop/module/apache-storm-2.0.0
[linyouyi@hadoop01 apache-storm-2.0.0]$ ll
total 308
drwxrwxr-x  2 linyouyi linyouyi  4096 Aug 12 21:11 bin
drwxrwxr-x  2 linyouyi linyouyi  4096 Aug 12 21:11 conf
-rw-r--r--  1 linyouyi linyouyi 91939 Apr 30 05:13 DEPENDENCY-LICENSES
drwxr-xr-x 19 linyouyi linyouyi  4096 Apr 30 05:13 examples
drwxrwxr-x 19 linyouyi linyouyi  4096 Aug 12 21:11 external
drwxr-xr-x  2 linyouyi linyouyi  4096 Apr 30 05:59 extlib
drwxr-xr-x  2 linyouyi linyouyi  4096 Apr 30 05:59 extlib-daemon
drwxrwxr-x  2 linyouyi linyouyi  4096 Aug 12 21:11 lib
drwxrwxr-x  5 linyouyi linyouyi  4096 Aug 12 21:11 lib-tools
drwxr-xr-x  2 linyouyi linyouyi  4096 Apr 30 05:59 lib-webapp
drwxr-xr-x  2 linyouyi linyouyi  4096 Apr 30 05:58 lib-worker
-rw-r--r--  1 linyouyi linyouyi 82390 Apr 30 05:13 LICENSE
drwxr-xr-x  2 linyouyi linyouyi  4096 Apr 30 05:13 licenses
drwxrwxr-x  2 linyouyi linyouyi  4096 Aug 12 21:11 log4j2
-rw-r--r--  1 linyouyi linyouyi 34065 Apr 30 05:13 NOTICE
drwxrwxr-x  6 linyouyi linyouyi  4096 Aug 12 21:11 public
-rw-r--r--  1 linyouyi linyouyi  7914 Apr 30 05:13 README.markdown
-rw-r--r--  1 linyouyi linyouyi     6 Apr 30 05:13 RELEASE
-rw-r--r--  1 linyouyi linyouyi 23865 Apr 30 05:13
[linyouyi@hadoop01 apache-storm-2.0.0]$ vim conf/storm.yaml
# ZooKeeper addresses
storm.zookeeper.servers:
  - "hadoop01"
  - "hadoop02"
  - "hadoop03"
nimbus.seeds: ["hadoop01"]
#nimbus.seeds: ["host1", "host2", "host3"]
[linyouyi@hadoop01 apache-storm-2.0.0]$ cd ../
[linyouyi@hadoop01 module]$ scp -r apache-storm-2.0.0 linyouyi@hadoop02:/hadoop/module/
[linyouyi@hadoop01 module]$ scp -r apache-storm-2.0.0 linyouyi@hadoop03:/hadoop/module/
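Because the same storm.yaml is copied to every node, it can be convenient to generate it from one list of hosts. The sketch below covers only the two settings edited above; `render_storm_yaml` is a hypothetical helper that builds the text by hand rather than with a YAML library.

```python
def render_storm_yaml(zookeeper_servers, nimbus_seeds):
    """Render the two storm.yaml settings used above as YAML text."""
    lines = ["storm.zookeeper.servers:"]
    lines += [f'  - "{host}"' for host in zookeeper_servers]
    seeds = ", ".join(f'"{host}"' for host in nimbus_seeds)
    lines.append(f"nimbus.seeds: [{seeds}]")
    return "\n".join(lines) + "\n"


config_text = render_storm_yaml(
    ["hadoop01", "hadoop02", "hadoop03"],  # ZooKeeper quorum
    ["hadoop01"],                          # host running nimbus
)
print(config_text)
```

The printed text matches the settings entered in vim above and could be written to conf/storm.yaml before the scp step.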
[linyouyi@hadoop01 module]$ cd apache-storm-2.0.0
// If Storm reports that JAVA_HOME cannot be found, set it in conf/storm-env.sh
[linyouyi@hadoop01 apache-storm-2.0.0]$ bin/storm nimbus &
[linyouyi@hadoop01 apache-storm-2.0.0]$ jps
30051 Nimbus
44057 QuorumPeerMain
30381 Jps
[linyouyi@hadoop01 apache-storm-2.0.0]$ netstat -tnpl | grep 30684
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp6       0      0 :::6627                 :::*                    LISTEN      30684/java
[linyouyi@hadoop01 apache-storm-2.0.0]$ bin/storm ui &
[linyouyi@hadoop01 apache-storm-2.0.0]$ jps
32674 UIServer
44057 QuorumPeerMain
30684 Nimbus
32989 Jps
[linyouyi@hadoop01 apache-storm-2.0.0]$ netstat -tnpl | grep 32674
tcp6       0      0 :::8080                 :::*                    LISTEN      32674/java
// Opening http://hadoop01:8080 in a browser shows that the slot counts are all 0. Once a supervisor is started on hadoop02 and hadoop03, as below, the slot counts are no longer 0.
[linyouyi@hadoop02 apache-storm-2.0.0]$ bin/storm supervisor
[linyouyi@hadoop02 apache-storm-2.0.0]$ jps
70952 Jps
70794 Supervisor
34879 QuorumPeerMain
[linyouyi@hadoop03 apache-storm-2.0.0]$ bin/storm supervisor
[linyouyi@hadoop03 apache-storm-2.0.0]$ jps
119587 QuorumPeerMain
116291 Jps
116143 Supervisor
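The netstat checks above can also be done from any machine by trying to open a TCP connection. This is a sketch under assumptions: `is_listening` and `check_daemons` are hypothetical helpers, and the ports 6627 (nimbus) and 8080 (UI) are simply the ones shown in the netstat output above.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hostnames.
        return False


# Ports taken from the netstat output above.
CHECKS = {"nimbus": ("hadoop01", 6627), "ui": ("hadoop01", 8080)}


def check_daemons(checks=CHECKS):
    """Map each daemon name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in checks.items()}
```

A successful connection only shows that something is listening on the port; the jps output remains the way to confirm which process it is.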
// Command format: storm jar [jar path] [topology package.topology class] [arguments passed to the topology's main method, such as the topology name]
[linyouyi@hadoop01 apache-storm-2.0.0]$ bin/storm jar --help
usage: storm jar [-h] [--jars JARS] [--artifacts ARTIFACTS]
                 [--artifactRepositories ARTIFACTREPOSITORIES]
                 [--mavenLocalRepositoryDirectory MAVENLOCALREPOSITORYDIRECTORY]
                 [--proxyUrl PROXYURL] [--proxyUsername PROXYUSERNAME]
                 [--proxyPassword PROXYPASSWORD] [--storm-server-classpath]
                 [--config CONFIG] [-storm_config_opts STORM_CONFIG_OPTS]
                 topology-jar-path topology-main-class
                 [topology_main_args [topology_main_args ...]]

positional arguments:
  topology-jar-path     will upload the jar at topology-jar-path when the
                        topology is submitted.
  topology-main-class   main class of the topology jar being submitted
  topology_main_args    Runs the main method with the specified arguments.

optional arguments:
  --artifactRepositories ARTIFACTREPOSITORIES
                        When you need to pull the artifacts from other than
                        Maven Central, you can pass remote repositories to
                        --artifactRepositories option with a comma-separated
                        string. Repository format is "<name>^<url>". '^' is
                        taken as separator because URL allows various
                        characters. For example, --artifactRepositories
                        "jboss-repository^,HDPRepo^ ic/" will add JBoss and
                        HDP repositories for dependency resolver.
  --artifacts ARTIFACTS
                        When you want to ship maven artifacts and its
                        transitive dependencies, you can pass them to
                        --artifacts with comma-separated string. You can also
                        exclude some dependencies like what you're doing in
                        maven pom. Please add exclusion artifacts with '^'
                        separated string after the artifact. For example,
                        -artifacts "redis.clients:jedis:2.9.0,org.apache.kafka:kafka-clients:1.0.0^org.slf4j:slf4j-api"
                        will load jedis and kafka-clients artifact and all of
                        transitive dependencies but exclude slf4j-api from
                        kafka.
  --config CONFIG       Override default storm conf file
  --jars JARS           When you want to ship other jars which are not
                        included to application jar, you can pass them to
                        --jars option with comma-separated string. For
                        example, --jars "your-local-jar.jar,your-local-jar2.jar"
                        will load your-local-jar.jar and your-local-jar2.jar.
  --mavenLocalRepositoryDirectory MAVENLOCALREPOSITORYDIRECTORY
                        You can provide local maven repository directory via
                        --mavenLocalRepositoryDirectory if you would like to
                        use specific directory. It might help when you don't
                        have '.m2/repository' directory in home directory,
                        because CWD is sometimes non-deterministic (fragile).
  --proxyPassword PROXYPASSWORD
                        password of proxy if it requires basic auth
  --proxyUrl PROXYURL   You can also provide proxy information to let
                        dependency resolver utilizing proxy if needed. URL
                        representation of proxy ('http://host:port')
  --proxyUsername PROXYUSERNAME
                        username of proxy if it requires basic auth
  --storm-server-classpath
                        If for some reason you need to have the full storm
                        classpath, not just the one for the worker you may
                        include the command line option
                        `--storm-server-classpath`. Please be careful because
                        this will add things to the classpath that will not
                        be on the worker classpath and could result in the
                        worker not running.
  -h, --help            show this help message and exit
  -storm_config_opts STORM_CONFIG_OPTS, -c STORM_CONFIG_OPTS
                        Override storm conf properties , e.g.
                        nimbus.ui.port=4443
[linyouyi@hadoop01 apache-storm-2.0.0]$ storm jar /home/storm/storm-starter.jar storm.start.WordCountTopology.wordcountTop
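To show how the positional arguments and the `-c` override from the help text fit together, here is a sketch that only assembles the command line and does not run it. `storm_jar_command` is a hypothetical helper, and the main class `com.example.WordCountTopology` and the topology name `wordcount` are made-up placeholders, not classes shipped with Storm.

```python
def storm_jar_command(jar_path, main_class, *topology_args,
                      config_overrides=None, storm_bin="bin/storm"):
    """Build the argv for `storm jar` following the usage text above."""
    command = [storm_bin, "jar"]
    # Each override becomes "-c key=value", as listed in the help output.
    for key, value in (config_overrides or {}).items():
        command += ["-c", f"{key}={value}"]
    # Positional order: jar path, main class, then arguments for main().
    command += [jar_path, main_class, *topology_args]
    return command


example = storm_jar_command(
    "/home/storm/storm-starter.jar",
    "com.example.WordCountTopology",  # hypothetical main class
    "wordcount",                      # hypothetical topology name read by main()
    config_overrides={"topology.workers": 2},
)
print(" ".join(example))
```

Everything after the main class is handed to the topology's `main` method unchanged, so what those arguments mean (a topology name, a host, a port) depends entirely on how the topology class reads them.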