  • Spark Streaming reading data from Kafka

    Goal: have Spark Streaming read data from Kafka every 2 seconds and perform a word count.

    Topic: topic1

    Broker list: 192.168.1.126:9092,192.168.1.127:9092,192.168.1.128:9092
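
    If topic1 does not exist yet, it can be created first. A minimal sketch using the Kafka CLI; the partition count and replication factor here are assumptions, not values from the original setup:

     ./kafka-topics.sh --create --zookeeper zk01:2181,zk02:2181 \
       --topic topic1 --partitions 3 --replication-factor 2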

    1. First, produce data into the topic in real time.

      The code below sends one message per second to topic1; each message contains 4 words separated by spaces.

     package kafkaProducer

     import java.util.HashMap

     import org.apache.kafka.clients.producer._

     object KafkaProducer {
       def main(args: Array[String]) {
         val topic = "topic1"
         val brokers = "192.168.1.126:9092,192.168.1.127:9092,192.168.1.128:9092"
         val messagesPerSec = 1  // messages sent per second
         val wordsPerMessage = 4 // words per message

         // Producer connection properties
         val props = new HashMap[String, Object]()
         props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
         props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
           "org.apache.kafka.common.serialization.StringSerializer")
         props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
           "org.apache.kafka.common.serialization.StringSerializer")
         val producer = new KafkaProducer[String, String](props)

         // Send some messages: each is wordsPerMessage random digits joined by spaces
         while (true) {
           (1 to messagesPerSec).foreach { messageNum =>
             val str = (1 to wordsPerMessage)
               .map(_ => scala.util.Random.nextInt(10).toString)
               .mkString(" ")
             val message = new ProducerRecord[String, String](topic, null, str)
             producer.send(message)
             println(message)
           }
           Thread.sleep(1000)
         }
       }
     }

     Package and run with: hadoop jar <jar file>  (note: the jar must be a runnable jar)
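
     For reference, a minimal build.sbt sketch for packaging the producer. The Scala and kafka-clients versions are assumptions and should be matched to the cluster:

      name := "kafkaProducer"

      scalaVersion := "2.10.6"  // assumed; match the cluster's Scala build

      // The new producer API (KafkaProducer/ProducerRecord) lives in kafka-clients
      libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.8.2.2"  // assumed version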

    Consume from the topic to verify: ./kafka-console-consumer.sh --zookeeper zk01:2181,zk02:2181 --topic topic1 --from-beginning

    The messages are consumed normally.

    2. Write the Spark Streaming code that reads from Kafka once every 2 seconds.

      The code is as follows:

     package kafkaSparkStream

     import org.apache.spark.SparkConf
     import org.apache.spark.streaming.StreamingContext
     import org.apache.spark.streaming.Seconds
     import org.apache.spark.streaming.kafka.KafkaUtils
     import kafka.serializer.StringDecoder

     /**
      * Spark Streaming reads data from a Kafka topic.
      */
     object KafkaToSpark {
       def main(args: Array[String]) {
         if (args.length < 2) {
           System.err.println("Usage: KafkaToSpark <brokers> <topics>")
           System.exit(1)
         }
         val Array(brokers, topics) = args
         // One micro-batch every 2 seconds
         val conf = new SparkConf().setAppName("KafkaToSpark")
         val ssc = new StreamingContext(conf, Seconds(2))
         // Create a direct Kafka stream from the given brokers and topics
         val topicSet = topics.split(",").toSet
         val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
         val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
           ssc, kafkaParams, topicSet)
         // Get the lines, split them into words, count the words and print
         val lines = messages.map(_._2)
         val words = lines.flatMap(_.split(" "))
         val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
         wordCounts.print()
         // Start the computation
         ssc.start()
         ssc.awaitTermination()
       }
     }

     Package and submit with: ./spark-submit --class kafkaSparkStream.KafkaToSpark --master yarn-client /home/hadoop/sparkJar/kafkaToSpark.jar 192.168.1.126:9092,192.168.1.127:9092,192.168.1.128:9092 topic1
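
     The job uses the Kafka 0.8 direct-stream API, so the jar must bundle the spark-streaming-kafka integration. A minimal dependency sketch; the Spark version here is an assumption:

      // spark-streaming is provided by the cluster; the Kafka integration must be bundled
      libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-streaming" % "1.6.2" % "provided",  // assumed version
        "org.apache.spark" %% "spark-streaming-kafka" % "1.6.2"          // 0.8 direct API
      )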

    It runs successfully!
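
    For a quick test without YARN, the same jar can also be submitted in local mode (a sketch reusing the paths and arguments above):

     ./spark-submit --class kafkaSparkStream.KafkaToSpark --master local[2] \
       /home/hadoop/sparkJar/kafkaToSpark.jar \
       192.168.1.126:9092,192.168.1.127:9092,192.168.1.128:9092 topic1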

  • Original post: https://www.cnblogs.com/kwzblog/p/7513390.html