zoukankan      html  css  js  c++  java
  • spark 分析日志文件(key,value)

    Spark读取日志,统计每个service所用的平均时间

    发布时间:2015-12-10 9:54:15
    来源:分享查询网

     

    获取log日志,每个service以“#*#”开头。统计每个service所需的平均时间。

    import java.io.{File, PrintWriter}
    import org.apache.spark.{SparkContext, SparkConf}
    
    object SimpleApp {
    
      def main(args: Array[String]) {
        System.setProperty("hadoop.home.dir","D://spark-1.3.1-bin-hadoop-2.3.0-cdh5.0.2");
    
        val logFile = "d://Debug.2015-06-12_1556.log" // Should be some file on your system
        val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()
        val result = logData.filter(line => line.contains("#*#"))
    
        println("********统计开始**********")
    
        //转化为key-value形式的RDD。
        val jobNameAndTime = result.map(line => (line.split("#*#").last.split(" ").head, line.split("#*#").last.split(" ").last.toInt/1000))
    
        val jobNameTimes = jobNameAndTime.map(line => (line._1, 1)).reduceByKey((x, y) => x + y)
    
        val jobAvgTime = jobNameAndTime.reduceByKey((x, y) => (x + y)/2)
    
        //join方法
        val jobTimesAndAvgTime = jobNameTimes.join(jobAvgTime).sortBy(x => x._2._2)
    
        println("********************************************************************")
    
        jobTimesAndAvgTime.map(x => println(s"jobName: ${x._1} | times: ${x._2._1} | avgTime: ${x._2._2}s")).collect
    
        val writer = new PrintWriter(new File("d://test.txt" ))
        writer.write(jobTimesAndAvgTime.map(x => s"jobName: ${x._1} | times: ${x._2._1} | avgTime: ${x._2._2}s
    ").collect.toList.mkString(",").replace(",", ""))
        writer.close
    
    
        println(s"一共 ${result.count} 统计条数据")
    
        println("********************************************************************")
    
    
        println("********统计结束**********")
    
    
      }
    
    }

    ------------------------------

    每个service以“#*#”开头,后面接上所用的时间。
    log日志片段:
    2015-06-11 00:05:32.23423742063 [Worker-88] DEBUG c.z.b.v.a.u.c.d.ConnectionFactoryPrefs$$anon$1 - Spark useDatabase =use ran
    2015-06-11 00:05:32.82023742649 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 109
    2015-06-11 00:05:35.18423745013 [Worker-88] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 110
    2015-06-11 00:05:35.18423745013 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 102
    2015-06-11 00:05:35.18523745014 [worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 778
    2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 96
    2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 42
    2015-06-11 00:05:35.18523745014 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - writing data length: 83
    2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: reading data length: 40
    2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloWorldService 26993
    2015-06-11 00:05:35.18623745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.d.ConnectionFactoryPrefs$$anon$1 - database config: DatabaseInfo(jdbc:hive2://192.168.2.110:11000,mr,mr,org.apache.hive.jdbc.HiveDriver,ran)
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport@c0770c
    2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloWorldService 36993 
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.t.t.TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 6
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Writing message with status START and payload length 5
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 6
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Start message handled
    2015-06-11 00:05:35.18723745016 [18-worker-1] DEBUG o.a.thrift.transport.TSaslTransport - CLIENT: Main negotiation loop complete
    2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloSUMService 336993 
    2015-06-11 00:05:35.18723745015 [18-worker-1] DEBUG c.z.b.v.a.u.c.j.Quarter1thCleanJob - #*#HelloSUMService 236993 
    

     参考链http://m.fx114.net/qa-177-352127.aspx

  • 相关阅读:
    服务器Nginx 反向代理 其他服务器 8181端口 失败的问题
    Nginx 文件下载 apk 文件下载不了
    https和http 调用过程中请求头 referrer 获取不到的问题
    windows 下 nginx log 分割
    使用Windows Service Wrapper快速创建一个Windows Service 如nginx
    VS 中 无法嵌入互操作类型“……”,请改用适用的接口的解决方法
    使用With递归查询 树
    nginx 常用命令
    (译)(function (window, document, undefined) {})(window, document); 真正的意思
    在Emacs 24.4中使用在线字典
  • 原文地址:https://www.cnblogs.com/canyangfeixue/p/5383353.html
Copyright © 2011-2022 走看看