  • Spark SQL learning (case study: counting daily UV)

    Requirement: count the daily UV (the number of distinct users per day).

    package wujiadong_sparkSQL
    
    
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types._
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.functions._
    /**
      * Created by Administrator on 2017/3/6.
      */
    object DailyUV {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("dailyuv")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
    
        val userAccesslog = Array(
          "2017-01-01,1122",
          "2017-01-01,1122",
          "2017-01-01,1123",
          "2017-01-01,1124",
          "2017-01-01,1124",
          "2017-01-02,1122",
          "2017-01-01,1121",
          "2017-01-01,1123",
          "2017-01-01,1123"
    
        )
        val AccesslogRDD = sc.parallelize(userAccesslog,2)
        //val AccesslogRDD = sc.textFile("hdfs://master:9000/student/2016113012/data/userAccesslog.txt").map(_.split(","))
        // Specify each field's schema directly with StructType
        val schema = StructType(
          Array(
            StructField("date",StringType,true),
            StructField("userid",IntegerType,true)
          )
        )
    
        // Map the plain RDD to an RDD of Row objects
        val RowRDD = AccesslogRDD.map(log => Row(log.split(",")(0),log.split(",")(1).toInt))
        // Apply the schema to the Row RDD, i.e. create the DataFrame
        val df = sqlContext.createDataFrame(RowRDD,schema)
        // Import the SQLContext implicits so the 'column (Symbol) syntax below works;
        // the built-in aggregate functions themselves come from org.apache.spark.sql.functions._
    
        import sqlContext.implicits._
        df.groupBy("date")                        // group the rows by date
          .agg('date, countDistinct('userid))     // for each date, count the distinct user ids; note the 'symbol (single-quote) column syntax
          .map(row => Row(row(1), row(2))).collect().foreach(println)
    
    
        // UV in business terms: many users visit every day and each user may visit many times a day;
        // UV is the visit count after de-duplicating users, i.e. the number of distinct users per day
    
    
    
    
      }
    
    }
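
    The aggregation above keeps an extra 'date column and then re-maps every row into a Row just for printing. As a minimal sketch using the same SQLContext/DataFrame API (and only the df built above), the last step could also be written more compactly and printed with show():

        df.groupBy("date")                            // group the rows by date
          .agg(countDistinct("userid").as("uv"))      // distinct user count per date, aliased to "uv"
          .show()                                     // print the date/uv table

    Either form produces one row per date with its distinct user count.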
    
    
    
    
    

    Run result (2017-01-01 has 4 distinct users: 1121, 1122, 1123, 1124; 2017-01-02 has 1: 1122)

    hadoop@master:~/wujiadong$ spark-submit --class wujiadong_sparkSQL.DailyUV  --executor-memory 500m --total-executor-cores 2 /home/hadoop/wujiadong/wujiadong.spark.jar 
    17/03/06 21:01:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/03/06 21:01:53 WARN SparkConf: 
    SPARK_CLASSPATH was detected (set to ':/home/hadoop/bigdata/hive/lib/mysql-connector-java-5.1.26-bin.jar').
    This is deprecated in Spark 1.0+.
    
    Please instead use:
     - ./spark-submit with --driver-class-path to augment the driver classpath
     - spark.executor.extraClassPath to augment the executor classpath
            
    17/03/06 21:01:53 WARN SparkConf: Setting 'spark.executor.extraClassPath' to ':/home/hadoop/bigdata/hive/lib/mysql-connector-java-5.1.26-bin.jar' as a work-around.
    17/03/06 21:01:53 WARN SparkConf: Setting 'spark.driver.extraClassPath' to ':/home/hadoop/bigdata/hive/lib/mysql-connector-java-5.1.26-bin.jar' as a work-around.
    17/03/06 21:01:55 INFO Slf4jLogger: Slf4jLogger started
    17/03/06 21:01:55 INFO Remoting: Starting remoting
    17/03/06 21:01:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.131:57493]
    17/03/06 21:01:57 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    17/03/06 21:01:58 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
    [2017-01-01,4]                                                                  
    [2017-01-02,1]
    17/03/06 21:02:21 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
    17/03/06 21:02:21 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
    17/03/06 21:02:21 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
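
    For comparison, the same daily UV count can also be expressed through Spark SQL's string interface. A minimal sketch, reusing the df built above and registering it under an arbitrarily chosen temporary table name access_log:

        // Register the DataFrame as a temporary table so it can be queried with SQL
        df.registerTempTable("access_log")
        // COUNT(DISTINCT userid) per date is the daily UV; `date` is backtick-quoted to be safe as an identifier
        sqlContext.sql(
          "SELECT `date`, COUNT(DISTINCT userid) AS uv FROM access_log GROUP BY `date`"
        ).collect().foreach(println)

    Both the DataFrame API and the SQL string go through the same Catalyst optimizer, so the choice is mostly a matter of readability.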
    
    
    
  • Original post: https://www.cnblogs.com/wujiadong2014/p/6516609.html