zoukankan      html  css  js  c++  java
  • 寒假自学进度9

    1.spark-shell 交互式编程

    (1)该系总共有多少学生;

    (2)该系共开设来多少门课程;

    (3)Tom 同学的总成绩平均分是多少;

    (4)求每名同学的选修的课程门数;

    (5)该系 DataBase 课程共有多少人选修;

    (6)各门课程的平均分是多少;

    (7)使用累加器计算共有多少人选了 DataBase 这门课。

    第一问:
    val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
    val par = lines.map(row=>row.split(",")(0)) 
    val distinct_par = par.distinct() //去重操作
    distinct_par.count //取得总数
    第二问
    val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
    val par = lines.map(row=>row.split(",")(1)) 
    val distinct_par = par.distinct() 
    distinct_par.count
    第三问:
    val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
    val pare = lines.filter(row=>row.split(",")(0)=="Tom")
    pare.foreach(println)
    Tom,DataBase,26
    Tom,Algorithm,12
    Tom,OperatingSystem,16
    Tom,Python,40
    Tom,Software,60
    pare.map(row=>(row.split(",")(0),row.split(",")(2).toInt)).mapValues(x=>(x,1)).reduceByKey((x,y
    ) => (x._1+y._1,x._2 + y._2)).mapValues(x => (x._1 / x._2)).collect()
    //res9: Array[(String, Int)] = Array((Tom,30))
    第四问:
    val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
    val pare = lines.filter(row=>row.split(",")(1)=="DataBase")
    pare.count
    res1: Long = 126
    第五问:
    val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
    val pare = lines.filter(row=>row.split(",")(1)=="DataBase")
    pare.count
    res1: Long = 126
    第六问:
    val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
    val pare = lines.map(row=>(row.split(",")(1),row.split(",")(2).toInt))
    pare.mapValues(x=>(x,1)).reduceByKey((x,y) => (x._1+y._1,x._2 + y._2)).mapValues(x => (x._1 / x._2)).collect()
    res0: Array[(String, Int)] = Array((Python,57), (OperatingSystem,54), (CLanguage,50), 
    (Software,50), (Algorithm,48), (DataStructure,47), (DataBase,50), (ComputerNetwork,51))
    第七问:
    val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
    val pare = lines.map(row=>(row.split(",")(1),row.split(",")(2).toInt))
    pare.mapValues(x=>(x,1)).reduceByKey((x,y) => (x._1+y._1,x._2 + y._2)).mapValues(x => (x._1 / x._2)).collect()
    

      

  • 相关阅读:
    java web项目打包.war格式
    version 1.4.2-04 of the jvm is not suitable for thi
    Sugarcrm Email Integration
    sharepoint 2010 masterpage中必须的Content PlaceHolder
    微信开放平台
    Plan for caching and performance in SharePoint Server 2013
    使用自定义任务审批字段创建 SharePoint 顺序工作流
    Technical diagrams for SharePoint 2013
    To get TaskID's Integer ID value from the GUID in SharePoint workflow
    how to get sharepoint lookup value
  • 原文地址:https://www.cnblogs.com/1502762920-com/p/12285521.html
Copyright © 2011-2022 走看看