  • Spark learning, day 02: reading a file in Scala and counting word frequencies

    1. Install the JDK and Scala locally.
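One way to confirm that both installs are picked up by the same JVM is to ask the running process for its versions. This is a minimal sketch; the object name `EnvCheck` is only illustrative, and `scala.util.Properties.versionNumberString` reports the version of the Scala standard library found on the classpath.

```scala
// Hypothetical helper: prints the Java and Scala versions seen by the running JVM,
// as a quick check that the local JDK and Scala installs are being used.
object EnvCheck {
  // Standard JVM system property, e.g. "1.8.0_201" or "17.0.2"
  def javaVersion: String = System.getProperty("java.version")

  // Version of the scala-library jar on the classpath, e.g. "2.12.8"
  def scalaVersion: String = scala.util.Properties.versionNumberString

  def main(args: Array[String]): Unit = {
    println(s"Java:  $javaVersion")
    println(s"Scala: $scalaVersion")
  }
}
```

Pasting the two `def` bodies at the `scala>` prompt gives the same information interactively.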

    2. Read a local file:

    scala> import scala.io.Source
    import scala.io.Source
    
    scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
    lines: List[String] = List("With the development of civilization, it is the children's duty to study in school since they were small. As the young kids, it is their nature to hang out for fun. ", "", "While for them, most of the time have been limited in the class. So they feel frustrated and don't have much passion to study. It is of great importance to develop ", "", "interest. The first thing is to broaden vision. The students can read travel books or watch tourist show, for anyone who cannot resist the charm of beautiful scenery ", "", and delicious food. The second thing is taking the right attitude to exams. Never giving too much pressure on getting high marks. The only thing we should do is to enjoy gaining knowledge.)
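The one-liner above never calls `close()` on the `BufferedSource` returned by `Source.fromFile`, so the file handle stays open for the rest of the REPL session. Below is a minimal sketch of the same read that releases the handle afterwards. It assumes the file is UTF-8 (the session relied on the platform default encoding, which varies by OS and locale, so adjust the encoding argument to match your file); `ReadLines`, `readAllLines` and the temporary demo file are hypothetical names for illustration.

```scala
import scala.io.Source
import java.io.{File, PrintWriter}

object ReadLines {
  // Read every line of a text file into a List[String], then close the file.
  // toList forces the lazy getLines() iterator *before* close() runs.
  def readAllLines(path: String): List[String] = {
    val src = Source.fromFile(path, "UTF-8") // assumption: file is UTF-8
    try src.getLines().toList
    finally src.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical demo file in the temp directory; use your own path instead.
    val tmp = File.createTempFile("wordcount-demo", ".txt")
    tmp.deleteOnExit()
    val out = new PrintWriter(tmp, "UTF-8")
    try out.write("first line\n\nthird line\n") finally out.close()

    val lines = readAllLines(tmp.getPath)
    println(lines) // List(first line, , third line)
  }
}
```

As in the session, a blank line in the file becomes an empty string in the list.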

    3. Count word frequencies and sort them for a top-N list:

    scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=>(x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse
    res0: List[(String, Int)] = List((the,7), (to,7), (is,6), (of,4), (The,4), (thing,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (only,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,1), (time,1), (them,,1), (children's,1), (development,1), (knowledge.,1), (It,1), (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrated,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (small.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civilization,,1), (broaden,1), (out,1), (food.,1), (don't,1), (importance,1), (kid...
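The session sorts every word but never actually keeps the first N. Here is a sketch that wraps the same pipeline in a function with an `n` parameter; `WordCount` and `topN` are hypothetical names. It keeps the original `split(" ")` tokenization, so punctuation stays attached ("study" and "study." are different keys) and blank lines still contribute `""`. Counting with `occ.size` is equivalent to the session's `map(x=>x._2).sum`, since every pair carries exactly 1.

```scala
object WordCount {
  // Top-n (word, count) pairs, highest count first.
  def topN(lines: List[String], n: Int): List[(String, Int)] =
    lines
      .flatMap(_.split(" "))                            // same tokens as split(" ") + flatten
      .groupBy(identity)                                // word -> all of its occurrences
      .map { case (word, occ) => (word, occ.size) }     // occurrences per word
      .toList
      .sortBy { case (word, count) => (-count, word) }  // by count desc, ties alphabetical
      .take(n)

  def main(args: Array[String]): Unit = {
    val sample = List("the cat and the dog", "the dog barks")
    topN(sample, 3).foreach(println)
    // (the,3)
    // (dog,2)
    // (and,1)
  }
}
```

With `sortBy(x=>x._2).reverse`, words with equal counts come out in whatever order the intermediate hash map produced; adding the word as a secondary sort key makes the tie order deterministic.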

  • Original article: https://www.cnblogs.com/students/p/10992149.html