  • Spark Study, Day 02: Reading a File and Counting Word Frequencies with Scala

    1. Install the JDK and Scala on the local machine

    2. Read a local file:

    scala> import scala.io.Source
    import scala.io.Source
    
    scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
    lines: List[String]
    = List("With the development of civilization, it is the children's duty to study in school since they were small. As the young kids, it is their nature to hang out for fun. ", "", "While for them, most of the time have been limited in the class. So they feel frustrated and don't have much passion to study. It is of great importance to develop ", "", "interest. The first thing is to broaden vision. The students can read travel books or watch tourist show, for anyone who cannot resist the charm of beautiful scenery ", "", "and delicious food. The second thing is taking the right attitude to exams. Never giving too much pressure on getting high marks. The only thing we should do is to enjoy gaining knowledge.")
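
    The REPL call above leaves the file handle open. A minimal sketch of a variant that closes it, assuming Scala 2.13 or later (for scala.util.Using) and a UTF-8 encoded file; the object name FileReading and the helper readLines are hypothetical and not part of the original post:

```scala
import scala.io.Source
import scala.util.{Try, Using}

object FileReading {
  // Hypothetical helper: read every line of the file into a List,
  // closing the underlying Source whether or not reading succeeds.
  // Assumes the file is UTF-8 encoded.
  def readLines(path: String): Try[List[String]] =
    Using(Source.fromFile(path, "UTF-8")) { source =>
      source.getLines().toList
    }
}
```

    For example, FileReading.readLines("F:/ziyuan_badou/file.txt") returns a Success wrapping the list of lines, or a Failure if the file cannot be opened or read.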

    3. Compute the top-N word frequencies

    scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=>(x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse

    res0: List[(String, Int)]
    = List((the,7), (to,7), (is,6), (of,4), (The,4), (thing,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (only,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,1), (time,1), (them,,1), (children's,1), (development,1), (knowledge.,1), (It,1), (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrated,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (small.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civilization,,1), (broaden,1), (out,1), (food.,1), (don't,1), (importance,1), (kid...
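
    In the result above, "the" and "The" are counted separately, punctuation stays attached to words ("study" and "study."), and the empty string from blank lines is counted as a word. The pipeline also returns every word rather than only the first N. One possible way to address this is sketched below; the object name WordCount, the helper topN, and the normalization rules (lowercasing, splitting on anything other than a-z and the apostrophe) are assumptions, not the original author's method:

```scala
object WordCount {
  // Hypothetical helper: count words case-insensitively and
  // return the n most frequent as (word, count) pairs.
  def topN(lines: Seq[String], n: Int): List[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("[^a-z']+")) // split on runs of non-letter, non-apostrophe characters
      .filter(_.nonEmpty)                       // drop empty tokens from blank lines or leading separators
      .groupBy(identity)                        // word -> all of its occurrences
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toList
      .sortBy { case (word, count) => (-count, word) } // highest count first, ties in alphabetical order
      .take(n)
}
```

    With the lines value read in step 2, WordCount.topN(lines, 10) would give the ten most frequent words. Sorting on (-count, word) avoids the separate reverse call and makes the order of tied words deterministic.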

  • Original post: https://www.cnblogs.com/students/p/10992149.html