  • Spark Study, Day 02: Reading a File in Scala and Counting Word Frequencies

    1. Install the JDK and Scala locally
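
One quick way to confirm that both installations are visible is to print their versions from Scala itself. This is only a minimal sketch; the object name `EnvCheck` is a hypothetical name chosen for illustration and is not part of the original post.

```scala
// Hypothetical helper: reports the JVM and Scala library versions in use.
object EnvCheck {
  // Version of the JVM running this code, read from a standard system property.
  def javaVersion: String = System.getProperty("java.version")

  // Version of the Scala standard library on the classpath.
  def scalaVersion: String = scala.util.Properties.versionNumberString

  def main(args: Array[String]): Unit = {
    println(s"Java:  $javaVersion")
    println(s"Scala: $scalaVersion")
  }
}
```

Pasting the object into the REPL and calling `EnvCheck.main(Array.empty)` should print both versions if the setup works.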

    2. Read a local file:

    scala> import scala.io.Source
    import scala.io.Source
    
    scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
    lines: List[String] = List("With the development of civilization, it is the children's duty to study in school since they were small. As the young kids, it is their nature to hang out for fun. ", "", "While for them, most of the time have been limited in the class. So they feel frustrated and don't have much passion to study. It is of great importance to develop ", "", "interest. The first thing is to broaden vision. The students can read travel books or watch tourist show, for anyone who cannot resist the charm of beautiful scenery ", "", "and delicious food. The second thing is taking the right attitude to exams. Never giving too much pressure on getting high marks. The only thing we should do is to enjoy gaining knowledge.")
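
The one-liner above never closes the `Source`, which is fine for a short REPL session but leaves the file handle open in a longer-running program. A possible variant, assuming Scala 2.13 or later (where `scala.util.Using` is available), is sketched below. The object name `ReadLines`, the UTF-8 encoding, and the temporary sample file are assumptions made for illustration only.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source
import scala.util.Using

// Hypothetical helper, not from the original post.
object ReadLines {
  // Reads all lines into a List; Using.resource closes the Source afterwards,
  // even if reading throws. UTF-8 is an assumed encoding.
  def readLines(path: Path): List[String] =
    Using.resource(Source.fromFile(path.toFile, "UTF-8")) { src =>
      src.getLines().toList
    }

  def main(args: Array[String]): Unit = {
    // Throwaway sample file so the sketch runs without any external data.
    val tmp = Files.createTempFile("sample", ".txt")
    Files.write(tmp, "first line\n\nsecond line".getBytes(StandardCharsets.UTF_8))
    println(readLines(tmp))
    Files.delete(tmp)
  }
}
```

Calling `.toList` inside the block matters: `getLines()` is lazy, so the lines have to be materialized before the file is closed.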

    3. Compute the top-N word frequencies

    scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=>
    (x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse


    res0: List[(String, Int)] = List((the,7), (to,7), (is,6), (of,4), (The,4), (thing,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (only,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,1), (time,1), (them,,1), (children's,1), (development,1), (knowledge.,1), (It,1), (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrated,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (small.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civilization,,1), (broaden,1), (out,1), (food.,1), (don't,1), (importance,1), (kid...
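
The output above shows a few side effects of splitting on a single space: empty strings are counted as a word (`("",3)`), `the` and `The` are tallied separately, and punctuation stays attached (`study` vs. `study.`). The full sorted list is also returned rather than only the top N. One possible cleanup is sketched below. The name `WordCount.topN` and the normalization rules (lower-casing, trimming punctuation at word edges) are assumptions for illustration, not the original author's method.

```scala
// Hypothetical helper, not from the original post.
object WordCount {
  // Returns the n most frequent words, most frequent first.
  // Ties are broken alphabetically so the result is deterministic.
  def topN(lines: Seq[String], n: Int): List[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))   // split on any run of whitespace
      .map(_.toLowerCase.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "")) // trim edge punctuation
      .filter(_.nonEmpty)         // drop tokens that came from blank lines
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toList
      .sortBy { case (word, count) => (-count, word) }
      .take(n)
}
```

With the `lines` value from step 2, a call such as `WordCount.topN(lines, 10)` would return the ten most frequent words. Sorting by the negated count avoids the separate `.reverse` pass, and interior punctuation such as the apostrophe in `don't` is left untouched.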

  • Original article: https://www.cnblogs.com/students/p/10992149.html