  • Spark Study, Day 02: Reading a File and Counting Word Frequencies in Scala

    1. Install the JDK and Scala locally
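
    One way to confirm that both installations are visible is to ask the REPL itself. This is a minimal sketch using `scala.util.Properties` from the standard library; the value names `scalaLibVersion` and `jvmVersion` are only illustrative.

    ```scala
    import scala.util.Properties

    // Version of the Scala standard library on the classpath.
    val scalaLibVersion = Properties.versionNumberString
    // Value of the JVM's java.version system property.
    val jvmVersion = Properties.javaVersion

    println(s"Scala library $scalaLibVersion running on Java $jvmVersion")
    ```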

    2. Read a local file:

    scala> import scala.io.Source
    import scala.io.Source
    
    scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
    lines: List[String]
    = List("With the development of civilization, it is the children's duty to study in school since they were small. As the young kids, it is their nature to hang out for fun. ", "", "While for them, most of the time have been limited in the class. So they feel frustrated and don't have much passion to study. It is of great importance to develop ", "", "interest. The first thing is to broaden vision. The students can read travel books or watch tourist show, for anyone who cannot resist the charm of beautiful scenery ", "", and delicious food. The second thing is taking the right attitude to exams. Never giving too much pressure on getting high marks. The only thing we should do is to enjoy gaining knowledge.)
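
    The one-liner above leaves the file handle open, which is harmless in a short REPL session but worth avoiding in longer programs. A minimal sketch of a variant that closes the source is shown here; `readLines` is a hypothetical helper name, and UTF-8 is an assumed encoding that may not match the actual file.

    ```scala
    import scala.io.Source

    // Hypothetical helper: read every line of a text file, then close the handle.
    // The "UTF-8" encoding is an assumption; adjust it to the file being read.
    def readLines(path: String): List[String] = {
      val source = Source.fromFile(path, "UTF-8")
      try source.getLines().toList // toList forces the read before the file is closed
      finally source.close()
    }
    ```

    Calling `readLines("F:/ziyuan_badou/file.txt")` should then produce the same `List[String]` as the session above, provided the encoding matches.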

    3. Compute the top-N word frequencies

    scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=>(x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse


    res0: List[(String, Int)]
    = List((the,7), (to,7), (is,6), (of,4), (The,4), (thing,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (only,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,1), (time,1), (them,,1), (children's,1), (development,1), (knowledge.,1), (It,1), (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrated,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (small.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civilization,,1), (broaden,1), (out,1), (food.,1), (don't,1), (importance,1), (kid...
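
    The result above counts "the" and "The" separately, keeps punctuation attached to words ("study" versus "study."), includes the empty string from blank lines, and returns every word rather than only the top N. A rough sketch of one way to tighten this up follows. `topN` is a hypothetical helper, and lower-casing plus splitting on anything other than letters and apostrophes are assumptions about what should count as the same word, not the original author's method.

    ```scala
    // Hypothetical helper: the n most frequent words, ignoring case and punctuation.
    def topN(lines: List[String], n: Int): List[(String, Int)] =
      lines
        .flatMap(_.toLowerCase.split("[^a-z']+")) // split on anything that is not a-z or an apostrophe
        .filter(_.nonEmpty)                        // drop empty tokens (blank lines, leading separators)
        .groupBy(identity)                         // word -> all of its occurrences
        .map { case (word, occurrences) => (word, occurrences.size) }
        .toList
        .sortBy { case (word, count) => (-count, word) } // count descending, ties alphabetical
        .take(n)
    ```

    With the `lines` value from step 2, `topN(lines, 10)` would give the ten most frequent words. Sorting on `(-count, word)` avoids the separate `reverse` call and makes the order of ties deterministic.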

  • Original article: https://www.cnblogs.com/students/p/10992149.html