  • Spark study, day 02: reading a file in Scala and counting word frequencies

    1. Install the JDK and Scala locally.

    2. Read a local file:

    scala> import scala.io.Source
    import scala.io.Source
    
    scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
    lines: List[String] = List("With the development of civilization, it is the children's duty to study in school since they were small. As the young kids, it is their nature to hang out for fun. ", "", "While for them, most of the time have been limited in the class. So they feel frustrated and don't have much passion to study. It is of great importance to develop ", "", "interest. The first thing is to broaden vision. The students can read travel books or watch tourist show, for anyone who cannot resist the charm of beautiful scenery ", "", and delicious food. The second thing is taking the right attitude to exams. Never giving too much pressure on getting high marks. The only thing we should do is to enjoy gaining knowledge.)
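
    The call above never closes the file and reads it with the platform's default encoding. A minimal sketch of a variant that closes the file when done, assuming Scala 2.13 or later (for scala.util.Using) and a UTF-8 encoded file. The ReadLines object and the readLines helper are names introduced here for illustration and are not part of the original session:

```scala
import scala.io.Source
import scala.util.{Try, Using}

object ReadLines {
  // Reads every line of a text file into a List and closes the file afterwards.
  // The default encoding "UTF-8" is an assumption about the input file.
  // A missing or unreadable file is reported as a Failure instead of throwing.
  def readLines(path: String, encoding: String = "UTF-8"): Try[List[String]] =
    Using(Source.fromFile(path, encoding))(source => source.getLines().toList)
}
```

    In the REPL, `ReadLines.readLines("F:/ziyuan_badou/file.txt").get` should then produce the same List[String] as above, provided the file really is UTF-8.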

    3. Compute the top-N word frequencies (the full list is sorted by descending count):

    scala> lines.map(x => x.split(" ")).flatten.map(x => (x, 1)).groupBy(x => x._1).map(x => (x._1, x._2.map(x => x._2).sum)).toList.sortBy(x => x._2).reverse

    res0: List[(String, Int)] = List((the,7), (to,7), (is,6), (of,4), (The,4), (thing,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (only,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,1), (time,1), (them,,1), (children's,1), (development,1), (knowledge.,1), (It,1), (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrated,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (small.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civilization,,1), (broaden,1), (out,1), (food.,1), (don't,1), (importance,1), (kid...
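
    The result above counts "The" and "the" separately, keeps punctuation attached to words ("study" and "study."), includes the empty string, and lists every word rather than only the top N. One possible cleanup, as a sketch only: WordFreq, normalize and topN are hypothetical names, and lowercasing plus stripping leading and trailing punctuation is an assumption about which tokens should count as the same word.

```scala
object WordFreq {
  // Lowercases a token and strips punctuation from both ends.
  // Inner characters such as the apostrophe in "don't" are kept.
  def normalize(token: String): String =
    token.toLowerCase.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "")

  // Returns the n most frequent words, highest count first.
  // Ties are broken alphabetically so the order is deterministic.
  def topN(lines: Seq[String], n: Int): List[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))   // split each line on runs of whitespace
      .map(normalize)
      .filter(_.nonEmpty)         // drop the empty tokens produced by blank lines
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toList
      .sortBy { case (word, count) => (-count, word) }
      .take(n)
}
```

    With the `lines` value read in step 2, `WordFreq.topN(lines, 10)` would return at most ten (word, count) pairs. Sorting on the negated count avoids the separate `.reverse` step.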

  • Original article: https://www.cnblogs.com/students/p/10992149.html