zoukankan      html  css  js  c++  java
  • SPARK—Hadoop

    1.环境: spark2.1.0+hadoop+jdk

     设置环境

      配置环境+激活环境
    vim ~/.bashrc   //设置配置文件

      保存配置
    source ~/.bashrc

    启动环境

      hadoop
    start-all.sh
      spark
    spark-shell
      编写应用文本
    gedit word.txt
    
    /**
       *Spark is good
       *Hadoop is good
       *Spark is fast
    */
      上传到hadoop
    hadoop fs -mkdir -p /user/hadoop
    hadoop fs -put word.txt /user/hadoop
    hadoop fs -cat /user/hadoop/word.txt
      使用spark脚本访问Hadoop上的word.txt
    val lines = sc.textFile("hdfs://localhost:9000/user/hadoop/word.txt")
    
    lines.foreach(v =>println(v))
      筛选spark在word.txt中出现次数
    filter:筛选器
    
    val lines = sc.textFile("hdfs://localhost:9000/user/hadoop/word.txt")
    
    val linesWithSpark = lines.filter(line => line.contains("spark"))
    
    println("Lines with Spark :" + linesWithSpark.count()
      LISTARRAY等的操作
    val data = Array(1,2,3,4,5)
    
    val rdd1 = sc.parallelize(data)
    
    val rdd2 = rdd1.map(x => x+10)
    
    rdd2.foreach(v =>println(v))
      以“ ”为分割依据分割文本
    val lines = sc.textFile("hdfs://localhost:9000/user/hadoop/word.txt")
    
    lines.foreach(v=>println(v))
    
    val words = lines.map(line => line.split(" "))
    
    words.foreach(v => println(v))
    
    words.foreach(v =>v.foreach(t => println(t))
    
    val words1 = lines.flatMap(line => line.split(" "))
    
    words1.foreach(v =>println(v))
      统计词频
    val wordPairsRdd = words1.map(word => (word,1))
    
    wordPairsRdd.foreach(v => println(v))
    
    val groupwords = wordPairsRdd.groupByKey() //按照键分组
    
    groupwords.foreach(v => println(v))
    
     
    
    val reducewords = wordPairsRdd.reduceByKey((a,b) => a+b)
    
    reducewords.foreach(v =>println(v))
     

      spark对数据操作
    val rdd = sc.parallelize(Array(1,2,3,4,5,6))
    
    rdd.count()
    
    rdd.first()
    
    rdd.take(4)
    
    rdd.reduce((a,b) => a+b)
    
    rdd.collect()

  • 相关阅读:
    ASM FailGroup验证
    oracle 11g RAC 补丁升级方法
    数据库优化一
    TypeScript(类—继承—多态)
    TypeScript(安装配置—数据类型—函数)
    IE9兼容
    移动端自适应vw、vh、rem
    npm命令随笔
    安卓、IOS兼容问题
    获取页面大小和元素位置offset、client、scroll
  • 原文地址:https://www.cnblogs.com/Dreamer-Jie/p/14025425.html
Copyright © 2011-2022 走看看