  • Spark Basics

    1 Reading a local file

    ./spark-shell 

    scala> val textFile=sc.textFile("file:///home/hadoop/wordfile1.txt")
    textFile: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/wordfile1.txt MapPartitionsRDD[3] at textFile at <console>:24

    scala> textFile.first()
    res2: String = I love Spark
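Outside Spark, the same first()/count() behavior can be sanity-checked with plain Scala file I/O. This is only a sketch: it writes a temporary stand-in for /home/hadoop/wordfile1.txt (the sample content here is a hypothetical two-line file matching the outputs shown in this post):

```scala
import java.nio.file.Files
import scala.jdk.CollectionConverters._

// Temporary stand-in for /home/hadoop/wordfile1.txt (hypothetical sample content)
val path = Files.createTempFile("wordfile", ".txt")
Files.write(path, "I love Spark\nI love Hadoop".getBytes("UTF-8"))

// Read all lines eagerly (unlike sc.textFile, which is lazy and distributed)
val lines = Files.readAllLines(path).asScala.toList
println(lines.head)   // analogous to textFile.first()
println(lines.size)   // analogous to textFile.count()
```

Unlike an RDD, this loads the whole file into driver memory; it is only meant to illustrate what first() and count() return for this input.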

    2 Reading a file from HDFS

    Start HDFS and upload the file to it first; only then can the commands below read it. The three paths shown below (the full hdfs:// URI, a path relative to the user's HDFS home directory, and an absolute HDFS path) all refer to the same file.
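The setup steps can be sketched as follows (a minimal sketch assuming a pseudo-distributed Hadoop install with its NameNode on localhost:9000 and the Hadoop scripts on the PATH; the paths match the example above):

```shell
# Start the HDFS daemons (NameNode, DataNode)
start-dfs.sh

# Create the input directory in HDFS and upload the local file
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put /home/hadoop/wordfile1.txt /user/hadoop/input/
```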

    scala> val textFile=sc.textFile("hdfs://localhost:9000/user/hadoop/input/wordfile1.txt")
    textFile: org.apache.spark.rdd.RDD[String] = hdfs://localhost:9000/user/hadoop/input/wordfile1.txt MapPartitionsRDD[7] at textFile at <console>:24

    scala> textFile.first()
    res4: String = I love Spark

    scala> val textFile=sc.textFile("input/wordfile1.txt")
    textFile: org.apache.spark.rdd.RDD[String] = input/wordfile1.txt MapPartitionsRDD[9] at textFile at <console>:24

    scala> textFile.first()
    res5: String = I love Spark

    scala> val textFile=sc.textFile("/user/hadoop/input/wordfile1.txt")
    textFile: org.apache.spark.rdd.RDD[String] = /user/hadoop/input/wordfile1.txt MapPartitionsRDD[11] at textFile at <console>:24

    scala> textFile.count()
    res6: Long = 2

    scala> textFile.first()
    res8: String = I love Spark

    3 Word count

    scala> val wordCount=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey((a,b)=>(a+b))
    wordCount: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[14] at reduceByKey at <console>:26

    scala> wordCount.collect()
    res9: Array[(String, Int)] = Array((Spark,1), (love,2), (I,2), (Hadoop,1))
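The flatMap → map → reduceByKey pipeline can be mimicked with plain Scala collections (no Spark required), which makes each step easier to see. The input here is a hypothetical two-line sample chosen to reproduce the counts above; groupBy plus a per-key sum plays the role of reduceByKey:

```scala
// Hypothetical sample input matching the RDD result above
val textLines = List("I love Spark", "I love Hadoop")

// flatMap: split each line into words
val words = textLines.flatMap(line => line.split(" "))

// map + reduceByKey: pair each word with 1, then sum the counts per key
val wordCount = words
  .map(word => (word, 1))
  .groupBy { case (word, _) => word }                    // collections analogue of the shuffle
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

println(wordCount)  // counts per word, e.g. love -> 2, I -> 2
```

The key difference is that reduceByKey combines values per key on each partition before shuffling, whereas groupBy here materializes all intermediate pairs in memory; the resulting counts are the same.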

  • Original article: https://www.cnblogs.com/zhouhb/p/10363070.html