  • Running Spark WordCount on Windows 7: Process and Exceptions

    The WordCount.scala code is as follows:

    package com.husor.Spark
    
    /**
     * Created by huxiu on 2014/11/26.
     */
    
    import org.apache.spark.{SparkContext, SparkConf}
    import org.apache.spark.SparkContext._
    
    object SparkWordCount {
    
      def main(args: Array[String]) {
    
        println("Test is starting......")
    
        System.setProperty("hadoop.home.dir", "d:\\winutil\\")
    
        //val conf = new SparkConf().setAppName("WordCount")
        //                          .setMaster("spark://Master:7077")
        //                          .setSparkHome("SPARK_HOME")
        //                          .set("spark.cores.max","2")
    
        //val spark = new SparkContext(conf)
    
        // Run the WordCount program in local mode
        val spark = new SparkContext("local", "WordCount", System.getenv("SPARK_HOME"))
        val file = spark.textFile("hdfs://Master:9000/data/test1")

        // Print the results directly to the console
        //file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)

        val wordCounts = file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_)

        // Write the results directly to HDFS
        wordCounts.saveAsTextFile("hdfs://Master:9000/user/huxiu/WordCountOutput")
        spark.stop()

        println("Test is Succeed!!!")
      }
    }
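
    For readers new to the RDD API: flatMap splits each line into words, map pairs every word with a 1, and reduceByKey sums the 1s per word. For reference, here is a minimal plain-Scala sketch of the same computation (no Spark involved; the sample lines are invented for illustration):

    object WordCountSketch {
      def main(args: Array[String]) {
        val lines = List("hello spark", "hello hadoop")   // invented sample input

        val counts = lines
          .flatMap(_.split(" "))                          // split each line into words
          .map((_, 1))                                    // pair each word with a 1
          .groupBy(_._1)                                  // collections analogue of reduceByKey
          .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

        counts.foreach(println)                           // e.g. (hello,2), (spark,1), (hadoop,1)
      }
    }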

    The following exceptions came up while running the WordCount program above:

    Exception 1:

    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries ...............

    Reason: Hadoop Bug

    Solution: http://qnalist.com/questions/4994960/run-spark-unit-test-on-windows-7

    Namely,

    1) download the compiled winutils.exe from
    http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight
    2) put this file into d:\winutil\bin
    3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\") (a sanity-check sketch follows below)
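
    Since a wrong or missing winutils path fails with the same opaque IOException, it helps to sanity-check the layout up front. A minimal sketch, assuming the d:\winutil location from the workaround above (adjust the path for your machine):

    import java.io.File

    object WinutilsCheck {
      def main(args: Array[String]) {
        // Same property the WordCount code sets; d:\winutil is an assumption from step 2 above
        val hadoopHome = "d:\\winutil\\"
        System.setProperty("hadoop.home.dir", hadoopHome)

        // Hadoop looks for <hadoop.home.dir>\bin\winutils.exe; fail fast if it is not there
        val winutils = new File(hadoopHome, "bin\\winutils.exe")
        require(winutils.isFile, "winutils.exe not found at " + winutils.getPath)
      }
    }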

    Exception 2:

    Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=huxiu, access=WRITE, inode="/":Spark:supergroup:drwxr-xr-x

    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)

    Reason:

    From the exception above it is easy to see that the job, running as user huxiu, is trying to create a directory under "/", which is owned by user Spark in group supergroup. Since the permissions on "/" (rwxr-xr-x) give other users no write access, writes under "/" fail for huxiu.

    Solution: http://www.hadoopinrealworld.com/fixing-org-apache-hadoop-security-accesscontrolexception-permission-denied/

    Namely,

    1> To keep things clean and for better control, let's specify the location of the staging directory by setting the mapreduce.jobtracker.staging.root.dir property in mapred-site.xml. After the property is set, restart the mapred service for it to take effect.

    <property>
        <name>mapreduce.jobtracker.staging.root.dir</name>
        <value>/user</value>
    </property>

    2> Several suggestions online recommend doing a chmod to 777 on /user. This is not advisable, as it would let other users delete or modify each other's files in HDFS. Instead, create a folder named huxiu under /user as the HDFS superuser (in our case, Spark), then change the folder's ownership to huxiu (a programmatic alternative is sketched after the commands below):

    [Spark@Master hadoop]$ hadoop fs -mkdir /user
    14/11/26 14:00:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [Spark@Master hadoop]$ hadoop fs -mkdir /user/huxiu
    14/11/26 14:04:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [Spark@Master hadoop]$ hadoop fs -chown huxiu:huxiu /user/huxiu
    14/11/26 14:04:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
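
    The same fix can also be scripted rather than typed at the shell. Below is a hedged sketch using the Hadoop FileSystem API, assuming the hdfs://Master:9000 namenode from this post and that it runs as the HDFS superuser (Spark here); it mirrors the three commands above:

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object CreateUserDir {
      def main(args: Array[String]) {
        // Connect as "Spark", the user that owns "/" in the exception above
        val fs = FileSystem.get(new URI("hdfs://Master:9000"), new Configuration(), "Spark")

        val home = new Path("/user/huxiu")
        fs.mkdirs(home)                      // like: hadoop fs -mkdir /user/huxiu (creates /user too)
        fs.setOwner(home, "huxiu", "huxiu")  // like: hadoop fs -chown huxiu:huxiu /user/huxiu
        fs.close()
      }
    }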

    After fixing these exceptions, the program ran successfully with the following output:

    "C:Program FilesJavajdk1.7.0_67injava" -Didea.launcher.port=7546 "-Didea.launcher.bin.path=D:ScalaIDEIntelliJ IDEA Community Edition 14.0.1in" -Dfile.encoding=UTF-8 -classpath "C:Program FilesJavajdk1.7.0_67jrelibcharsets.jar;C:Program FilesJavajdk1.7.0_67jrelibdeploy.jar;C:Program FilesJavajdk1.7.0_67jrelibjavaws.jar;C:Program FilesJavajdk1.7.0_67jrelibjce.jar;C:Program FilesJavajdk1.7.0_67jrelibjfr.jar;C:Program FilesJavajdk1.7.0_67jrelibjfxrt.jar;C:Program FilesJavajdk1.7.0_67jrelibjsse.jar;C:Program FilesJavajdk1.7.0_67jrelibmanagement-agent.jar;C:Program FilesJavajdk1.7.0_67jrelibplugin.jar;C:Program FilesJavajdk1.7.0_67jrelib
    esources.jar;C:Program FilesJavajdk1.7.0_67jrelib
    t.jar;C:Program FilesJavajdk1.7.0_67jrelibextaccess-bridge-64.jar;C:Program FilesJavajdk1.7.0_67jrelibextdnsns.jar;C:Program FilesJavajdk1.7.0_67jrelibextjaccess.jar;C:Program FilesJavajdk1.7.0_67jrelibextlocaledata.jar;C:Program FilesJavajdk1.7.0_67jrelibextsunec.jar;C:Program FilesJavajdk1.7.0_67jrelibextsunjce_provider.jar;C:Program FilesJavajdk1.7.0_67jrelibextsunmscapi.jar;C:Program FilesJavajdk1.7.0_67jrelibextzipfs.jar;D:IntelliJ_IDEWorkSpaceoutproductionTest;D:vagrantdataScala2.10.4libscala-actors-migration.jar;D:vagrantdataScala2.10.4libscala-actors.jar;D:vagrantdataScala2.10.4libscala-library.jar;D:vagrantdataScala2.10.4libscala-reflect.jar;D:vagrantdataScala2.10.4libscala-swing.jar;D:SparkSrcspark-assembly-1.1.0-hadoop2.4.0.jar;D:ScalaIDEIntelliJ IDEA Community Edition 14.0.1libidea_rt.jar" com.intellij.rt.execution.application.AppMain com.husor.Spark.SparkWordCount
    Test is starting......
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    14/11/26 14:15:44 INFO SecurityManager: Changing view acls to: huxiu,
    14/11/26 14:15:44 INFO SecurityManager: Changing modify acls to: huxiu,
    14/11/26 14:15:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(huxiu, ); users with modify permissions: Set(huxiu, )
    14/11/26 14:15:44 INFO Slf4jLogger: Slf4jLogger started
    14/11/26 14:15:44 INFO Remoting: Starting remoting
    14/11/26 14:15:44 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@huxiu-PC:54972]
    14/11/26 14:15:44 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@huxiu-PC:54972]
    14/11/26 14:15:44 INFO Utils: Successfully started service 'sparkDriver' on port 54972.
    14/11/26 14:15:44 INFO SparkEnv: Registering MapOutputTracker
    14/11/26 14:15:45 INFO SparkEnv: Registering BlockManagerMaster
    14/11/26 14:15:45 INFO DiskBlockManager: Created local directory at C:\Users\huxiu\AppData\Local\Temp\spark-local-20141126141545-9dad
    14/11/26 14:15:45 INFO Utils: Successfully started service 'Connection manager for block manager' on port 54975.
    14/11/26 14:15:45 INFO ConnectionManager: Bound socket to port 54975 with id = ConnectionManagerId(huxiu-PC,54975)
    14/11/26 14:15:45 INFO MemoryStore: MemoryStore started with capacity 969.6 MB
    14/11/26 14:15:45 INFO BlockManagerMaster: Trying to register BlockManager
    14/11/26 14:15:45 INFO BlockManagerMasterActor: Registering block manager huxiu-PC:54975 with 969.6 MB RAM
    14/11/26 14:15:45 INFO BlockManagerMaster: Registered BlockManager
    14/11/26 14:15:45 INFO HttpFileServer: HTTP File server directory is C:\Users\huxiu\AppData\Local\Temp\spark-423dcd83-624e-404a-bbf6-a1190f77290f
    14/11/26 14:15:45 INFO HttpServer: Starting HTTP Server
    14/11/26 14:15:45 INFO Utils: Successfully started service 'HTTP file server' on port 54976.
    14/11/26 14:15:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    14/11/26 14:15:45 INFO SparkUI: Started SparkUI at http://huxiu-PC:4040
    14/11/26 14:15:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/11/26 14:15:45 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@huxiu-PC:54972/user/HeartbeatReceiver
    14/11/26 14:15:46 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=1016667832
    14/11/26 14:15:46 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 969.4 MB)
    14/11/26 14:15:46 INFO FileInputFormat: Total input paths to process : 1
    14/11/26 14:15:46 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
    14/11/26 14:15:46 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    14/11/26 14:15:46 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
    14/11/26 14:15:46 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
    14/11/26 14:15:46 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    14/11/26 14:15:47 INFO SparkContext: Starting job: saveAsTextFile at SparkWordCount.scala:33
    14/11/26 14:15:47 INFO DAGScheduler: Registering RDD 3 (map at SparkWordCount.scala:32)
    14/11/26 14:15:47 INFO DAGScheduler: Got job 0 (saveAsTextFile at SparkWordCount.scala:33) with 1 output partitions (allowLocal=false)
    14/11/26 14:15:47 INFO DAGScheduler: Final stage: Stage 0(saveAsTextFile at SparkWordCount.scala:33)
    14/11/26 14:15:47 INFO DAGScheduler: Parents of final stage: List(Stage 1)
    14/11/26 14:15:47 INFO DAGScheduler: Missing parents: List(Stage 1)
    14/11/26 14:15:47 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at SparkWordCount.scala:32), which has no missing parents
    14/11/26 14:15:47 INFO MemoryStore: ensureFreeSpace(3424) called with curMem=163705, maxMem=1016667832
    14/11/26 14:15:47 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 969.4 MB)
    14/11/26 14:15:47 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MappedRDD[3] at map at SparkWordCount.scala:32)
    14/11/26 14:15:47 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
    14/11/26 14:15:47 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, ANY, 1174 bytes)
    14/11/26 14:15:47 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
    14/11/26 14:15:47 INFO HadoopRDD: Input split: hdfs://Master:9000/data/test1:0+27
    14/11/26 14:15:47 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 1862 bytes result sent to driver
    14/11/26 14:15:47 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 489 ms on localhost (1/1)
    14/11/26 14:15:47 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
    14/11/26 14:15:47 INFO DAGScheduler: Stage 1 (map at SparkWordCount.scala:32) finished in 0.500 s
    14/11/26 14:15:47 INFO DAGScheduler: looking for newly runnable stages
    14/11/26 14:15:47 INFO DAGScheduler: running: Set()
    14/11/26 14:15:47 INFO DAGScheduler: waiting: Set(Stage 0)
    14/11/26 14:15:47 INFO DAGScheduler: failed: Set()
    14/11/26 14:15:47 INFO DAGScheduler: Missing parents for Stage 0: List()
    14/11/26 14:15:47 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[5] at saveAsTextFile at SparkWordCount.scala:33), which is now runnable
    14/11/26 14:15:47 INFO MemoryStore: ensureFreeSpace(57512) called with curMem=167129, maxMem=1016667832
    14/11/26 14:15:47 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 56.2 KB, free 969.4 MB)
    14/11/26 14:15:47 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[5] at saveAsTextFile at SparkWordCount.scala:33)
    14/11/26 14:15:47 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
    14/11/26 14:15:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 948 bytes)
    14/11/26 14:15:47 INFO Executor: Running task 0.0 in stage 0.0 (TID 1)
    14/11/26 14:15:47 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
    14/11/26 14:15:47 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
    14/11/26 14:15:47 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 4 ms
    14/11/26 14:15:48 INFO FileOutputCommitter: Saved output of task 'attempt_201411261415_0000_m_000000_1' to hdfs://Master:9000/user/huxiu/WordCountOutput/_temporary/0/task_201411261415_0000_m_000000
    14/11/26 14:15:48 INFO SparkHadoopWriter: attempt_201411261415_0000_m_000000_1: Committed
    14/11/26 14:15:48 INFO Executor: Finished task 0.0 in stage 0.0 (TID 1). 826 bytes result sent to driver
    14/11/26 14:15:48 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 1) in 847 ms on localhost (1/1)
    14/11/26 14:15:48 INFO DAGScheduler: Stage 0 (saveAsTextFile at SparkWordCount.scala:33) finished in 0.847 s
    14/11/26 14:15:48 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    14/11/26 14:15:48 INFO SparkContext: Job finished: saveAsTextFile at SparkWordCount.scala:33, took 1.469630513 s
    14/11/26 14:15:48 INFO SparkUI: Stopped Spark web UI at http://huxiu-PC:4040
    14/11/26 14:15:48 INFO DAGScheduler: Stopping DAGScheduler
    14/11/26 14:15:49 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
    14/11/26 14:15:49 INFO ConnectionManager: Selector thread was interrupted!
    14/11/26 14:15:49 INFO ConnectionManager: ConnectionManager stopped
    14/11/26 14:15:49 INFO MemoryStore: MemoryStore cleared
    14/11/26 14:15:49 INFO BlockManager: BlockManager stopped
    14/11/26 14:15:49 INFO BlockManagerMaster: BlockManagerMaster stopped
    Test is Succeed!!!
    14/11/26 14:15:49 INFO SparkContext: Successfully stopped SparkContext
    14/11/26 14:15:49 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
    14/11/26 14:15:49 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
    14/11/26 14:15:49 INFO Remoting: Remoting shut down
    14/11/26 14:15:49 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
    
    Process finished with exit code 0
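
    To double-check the result, the part files that saveAsTextFile wrote under WordCountOutput can be read back with the same local-mode settings. A minimal sketch (paths exactly as in the job above):

    import org.apache.spark.SparkContext

    object VerifyOutput {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "VerifyOutput", System.getenv("SPARK_HOME"))

        // textFile on the output directory reads all part-xxxxx files in it
        sc.textFile("hdfs://Master:9000/user/huxiu/WordCountOutput")
          .collect()
          .foreach(println)                  // expect lines like (word,count)

        sc.stop()
      }
    }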