  • Building a maven+scala+spark project in Eclipse

    Prerequisites

    Download and install the Scala IDE build of the Eclipse SDK.

    Building the project

    1. Create a new Maven project

    2. Configure the project information

    3. Create a Source Folder for the Scala sources

    4. Add the Scala Archetype to the project

    5. Add the dependency packages to pom.xml

    <dependency>
    	<groupId>org.apache.spark</groupId>
    	<artifactId>spark-core_2.10</artifactId>
    	<version>1.6.0</version>
    </dependency>

    <dependency>
    	<groupId>org.apache.spark</groupId>
    	<artifactId>spark-sql_2.10</artifactId>
    	<version>1.6.0</version>
    </dependency>

    <dependency>
    	<groupId>redis.clients</groupId>
    	<artifactId>jedis</artifactId>
    	<version>2.8.0</version>
    </dependency>

    <dependency>
    	<groupId>postgresql</groupId>
    	<artifactId>postgresql</artifactId>
    	<version>9.1-901-1.jdbc4</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.databricks/spark-csv_2.11 -->
    <dependency>
    	<groupId>com.databricks</groupId>
    	<artifactId>spark-csv_2.10</artifactId>
    	<version>1.3.2</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.camel/camel-ftp -->
    <dependency>
    	<groupId>org.apache.camel</groupId>
    	<artifactId>camel-ftp</artifactId>
    	<version>2.13.2</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
    <dependency>
    	<groupId>org.apache.hbase</groupId>
    	<artifactId>hbase-client</artifactId>
    	<version>1.2.0</version>
    </dependency>

    <!-- the HBase/Hadoop client chain references jdk.tools, which Maven
    	resolves here from the local JDK's tools.jar -->
    <dependency>
    	<groupId>jdk.tools</groupId>
    	<artifactId>jdk.tools</artifactId>
    	<version>1.7</version>
    	<scope>system</scope>
    	<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.vividsolutions/jts -->
    <dependency>
    	<groupId>com.vividsolutions</groupId>
    	<artifactId>jts</artifactId>
    	<version>1.13</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.opencsv/opencsv -->
    <dependency>
    	<groupId>com.opencsv</groupId>
    	<artifactId>opencsv</artifactId>
    	<version>3.8</version>
    </dependency>

    <dependency>
    	<groupId>org.apache.commons</groupId>
    	<artifactId>commons-csv</artifactId>
    	<version>1.2</version>
    </dependency>
    

      

     At this point, the project configuration is complete.

    Running a Scala program

    Write a Scala program:

    package com.lilei
    
    object test {
      def main(args: Array[String]): Unit = {
        println("hello scala world !")
      }
    }

    Running a local Spark program

    Write a Spark program:

    package com.lilei
    
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    
    object hello_spark {
    
      def main(args: Array[String]): Unit = {
        // run Spark locally inside a single JVM
        val conf = new SparkConf()
          .setAppName("hello_spark").setMaster("local")
    
        val sc = new SparkContext(conf)
    
        // use forward slashes on Windows: in a Scala string literal,
        // "C:\test" would turn the \t into a tab character
        val path = "C:/test/es/elasticsearch-5.6.3/config/jvm.options"
    
        // print every line of the file
        sc.textFile(path).foreach(println)
    
        sc.stop()
      }
    }
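    Building on the example above, the same SparkContext can do more than print lines. As a minimal sketch (the object name count_lines and the filtering logic are illustrative, not from the original post), counting the non-blank, non-comment lines of the same jvm.options file:

    ```scala
    package com.lilei

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    object count_lines {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("count_lines").setMaster("local")
        val sc = new SparkContext(conf)

        // same file as above; adjust the path to your environment
        val path = "C:/test/es/elasticsearch-5.6.3/config/jvm.options"

        // keep only lines that are non-empty and not comments, then count them
        val n = sc.textFile(path)
          .map(_.trim)
          .filter(line => line.nonEmpty && !line.startsWith("#"))
          .count()

        println(s"non-comment lines: $n")
        sc.stop()
      }
    }
    ```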
    

    Output:

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    17/11/09 10:36:40 INFO SparkContext: Running Spark version 1.6.0
    17/11/09 10:36:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/11/09 10:36:41 INFO SecurityManager: Changing view acls to: lilei3774
    17/11/09 10:36:41 INFO SecurityManager: Changing modify acls to: lilei3774
    17/11/09 10:36:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lilei3774); users with modify permissions: Set(lilei3774)
    17/11/09 10:36:41 INFO Utils: Successfully started service 'sparkDriver' on port 8351.
    17/11/09 10:36:41 INFO Slf4jLogger: Slf4jLogger started
    17/11/09 10:36:41 INFO Remoting: Starting remoting
    17/11/09 10:36:42 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.22.34.186:8388]
    17/11/09 10:36:42 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 8388.
    17/11/09 10:36:42 INFO SparkEnv: Registering MapOutputTracker
    17/11/09 10:36:42 INFO SparkEnv: Registering BlockManagerMaster
    17/11/09 10:36:42 INFO DiskBlockManager: Created local directory at C:\Users\lilei3774\AppData\Local\Temp\blockmgr-ddd1f997-a689-43d3-878e-73f2a76bd3da
    17/11/09 10:36:42 INFO MemoryStore: MemoryStore started with capacity 146.2 MB
    17/11/09 10:36:42 INFO SparkEnv: Registering OutputCommitCoordinator
    17/11/09 10:36:42 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    17/11/09 10:36:42 INFO SparkUI: Started SparkUI at http://172.22.34.186:4040
    17/11/09 10:36:42 INFO Executor: Starting executor ID driver on host localhost
    17/11/09 10:36:42 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 8440.
    17/11/09 10:36:42 INFO NettyBlockTransferService: Server created on 8440
    17/11/09 10:36:42 INFO BlockManagerMaster: Trying to register BlockManager
    17/11/09 10:36:42 INFO BlockManagerMasterEndpoint: Registering block manager localhost:8440 with 146.2 MB RAM, BlockManagerId(driver, localhost, 8440)
    17/11/09 10:36:42 INFO BlockManagerMaster: Registered BlockManager
    17/11/09 10:36:43 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
    17/11/09 10:36:43 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 135.5 KB, free 135.5 KB)
    17/11/09 10:36:43 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.7 KB, free 148.2 KB)
    17/11/09 10:36:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:8440 (size: 12.7 KB, free: 146.2 MB)
    17/11/09 10:36:43 INFO SparkContext: Created broadcast 0 from textFile at hello_spark.scala:16
    17/11/09 10:36:43 WARN : Your hostname, lilei03774 resolves to a loopback/non-reachable address: fe80:0:0:0:147a:9144:154:bd1f%26, but we couldn't find any external IP address!
    17/11/09 10:36:44 INFO FileInputFormat: Total input paths to process : 1
    17/11/09 10:36:44 INFO SparkContext: Starting job: foreach at hello_spark.scala:16
    17/11/09 10:36:44 INFO DAGScheduler: Got job 0 (foreach at hello_spark.scala:16) with 1 output partitions
    17/11/09 10:36:44 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at hello_spark.scala:16)
    17/11/09 10:36:44 INFO DAGScheduler: Parents of final stage: List()
    17/11/09 10:36:44 INFO DAGScheduler: Missing parents: List()
    17/11/09 10:36:44 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at textFile at hello_spark.scala:16), which has no missing parents
    17/11/09 10:36:44 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 151.2 KB)
    17/11/09 10:36:44 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1838.0 B, free 153.0 KB)
    17/11/09 10:36:44 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:8440 (size: 1838.0 B, free: 146.2 MB)
    17/11/09 10:36:44 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
    17/11/09 10:36:44 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at textFile at hello_spark.scala:16)
    17/11/09 10:36:44 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
    17/11/09 10:36:44 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2157 bytes)
    17/11/09 10:36:44 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
    17/11/09 10:36:44 INFO HadoopRDD: Input split: file:/C:/test/es/elasticsearch-5.6.3/config/jvm.options:0+3068
    17/11/09 10:36:44 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
    17/11/09 10:36:44 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    17/11/09 10:36:44 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
    17/11/09 10:36:44 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
    17/11/09 10:36:44 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    ## JVM configuration
    
    ################################################################
    ## IMPORTANT: JVM heap size
    ################################################################
    ##
    ## You should always set the min and max JVM heap
    ## size to the same value. For example, to set
    ## the heap to 4 GB, set:
    ##
    ## -Xms4g
    ## -Xmx4g
    ##
    ## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
    ## for more information
    ##
    ################################################################
    
    # Xms represents the initial size of total heap space
    # Xmx represents the maximum size of total heap space
    
    -Xms512m
    -Xmx512m
    
    ################################################################
    ## Expert settings
    ################################################################
    ##
    ## All settings below this section are considered
    ## expert settings. Don't tamper with them unless
    ## you understand what you are doing
    ##
    ################################################################
    
    ## GC configuration
    -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=75
    -XX:+UseCMSInitiatingOccupancyOnly
    
    ## optimizations
    
    # pre-touch memory pages used by the JVM during initialization
    -XX:+AlwaysPreTouch
    
    ## basic
    
    # force the server VM (remove on 32-bit client JVMs)
    -server
    
    # explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
    -Xss1m
    
    # set to headless, just in case
    -Djava.awt.headless=true
    
    # ensure UTF-8 encoding by default (e.g. filenames)
    -Dfile.encoding=UTF-8
    
    # use our provided JNA always versus the system one
    -Djna.nosys=true
    
    # use old-style file permissions on JDK9
    -Djdk.io.permissionsUseCanonicalPath=true
    
    # flags to configure Netty
    -Dio.netty.noUnsafe=true
    -Dio.netty.noKeySetOptimization=true
    -Dio.netty.recycler.maxCapacityPerThread=0
    
    # log4j 2
    -Dlog4j.shutdownHookEnabled=false
    -Dlog4j2.disable.jmx=true
    -Dlog4j.skipJansi=true
    
    ## heap dumps
    
    # generate a heap dump when an allocation from the Java heap fails
    # heap dumps are created in the working directory of the JVM
    -XX:+HeapDumpOnOutOfMemoryError
    
    # specify an alternative path for heap dumps
    # ensure the directory exists and has sufficient space
    #-XX:HeapDumpPath=${heap.dump.path}
    
    ## GC logging
    
    #-XX:+PrintGCDetails
    #-XX:+PrintGCTimeStamps
    #-XX:+PrintGCDateStamps
    #-XX:+PrintClassHistogram
    #-XX:+PrintTenuringDistribution
    #-XX:+PrintGCApplicationStoppedTime
    
    # log GC status to a file with time stamps
    # ensure the directory exists
    #-Xloggc:${loggc}
    
    # By default, the GC log file will not rotate.
    # By uncommenting the lines below, the GC log file
    # will be rotated every 128MB at most 32 times.
    #-XX:+UseGCLogFileRotation
    #-XX:NumberOfGCLogFiles=32
    #-XX:GCLogFileSize=128M
    
    # Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
    # If documents were already indexed with unquoted fields in a previous version
    # of Elasticsearch, some operations may throw errors.
    #
    # WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
    # only for migration purposes.
    #-Delasticsearch.json.allow_unquoted_field_names=true
    17/11/09 10:36:44 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2044 bytes result sent to driver
    17/11/09 10:36:44 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 95 ms on localhost (1/1)
    17/11/09 10:36:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    17/11/09 10:36:44 INFO DAGScheduler: ResultStage 0 (foreach at hello_spark.scala:16) finished in 0.107 s
    17/11/09 10:36:44 INFO DAGScheduler: Job 0 finished: foreach at hello_spark.scala:16, took 0.183296 s
    17/11/09 10:36:44 INFO SparkContext: Invoking stop() from shutdown hook
    17/11/09 10:36:44 INFO SparkUI: Stopped Spark web UI at http://172.22.34.186:4040
    17/11/09 10:36:44 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    17/11/09 10:36:44 INFO MemoryStore: MemoryStore cleared
    17/11/09 10:36:44 INFO BlockManager: BlockManager stopped
    17/11/09 10:36:44 INFO BlockManagerMaster: BlockManagerMaster stopped
    17/11/09 10:36:44 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    17/11/09 10:36:44 INFO SparkContext: Successfully stopped SparkContext
    17/11/09 10:36:44 INFO ShutdownHookManager: Shutdown hook called
    17/11/09 10:36:44 INFO ShutdownHookManager: Deleting directory C:\Users\lilei3774\AppData\Local\Temp\spark-2264631f-e639-4cca-8c51-7c98994dc6b1
    

      

      

  • Original article: https://www.cnblogs.com/lilei2blog/p/7808203.html