1. Pick the version you need from the official Maven site. Here I use maven-3.6.1.
wget http://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
2. Extract it into /usr/local, then delete the downloaded archive
tar -zxvf apache-maven-3.6.1-bin.tar.gz -C /usr/local
rm apache-maven-3.6.1-bin.tar.gz
3. Go to /usr/local and rename the Maven directory
cd /usr/local
mv apache-maven-3.6.1 maven-3.6.1
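Steps 2 and 3 can also be combined, since tar can drop the archive's top-level directory while extracting. The following is a minimal sketch, assuming a tar that supports `--strip-components` (GNU tar and bsdtar do); the helper name `extract_renamed` is hypothetical and not part of Maven.

```shell
#!/bin/sh
# Hypothetical helper: unpack ARCHIVE into DEST and drop the archive's
# single top-level directory (for example apache-maven-3.6.1/).
extract_renamed() {
    archive=$1
    dest=$2
    mkdir -p "$dest" &&
    tar -xzf "$archive" -C "$dest" --strip-components=1
}

# Example call with the paths used in this guide (needs write access
# to /usr/local, so run it as root):
# extract_renamed apache-maven-3.6.1-bin.tar.gz /usr/local/maven-3.6.1
```

With this approach the files land directly in /usr/local/maven-3.6.1, so no separate mv is needed.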
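4. Next, configure the Maven environment. Open /etc/profile and append the two export lines below
vim /etc/profile
export MAVEN_HOME=/usr/local/maven-3.6.1
export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$PATH
5. Reload the environment variables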
source /etc/profile
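If step 4 is repeated by hand, /etc/profile can end up with duplicate export lines. A minimal sketch of an idempotent alternative follows; the helper `append_once` and the `PROFILE` variable are hypothetical names introduced here, and `PROFILE` defaults to a scratch file so the sketch can be tried without touching the system.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line
# is already present.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Assumption: PROFILE points at a scratch file by default. Set
# PROFILE=/etc/profile (as root) to apply the change for real.
PROFILE=${PROFILE:-$(mktemp)}

# Single quotes keep $JAVA_HOME and $MAVEN_HOME unexpanded in the file.
append_once "$PROFILE" 'export MAVEN_HOME=/usr/local/maven-3.6.1'
append_once "$PROFILE" 'export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$PATH'
```

Running the script a second time against the same file leaves it unchanged, because each line is only written when grep does not find it.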
6. Check that Maven was installed successfully
mvn -version
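If the version check fails with "command not found", the usual cause is that the new PATH has not been loaded in the current shell. The sketch below shows one way to see where a command resolves from; `check_tool` is a hypothetical helper name, and the hints it prints are only suggestions.

```shell
#!/bin/sh
# Hypothetical helper: print where a command resolves from, or fail.
check_tool() {
    if path=$(command -v "$1"); then
        printf '%s found at %s\n' "$1" "$path"
    else
        printf '%s not found on PATH\n' "$1" >&2
        return 1
    fi
}

# Maven needs a JDK, so check java before mvn.
check_tool java || echo "hint: install a JDK and set JAVA_HOME first"
check_tool mvn  || echo "hint: run 'source /etc/profile' or open a new shell"
```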
7. Writing a standalone Java application
1) Go to the user's home directory
cd ~
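2) Create the sparkapp2 project directory, including the source directory that the next step writes to
mkdir -p ./sparkapp2/src/main/java
3) Create SimpleApp.java in ./sparkapp2/src/main/java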
vim ./sparkapp2/src/main/java/SimpleApp.java
with the following content:
/*** SimpleApp.java ***/
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    public static void main(String[] args) {
        // Text file to scan; adjust the path to your Spark installation
        String logFile = "file:///usr/local/spark-2.4.3/README.md";
        // Run locally; the jar path matches the artifact built from pom.xml
        JavaSparkContext sc = new JavaSparkContext("local", "Simple App",
            "file:///usr/local/spark-2.4.3/", new String[]{"target/simple-project-1.0.jar"});
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count the lines that contain the letter "a"
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("a");
            }
        }).count();

        // Count the lines that contain the letter "b"
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
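The program depends on the Spark Java API, so it has to be compiled and packaged with Maven.
4) Create a pom.xml file in ./sparkapp2
If you are unsure which exact version of a dependency to use, you can look it up on MVN Repository, a searchable index of Maven artifacts: https://mvnrepository.com/
The content of pom.xml is as follows: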
<project>
  <groupId>edu.berkeley</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <repositories>
    <repository>
      <id>Akka repository</id>
      <url>http://repo.akka.io/releases</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>
  </dependencies>
</project>
5) Before packaging with Maven, check the project's file layout
cd ~/sparkapp2
find .
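Maven only compiles sources it finds under src/main/java, so it is worth confirming the two files are where it expects them. As a minimal sketch of that check, assuming the project lives in ~/sparkapp2 as above (`check_layout` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: verify that PROJECT contains the files this
# guide creates, at the paths Maven expects.
check_layout() {
    project=$1
    status=0
    for f in pom.xml src/main/java/SimpleApp.java; do
        if [ -f "$project/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    return $status
}

check_layout "$HOME/sparkapp2" || echo "fix the layout before running mvn package"
```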
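6) Then package the whole application into a jar. This is slow, roughly ten to twenty minutes and longer than with sbt, mostly because Maven downloads the dependencies on the first run. When it succeeds, Maven prints BUILD SUCCESS.
/usr/local/maven-3.6.1/bin/mvn package
7) Run the program with spark-submit
Submit the generated jar to Spark with spark-submit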
/usr/local/spark-2.4.3/bin/spark-submit --class "SimpleApp" ~/sparkapp2/target/simple-project-1.0.jar 2>&1 | grep "Lines with a"
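The `2>&1 | grep` part of the command is there because Spark writes a lot of log output to stderr. The sketch below illustrates the mechanism with a stand-in function instead of a real Spark job; `fake_job` and the counts it prints are placeholders, not actual Spark output.

```shell
#!/bin/sh
# Stand-in for spark-submit: log noise on stderr, the result on stdout.
# The counts are placeholders, not real Spark output.
fake_job() {
    echo "INFO SparkContext: starting job" >&2
    echo "Lines with a: 1, lines with b: 2"
    echo "INFO SparkContext: stopped" >&2
}

# 2>&1 merges stderr into stdout, so grep sees both streams and keeps
# only the line that carries the result.
fake_job 2>&1 | grep "Lines with a"
```

Without `2>&1`, the log lines would bypass grep and still appear on the terminal.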