
Spark Experiment Summary (Seven Experiments Combined)

Date: 2020.01.20

Blog issue: 128

Monday

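  I. Environment Setup

    1. Install the virtual machine application VMware Workstation Pro

      [Written: 2020-01-20]

      Go to the official website and download VMware Workstation Pro.

      [Screenshot: the edition to download]

      [End of entry]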

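    2. Install Ubuntu

      Learning material comes from Lin Ziyu's platform.

      URL: http://dblab.xmu.edu.cn/blog/285/

    3. Configure the Hadoop environment

      Learning material comes from Lin Ziyu's platform.

      URL: http://dblab.xmu.edu.cn/blog/install-hadoop-cluster/

    4. Configure the Spark environment

      Learning material comes from Lin Ziyu's platform.

      URL: http://dblab.xmu.edu.cn/blog/804-2/

    5. Configure the MySQL environment

      Learning material comes from Lin Ziyu's platform.

      Reference link: http://dblab.xmu.edu.cn/blog/install-mysql/#more-1002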

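    6. Connect to MySQL on the virtual machine from Navicat on the local machine

      [Written: 2020-01-23]

      First locate the hosts file [location: C:\Windows\System32\drivers\etc].

      Check whether you have mapped the virtual machine's IP address to a hostname there.

      If you have [screenshot: the mapping entry in the hosts file],

      you can type the mapped hostname into the host field of the Navicat connection dialog; otherwise, just enter the IP address.

      [End of entry]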
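The check described above can be sketched in Java. This is only an illustration and not part of the original setup: the class `HostsLookup`, the hostname `hadoop-vm`, and the address `192.168.1.100` are all hypothetical, and the sample lines stand in for the content of the real hosts file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides what to type into Navicat's host field.
final class HostsLookup {

    // Parse hosts-file lines of the form "IP name [alias ...]" into name -> IP.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> nameToIp = new LinkedHashMap<>();
        for (String raw : lines) {
            int hash = raw.indexOf('#');                      // strip trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s+");
            for (int i = 1; i < parts.length; i++) {
                nameToIp.putIfAbsent(parts[i], parts[0]);     // first mapping wins
            }
        }
        return nameToIp;
    }

    // Use the hostname if the hosts file maps it to the VM's IP, otherwise the IP itself.
    static String hostField(Map<String, String> nameToIp, String hostname, String vmIp) {
        return vmIp.equals(nameToIp.get(hostname)) ? hostname : vmIp;
    }

    public static void main(String[] args) {
        // Assumed sample content; the real file lives in C:\Windows\System32\drivers\etc
        List<String> sample = Arrays.asList(
            "# sample hosts content (assumed values)",
            "127.0.0.1       localhost",
            "192.168.1.100   hadoop-vm   # the Ubuntu VM");
        Map<String, String> hosts = parse(sample);
        System.out.println(hostField(hosts, "hadoop-vm", "192.168.1.100"));
        System.out.println(hostField(hosts, "other-vm", "192.168.1.101"));
    }
}
```

With the sample lines, the first call yields the hostname because a mapping exists, and the second falls back to the IP address because no mapping is found.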

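    7. Install Eclipse in the virtual machine and configure the integration packages

      Learning material comes from Lin Ziyu's platform.

      URL: http://dblab.xmu.edu.cn/blog/290-2/

    8. Install Flume

      Reference blog: https://blog.csdn.net/qq_39839745/article/details/85278066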

  II. Commands

    1. Basic Linux commands (the ones useful for big data work)

    [Written: 2020-01-20]

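//----------[Directory operations]
(1) cd
    cd /usr/local/hadoop        Change to the directory with the absolute path "/usr/local/hadoop"
    cd ./data                   Change to the directory with the relative path "./data" (plain "data" also works)
    cd ./../data                Go up one level (..), then enter the data directory at that level
(2) mkdir
    mkdir -p /hadoop/test       Create the directory together with any missing parent directories
(3) rmdir
    rmdir /usr/local/hadoop     Remove an empty directory (fails if the directory is not empty)
//----------[File operations]
(4) touch
    touch /usr/local/test.txt   Create an empty file (if it already exists, only its timestamp is updated)
(5) cat
    cat /usr/local/test.txt     Print the contents of a file to the console
(6) rm
    rm -r /usr/local/test       Recursively delete a directory and everything in it (unlike rmdir, it also works on non-empty directories)
    rm /usr/local/test.txt      Delete a file
    rm -f /usr/local/test.txt   Force deletion: no confirmation prompt and no error if the file is missing
(7) cp
    cp /usr/local/test.txt data.txt    Copy the file at the first path to the second path
(8) mv
    mv /usr/local/test.txt data.txt    Move (or rename) the file at the first path to the second path
(9) vi (vim)
    vi /usr/local/test.txt      Edit the file with vi
    vim data.txt                Edit the file with vim
    (How to use vi is not covered here.)
//----------[Compression operations]
(10) tar
    tar -zcvf deal/new.gz /usr/local/test/*     Pack the listed files into a gzip-compressed tar archive (the archive name comes right after -f)
    tar -xvf new.gz -C /home/Downloads          Extract the archive into the given directory
//----------[Other operations]
(11) find
    find /etc -name "data.txt"  Search under /etc for files named "data.txt"
(12) sudo
    sudo <any command above>    Run the command as the superuser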


    [End of entry]
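For readers who later drive these file operations from Java instead of the shell, the sketch below mirrors several of the commands above with the standard `java.nio.file` API. It works inside a temporary directory so nothing under `/usr/local` is touched; the class name `ShellEquivalents` and the file names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

// Illustrative only: Java counterparts of mkdir -p, touch, cp, mv, cat, rm, rmdir, rm -r.
final class ShellEquivalents {

    // Rough equivalent of "rm -r": delete children before their parent directory.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path p : (Iterable<Path>) walk.sorted(Comparator.reverseOrder())::iterator) {
                Files.delete(p);
            }
        }
    }

    // Runs the sequence in a scratch directory and returns the text read back by "cat".
    static String demo() throws IOException {
        Path base = Files.createTempDirectory("shell-demo");
        Path dir = Files.createDirectories(base.resolve("hadoop/test"));   // mkdir -p
        Path file = Files.createFile(dir.resolve("test.txt"));             // touch
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        Path copy = Files.copy(file, dir.resolve("data.txt"));             // cp
        Path moved = Files.move(copy, base.resolve("moved.txt"),
                StandardCopyOption.REPLACE_EXISTING);                      // mv
        String content = new String(Files.readAllBytes(moved),
                StandardCharsets.UTF_8);                                   // cat
        Files.delete(file);                                                // rm
        Files.delete(dir);                                                 // rmdir (dir is empty now)
        deleteRecursively(base);                                           // rm -r
        return content;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

As with `rmdir`, `Files.delete` refuses to remove a non-empty directory, which is why a separate recursive helper is needed to imitate `rm -r`.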

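    2. Summary of HDFS file system usage

      Reference blog: https://blog.csdn.net/majianxiong_lzu/article/details/89174176

    3. Spark-shell commands

      Reference blog: https://blog.csdn.net/wawa8899/article/details/81016029

  III. Programming

    1. Java code for operating on HDFS

    [Written: 2020-01-29]

      The code on this page is for reference only.

      A wrapper class for operating on files in HDFS: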

package com.hadoop.hdfs;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.Scanner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Utility for working with files on HDFS
public class HDFSFileDealer {
    // Hadoop configuration
    protected Configuration conf = null;
    // File system handle
    protected FileSystem fs = null;
    // Convert a path string to a Path
    protected Path toPath(String fileName) {
        return new Path(fileName);
    }
    public static String toRealPath(String file) {
        return "../../"+file;
    }
    // Check whether a file exists on HDFS
    public boolean exist(String fileName){
        try {
            return fs.exists(toPath(fileName));
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file!");
        }
        return false;
    }
    // Read a file from HDFS and print it to the console
    public void loadToWin() {
        try (FSDataInputStream fis = fs.open(new Path("/user/hadoop/hdfstest1.txt"))) {
            // close=false keeps System.out open; fis is closed by try-with-resources
            IOUtils.copyBytes(fis, System.out, conf, false);
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file!");
        }
    }
    // Read a file from HDFS and save it to the local file system
    public void loadToFile(String local_file,String hdfs_file){
        try {
            FSDataInputStream fis = fs.open(new Path(hdfs_file));
            OutputStream out = new FileOutputStream(new File(local_file));
            // Copy from HDFS to the local file; close=true closes both streams
            IOUtils.copyBytes(fis, out, conf, true);
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file!");
        }
    }
    // Create a (possibly nested) directory on HDFS
    public void mkdir(String newdir){
        try {
            Path outputDir = toPath(newdir);
            if(!fs.exists(outputDir)){ // create it only if it does not exist yet
                fs.mkdirs(outputDir);
            }else {
                System.out.println("The path already exists!");
            }
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file!");
        }
    }
    // Delete a file or directory on HDFS
    public void delete(String fileName){
        try {
            if(fs.exists(toPath(fileName))) // delete it only if it exists
            {
                fs.delete(toPath(fileName),true); // true = recursive
            } else {
                System.out.println("The path does not exist!");
            }
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file!");
        }
    }
    // Upload a file
    public void updata(String local_file,String hdfs_file){
        updata(local_file,hdfs_file,false);
    }
    public void updata(String local_file,String hdfs_file,boolean hasDeleted){
        Path srcPath = new Path(local_file); // local source path
        Path dstPath = new Path(hdfs_file); // HDFS destination path
        // The first argument says whether to delete the source file (true = delete); this class defaults to false
        try {
            fs.copyFromLocalFile(hasDeleted, srcPath, dstPath);
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file, upload not completed!");
            System.out.println("in updata");
        }
    }
    // Upload several local files to HDFS
    public void updata(List <String> li_str,String hdfs_file) {
        try {
            if(li_str==null||li_str.size()==0)
                throw new IOException();
            int length = li_str.size();
            Path[] paths = new Path[length];
            Path dstPath = new Path(hdfs_file); // HDFS destination path
            for(int i=0;i<length;++i)
            {
                paths[i] = toPath(li_str.get(i));
            }
            // Arguments: delete source (false), overwrite destination (true), sources, destination
            fs.copyFromLocalFile(false, true, paths, dstPath);
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file!");
        }
    }
    public void updata(String []li_str,String hdfs_file) {
        try {
            if(li_str==null||li_str.length==0)
                throw new IOException();

            int length = li_str.length;
            Path[] paths = new Path[length];
            Path dstPath = new Path(hdfs_file); // HDFS destination path
            for(int i=0;i<length;++i)
            {
                paths[i] = toPath(li_str[i]);
            }
            // Arguments: delete source (false), overwrite destination (true), sources, destination
            fs.copyFromLocalFile(false, true, paths, dstPath);
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file!");
        }
    }
    // Download a file from HDFS
    public void download(String local_file,String hdfs_file) {
        download(local_file,hdfs_file,false);
    }
    public void download(String local_file,String hdfs_file,boolean hasDeleted){
        Path dstPath = toPath(local_file);
        Path srcPath = toPath(hdfs_file);
        try {
            fs.copyToLocalFile(hasDeleted, srcPath, dstPath);
        } catch (IOException e) {
            System.out.println("Grandpa! Failed to load the file, download not completed!");
        }
    }
    // Release resources
    public void free() {
        try {
            if(fs!=null)
                fs.close();
        } catch (IOException e) {
            System.out.println("Grandpa! Your program hit an IOException!");
        }
    }
    // Create a file on HDFS and write console input into it (finish with a line containing #END#)
    public void touchFileWith(String fileName){
        // try-with-resources closes the output stream so the data is actually flushed to HDFS
        try (FSDataOutputStream fos = fs.create(toPath(fileName))) {
            Scanner sc = new Scanner (System.in);
            String str = "";
            String sum_str = "";
            boolean no_error = true;
            while(no_error)
            {
                str = sc.nextLine();
                if(str.compareTo("#END#")==0)
                {
                    fos.write(sum_str.getBytes());
                    break;
                }
                else if(sum_str.compareTo("")!=0)
                {
                    sum_str = sum_str + "\n";
                }
                sum_str = sum_str + str;
            }
            sc.close();
        } catch (Exception e) {
            System.out.println("Grandpa! Your program hit an IOException!");
        }
    }
    // Constructor
    public HDFSFileDealer(){
        super();
        conf = new Configuration();
        conf.set("fs.defaultFS","hdfs://localhost:9000");
        try {
            fs = FileSystem.get(conf);
        } catch (IOException e) {
            System.out.println("Grandpa! Your program hit an IOException!");
        }
    }
    // Entry point
    @SuppressWarnings("unused")
    public static void main(String args[]) {
        int old = 0;
        HDFSFileDealer hfd = new HDFSFileDealer();

        String local_file = "test/buyer_favorite1";
        String hdfs_file = "../../mymapreduce1/in/buyer_favorite1";

        //hfd.download(local_file, hdfs_file);
        //hfd.updata(local_file, hdfs_file);
        //hfd.touchFileWith(hdfs_file);
        hfd.updata("test/result.txt", "HiveProject/in/result.txt");
        //hfd.updata("test/order_items1", "mymapreduce5/in/order_items1");
        //hfd.download("test/downloads","mymapreduce1/in/buyer_favorite1");

        hfd.free();
    }
}


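      When using this class, note that a relative HDFS path is not resolved against the HDFS root directory (/). It is resolved against your default working directory, typically /user/<username>, so adjust the paths you pass in to match your own setup.

    [End of entry]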
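To make the note about relative paths concrete, the sketch below mimics the resolution rule with plain strings so that it runs without a cluster or the Hadoop libraries. `resolve` is a hypothetical helper, not a Hadoop API, and the working directory `/user/hadoop` is an assumption based on the `/user/hadoop/hdfstest1.txt` path used in the class above. It also suggests why `toRealPath` prepends `../../`: from a two-level home directory, that climbs back to the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: imitates how a relative path is resolved against a working directory.
final class HdfsPathSketch {

    static String resolve(String workingDir, String path) {
        // Absolute paths are used as-is; relative ones are appended to the working directory.
        String full = path.startsWith("/") ? path : workingDir + "/" + path;
        Deque<String> parts = new ArrayDeque<>();
        for (String seg : full.split("/")) {
            if (seg.isEmpty() || seg.equals(".")) {
                continue;                    // skip empty and "." segments
            }
            if (seg.equals("..")) {
                parts.pollLast();            // go up one level (no-op at the root)
            } else {
                parts.addLast(seg);
            }
        }
        return "/" + String.join("/", parts);
    }

    public static void main(String[] args) {
        String home = "/user/hadoop";        // assumed default working directory
        System.out.println(resolve(home, "HiveProject/in/result.txt"));
        System.out.println(resolve(home, "../../mymapreduce1/in/buyer_favorite1"));
        System.out.println(resolve(home, "/mymapreduce1/in/buyer_favorite1"));
    }
}
```

Under these assumptions the first path lands in `/user/hadoop/HiveProject/in/result.txt`, while the second and third both point at `/mymapreduce1/in/buyer_favorite1`.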

    2. Scala programming summary

      For this I recommend learning from the Runoob (菜鸟教程) tutorials.

    3. RDD programming

      You can refer to this blog: https://blog.csdn.net/tsy_1222/article/details/96355531

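  IV. Data Debugging (Parameter Tuning Strategy)

    Data debugging means running our code against test data that simulates real data. We can deliberately hand-pick inputs with special forms to check how robust the code is.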
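As a minimal sketch of this idea, the example below feeds hand-picked special inputs to a small parser. Everything in it is hypothetical: the record format "<name> <score>", the class `EdgeCaseCheck`, and the sample lines are assumptions chosen for illustration, not data from the experiments.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Illustrative only: checks a parser against deliberately awkward inputs.
final class EdgeCaseCheck {

    // Assumed record format: "<name> <score>", separated by whitespace.
    // Returns the score, or empty when the line is malformed.
    static OptionalInt parseScore(String line) {
        if (line == null) {
            return OptionalInt.empty();
        }
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 2) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(fields[1]));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Count how many lines parse successfully.
    static long countValid(List<String> lines) {
        return lines.stream().filter(l -> parseScore(l).isPresent()).count();
    }

    public static void main(String[] args) {
        List<String> special = Arrays.asList(
            "tom 90",          // normal record
            "",                // empty line
            "   ",             // whitespace only
            "tom",             // missing field
            "tom ninety",      // non-numeric score
            "  tom   90  ",    // irregular spacing
            "tom 90 extra",    // too many fields
            null);             // missing line
        for (String line : special) {
            System.out.println("[" + line + "] -> " + parseScore(line));
        }
        System.out.println("valid: " + countValid(special));
    }
}
```

Only the normal record and the irregularly spaced one parse successfully; the other six are rejected without an exception, which is the kind of behavior such special inputs are meant to confirm.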


Original article: https://www.cnblogs.com/onepersonwholive/p/12218123.html