hadoop
Tags: ubuntu hdfs API
Overview
Accessing the HDFS file system through the Java API fails with the warning:
WARN util.Shell: Did not find winutils.exe: {}
HADOOP_HOME and hadoop.home.dir are unset. -see https://
The code is as follows:
package big.data.hdfs;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

/*
 * Access the HDFS file system through the API.
 */
public class TestHDFS {
    static {
        // Register the stream handler factory so Java can resolve the hdfs:// protocol.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        // URL of the file on HDFS
        String url = "hdfs://s100:8020/user/ubuntu/how.txt";
        URL u = new URL(url);
        // open a connection to the URL
        URLConnection conn = u.openConnection();
        // input stream from HDFS
        InputStream is = conn.getInputStream();
        // output stream to a local file
        FileOutputStream fos = new FileOutputStream("d:/hello.txt");
        // copy in 1 KB chunks
        byte[] buf = new byte[1024];
        int len = -1;
        while ((len = is.read(buf)) != -1) {
            fos.write(buf, 0, len);
        }
        is.close();
        fos.close();
        System.out.println("over");
    }
}
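One weakness of the copy loop above is that the streams are never closed if a read or write throws. The same buffered copy can be wrapped in try-with-resources; the sketch below is self-contained (plain JDK in-memory streams stand in for the HDFS connection, so it runs without a cluster):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyDemo {
    // Same 1 KB buffered copy loop as in TestHDFS; returns the number of bytes copied.
    static long copy(InputStream is, OutputStream os) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        int len;
        while ((len = is.read(buf)) != -1) {
            os.write(buf, 0, len);
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the input stream even if copy() throws
        try (InputStream in = new ByteArrayInputStream(data)) {
            long n = copy(in, out);
            System.out.println(n); // 10 bytes copied
        }
    }
}
```

With the real HDFS URL, the `conn.getInputStream()` and `FileOutputStream` from the original program would go inside the same try-with-resources header.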
Cause of the error: Hadoop also needs to be configured on the Windows machine. The configuration steps are as follows:
Configure hadoop
Configure the environment variables: in path, add D:\hadoop-2.8.5\hadoop-2.8.5 and D:\hadoop-2.8.5\hadoop-2.8.5\sbin, and set HADOOP_HOME to D:\hadoop-2.8.5\hadoop-2.8.5 as well.
(When I set it to D:\hadoop-2.8.5\hadoop-2.8.5\bin it reported an error instead; I am not sure why.)
Configure the hadoop files
All of the configuration files involved are under the hadoop\etc\hadoop directory; open each with Notepad.
Note: the JDK path in JAVA_HOME must not contain spaces.
- File 1: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\hadoop-env.cmd
set JAVA_HOME=C:\ProgramFiles\Java\jdk1.8.0_181
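If your JDK actually lives under C:\Program Files, the space in that path is what breaks hadoop-env.cmd. A common workaround is the 8.3 short name; PROGRA~1 is the usual short form of Program Files, but that is an assumption about your machine - confirm it with `dir /x C:\`:

```shell
@rem hadoop-env.cmd - the 8.3 short name avoids the space in "Program Files".
@rem PROGRA~1 is assumed here; verify the actual short name with `dir /x C:\`.
set JAVA_HOME=C:\PROGRA~1\Java\jdk1.8.0_181
```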
- File 2: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
- File 3: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/data/dfs/datanode</value>
</property>
</configuration>
- File 4: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\mapred-site.xml (create mapred-site.xml by copying mapred-site.xml.template and removing the .template suffix)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- File 5: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Start hadoop
In a cmd console, change into D:\hadoop-2.8.5\hadoop-2.8.5\sbin> and run:
hadoop namenode -format    (formats HDFS)
start-all.cmd
Place the downloaded winutils.exe in the D:\hadoop-2.8.5\hadoop-2.8.5\bin directory; make sure its version matches your Hadoop version.
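If editing Windows environment variables is inconvenient, the warning itself ("HADOOP_HOME and hadoop.home.dir are unset") shows that Hadoop also consults the hadoop.home.dir system property, so setting it from code before the first org.apache.hadoop class loads should work too. A minimal sketch, using this article's install directory (adjust to yours):

```java
public class HadoopHomeSetup {
    public static void main(String[] args) {
        // Must run before the first org.apache.hadoop class is loaded,
        // since util.Shell resolves hadoop.home.dir in a static initializer.
        // The path below is the install directory used in this article.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.8.5\\hadoop-2.8.5");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```

winutils.exe still has to be present under %hadoop.home.dir%\bin for the lookup to succeed.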
Restart Eclipse and run again. One warning remains; it appears to be a 32-bit vs 64-bit native library issue and can be ignored:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable