1-> Download hadoop-1.2.1.tar.gz
tar -zxvf hadoop-1.2.1.tar.gz to extract it. Here we assume the extracted files are under /root/soft.
2-> Create the hadoop account
groupadd hadoop
useradd -g hadoop -d /home/hadoop -m hadoop
mv /root/soft/hadoop-1.2.1 /home/hadoop/hadoop-1.2.1
chown -R hadoop:hadoop /home/hadoop
3-> Passwordless SSH setup for the hadoop account
su - hadoop
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
cd .ssh && chmod 600 authorized_keys
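To verify the passwordless setup, ssh to localhost as the hadoop user; it should log in without asking for a password (the very first connection may still prompt you to accept the host key):
ssh localhost
exit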
4-> Install the JDK
Download jdk-8u40-linux-i586.rpm
Install it on Linux: rpm -ivh jdk-8u40-linux-i586.rpm
Once installed, it lands in /usr/java/jdk1.8.0_40.
Set the environment variables.
Append to /etc/profile:
export JAVA_HOME=/usr/java/jdk1.8.0_40
export PATH=$JAVA_HOME/bin:$PATH
In conf/hadoop-env.sh under the Hadoop root directory, add JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.8.0_40
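As a quick check that the JDK edits took effect (assuming the /etc/profile lines above were saved):
source /etc/profile
java -version
echo $JAVA_HOME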
5-> Set the Hadoop environment variables
Append to /etc/profile:
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export PATH=$HADOOP_HOME/bin:$PATH
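Likewise, a quick sanity check that hadoop is now on the PATH:
source /etc/profile
hadoop version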
Go into conf under the Hadoop root directory and edit three configuration files.
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
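Before the first start, the HDFS namenode must be formatted. This is a one-time step; rerunning it later would wipe the HDFS metadata:
hadoop namenode -format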
6-> Start Hadoop
su - hadoop
start-all.sh
or run start-dfs.sh and then start-mapred.sh.
After it starts up, you can list the files in the HDFS filesystem's directories; you can think of HDFS roughly as an FTP server.
hadoop fs -ls /
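You can also confirm all the daemons came up with jps; on a pseudo-distributed node it should list NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:
jps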
Now run the hadoop-examples-1.2.1.jar test program in the Hadoop root directory. Invoking hadoop jar hadoop-examples-1.2.1.jar with no arguments shows that the jar bundles several example programs:
[root@localhost hadoop-1.2.1]# hadoop jar hadoop-examples-1.2.1.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
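Running one of these program names with no further arguments prints a usage hint; for example:
hadoop jar hadoop-examples-1.2.1.jar wordcount
should print something like Usage: wordcount <in> <out>.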
Let's test wordcount. This MapReduce program scans the text files under a given HDFS directory and counts every word that appears, along with each word's number of occurrences.
First, create a directory on the HDFS filesystem. The / here lives on the filesystem we defined via fs.default.name in conf/core-site.xml (hdfs://localhost:9000). Strictly speaking you would write hadoop fs -mkdir hdfs://localhost:9000/test, but the shorthand below is equivalent:
hadoop fs -mkdir /test
hadoop fs -mkdir /test/input
In the commands below, /test/input could equally be written hdfs://localhost:9000/test/input.
[hadoop@localhost hadoop-1.2.1]$ hadoop fs -put /home/hadoop/hadoop-1.2.1/conf/*.xml /test/input
[hadoop@localhost hadoop-1.2.1]$ hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2015-03-31 14:36 /test
drwxr-xr-x - hadoop supergroup 0 2015-03-31 14:38 /test/input
-rw-r--r-- 1 hadoop supergroup 7457 2015-03-31 14:38 /test/input/capacity-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 294 2015-03-31 14:38 /test/input/core-site.xml
-rw-r--r-- 1 hadoop supergroup 327 2015-03-31 14:38 /test/input/fair-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 4644 2015-03-31 14:38 /test/input/hadoop-policy.xml
-rw-r--r-- 1 hadoop supergroup 274 2015-03-31 14:38 /test/input/hdfs-site.xml
-rw-r--r-- 1 hadoop supergroup 2033 2015-03-31 14:38 /test/input/mapred-queue-acls.xml
-rw-r--r-- 1 hadoop supergroup 285 2015-03-31 14:38 /test/input/mapred-site.xml
drwxr-xr-x - hadoop supergroup 0 2015-03-31 13:21 /tmp
drwxr-xr-x - hadoop supergroup 0 2015-03-31 13:21 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred/system
-rw------- 1 hadoop supergroup 4 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
Run the job:
hadoop jar hadoop-examples-1.2.1.jar wordcount /test/input /test/output
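Once the job finishes, the result can be read straight from HDFS. The part file name below is the usual one for this example but may differ on your run, so list the output directory first:
hadoop fs -ls /test/output
hadoop fs -cat /test/output/part-r-00000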
You can browse the HDFS filesystem via http://192.168.2.88:50070/.
Here you can see input holds 7 files:
Name | Type | Size | Replication | Block Size | Modification Time | Permission | Owner | Group
capacity-scheduler.xml | file | 7.28 KB | 1 | 64 MB | 2015-03-31 14:38 | rw-r--r-- | hadoop | supergroup
core-site.xml | file | 0.29 KB | 1 | 64 MB | 2015-03-31 14:38 | rw-r--r-- | hadoop | supergroup
fair-scheduler.xml | file | 0.32 KB | 1 | 64 MB | 2015-03-31 14:38 | rw-r--r-- | hadoop | supergroup
hadoop-policy.xml | file | 4.54 KB | 1 | 64 MB | 2015-03-31 14:38 | rw-r--r-- | hadoop | supergroup
hdfs-site.xml | file | 0.27 KB | 1 | 64 MB | 2015-03-31 14:38 | rw-r--r-- | hadoop | supergroup
mapred-queue-acls.xml | file | 1.99 KB | 1 | 64 MB | 2015-03-31 14:38 | rw-r--r-- | hadoop | supergroup
mapred-site.xml | file | 0.28 KB | 1 | 64 MB | 2015-03-31 14:38 | rw-r--r-- | hadoop | supergroup
You can check MapReduce job status via http://192.168.2.88:50030/.
User: hadoop
Job Name: word count
Job File: hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201503311441_0001/job.xml
Submit Host: localhost.localdomain
Submit Host Address: 127.0.0.1
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Succeeded
Started at: Tue Mar 31 14:45:01 PDT 2015
Finished at: Tue Mar 31 14:45:40 PDT 2015
Finished in: 38sec
Job Cleanup: Successful
Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map | 100.00% | 7 | 0 | 0 | 7 | 0 | 0 / 0
reduce | 100.00% | 1 | 0 | 0 | 1 | 0 | 0 / 0
This result makes map and reduce concrete. The input directory holds 7 files, so 7 tasks were launched, that is, 7 maps: each map independently counts the words, and each word's occurrences, in its own file.
Then a single reduce pass merges the 7 maps' partial results; sorted together, all identical words are tallied into one count. That merge is the reduce step.
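As a rough single-machine analogy (not how Hadoop actually runs, just the same tokenize/sort/tally pipeline), the whole job is conceptually:
cat /home/hadoop/hadoop-1.2.1/conf/*.xml | tr -s '[:space:]' '\n' | sort | uniq -c
Here tr plays the maps (emit one word per line), sort plays the shuffle, and uniq -c plays the reduce.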
The map tasks here are run by the TaskTracker, and the files a TaskTracker processes live on a DataNode, so in practice the TaskTracker and DataNode usually sit on the same machine.
The JobTracker can be understood as the TaskTrackers' coordinator: how many tasks are launched, and where, is scheduled by the JobTracker.
For a cluster installation, see:
http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html