Big Data Cluster Lab Environment Setup
I. Hadoop 1.x environment setup
1. Set up the virtual machines and install the JDK
2. Passwordless SSH setup:
(1) Generate a key pair (public and private key).
ssh-keygen -t rsa
(2) Copy your public key into the .ssh directory of the server you want passwordless access to, renamed to authorized_keys.
scp ./id_rsa.pub rowen@192.168.128.133:/home/rowen/.ssh/authorized_keys
If several machines all need passwordless access to h1, every machine's public key must be appended to the authorized_keys file on h1 (the scp above overwrites the target file, so with multiple keys you append instead).
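The append step for multiple machines can be sketched as a small script. The SSH_DIR override and the id_rsa.*.pub staging filenames are assumptions for illustration, not part of this guide:

```shell
# Run on h1 after each machine has copied its public key over,
# staged here as id_rsa.<host>.pub (hypothetical filenames).
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"
for pub in "$SSH_DIR"/id_rsa.*.pub; do
    [ -f "$pub" ] || continue              # no staged keys: nothing to append
    cat "$pub" >> "$SSH_DIR/authorized_keys"
done
# sshd refuses to use authorized_keys with loose permissions:
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
```

Appending with `>>` is the point here: scp-ing each key directly onto authorized_keys would overwrite the previous machine's key.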
3. Download Hadoop 1.x
4. Edit the configuration files
(1) conf/hadoop-env.sh
Set JAVA_HOME, e.g.: export JAVA_HOME=/home/rowen/soft/jdk1.7.0_80
(2)core-site.xml
<configuration>
<property>
<!-- NameNode address -->
<name>fs.default.name</name>
<value>hdfs://backup01:9000</value>
</property>
<property>
<!-- Hadoop temp directory; keeps the data out of /tmp, which may be wiped when the machines restart -->
<name>hadoop.tmp.dir</name>
<value>/home/rowen/soft/hadoop-1.1.2/tmp</value>
</property>
</configuration>
(3)hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
(4)mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>backup01:9001</value>
</property>
</configuration>
(5)masters
backup01
(6)slaves
backup02
5. Switch to root and edit /etc/hosts:
192.168.128.131 backup01
192.168.128.133 backup02
6. Copy the configured installation to the other nodes
scp -r ./hadoop-1.1.2 rowen@192.168.128.133:/home/rowen/soft
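With more than one data node, the copy can be scripted. The SLAVES list below is a placeholder based on this guide's two-node layout, and the echo makes it a dry run (replace echo with the real command to execute):

```shell
# Dry run: build and print the scp command for each slave node.
SLAVES="192.168.128.133"            # add more IPs or hostnames as needed
for host in $SLAVES; do
    cmd="scp -r /home/rowen/soft/hadoop-1.1.2 rowen@$host:/home/rowen/soft"
    echo "$cmd"                     # dry run; run "$cmd" via sh -c to execute
done
```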
7. Edit /etc/hosts on the other nodes so it matches step 5
8. Format the NameNode
bin/hadoop namenode -format
9. Start Hadoop: ./bin/start-all.sh
Check the Java processes with jps. On the master node:
NameNode
SecondaryNameNode
JobTracker
On the data node:
TaskTracker
DataNode
If all of these processes are present, the startup succeeded.
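The per-daemon check can be automated with jps (the JDK's Java process lister); this sketch just compares its output against the list expected on the master:

```shell
# Compare running Java processes (jps) against the daemons expected
# on the 1.x master node; prints OK or MISSING for each one.
expected="NameNode SecondaryNameNode JobTracker"
running=$(jps 2>/dev/null | awk '{print $2}')
status=""
for d in $expected; do
    if echo "$running" | grep -qx "$d"; then
        status="$status $d:OK"
    else
        status="$status $d:MISSING"
    fi
done
echo "$status"
```

On a data node, set `expected="TaskTracker DataNode"` instead.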
II. Hadoop 2.x installation
The biggest difference between 2.x and 1.x lies in the configuration files.
1. Download Hadoop 2.2.0
2. Configuration
(1)core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://h1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/rowen/Downloads/hadoop-2.2.0/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
(2) hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>h1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/rowen/Downloads/hadoop-2.2.0/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/rowen/Downloads/hadoop-2.2.0/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
(3) mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>h1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>h1:19888</value>
</property>
</configuration>
(4)yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>h1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>h1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>h1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>h1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>h1:8088</value>
</property>
</configuration>
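Unlike 1.x, 2.x splits startup into separate HDFS and YARN scripts under sbin/. A bring-up sketch follows; the HADOOP_HOME guard and the -nonInteractive flag are conveniences added here, and the history server line matches the 10020/19888 ports configured above:

```shell
# Bring up a 2.x cluster from the master (h1). Guarded so it only
# runs when HADOOP_HOME actually points at an unpacked install.
start_hadoop2() {
    cd "$HADOOP_HOME" || return 1
    bin/hdfs namenode -format -nonInteractive          # first start only
    sbin/start-dfs.sh                                  # NameNode, DataNodes, SecondaryNameNode
    sbin/start-yarn.sh                                 # ResourceManager, NodeManagers
    sbin/mr-jobhistory-daemon.sh start historyserver   # ports 10020 / 19888
}
if [ -d "${HADOOP_HOME:-/nonexistent}" ]; then
    start_hadoop2
else
    echo "HADOOP_HOME not set; skipping bring-up"
fi
```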
3. Fix for 64-bit operating systems
The native libraries shipped with Hadoop 2.x are 32-bit builds. On a 32-bit OS the configuration above is enough to start the cluster. On a 64-bit OS you must either compile Hadoop from source yourself or download native libraries that someone else has already built for 64-bit, and replace the bundled ones (otherwise Hadoop logs the "Unable to load native-hadoop library for your platform" warning).
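Whether the bundled library matches your OS can be checked with the standard `file` and `uname` tools; the library path below assumes a stock 2.2.0 tarball layout under HADOOP_HOME:

```shell
# Print the OS bitness, then inspect the bundled native library if present.
uname -m                                   # x86_64 means a 64-bit OS
LIB="${HADOOP_HOME:-.}/lib/native/libhadoop.so.1.0.0"
if [ -f "$LIB" ]; then
    file "$LIB"                            # look for "32-bit" vs "64-bit"
else
    echo "not found: $LIB"
fi
```

If `uname -m` reports x86_64 but `file` reports a 32-bit ELF, the library needs to be replaced as described above.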