Note: Hadoop has two operating modes, secure and non-secure. Secure mode adds robust, Kerberos-based authentication; this article runs in non-secure mode, so everything can be done directly as root.
All commands in this article are run as the root user.
I. Network Configuration
Open a terminal and run ifconfig to find the local IP; in this article it is 192.168.209.139.
/etc/hosts stores the mapping between hostnames and IP addresses. Add the cluster's entries:
gedit /etc/hosts
192.168.209.139 master
192.168.209.140 slave1
192.168.209.141 slave2
192.168.209.142 slave3
Set each machine's hostname:
gedit /etc/hostname    # set to master, slave1, slave2, or slave3 respectively
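The four host entries above can be staged in a scratch file and sanity-checked before touching /etc/hosts. This is a sketch; the file name cluster-hosts.txt is just for illustration:

```shell
# Build the mapping in a scratch file first, then append it to /etc/hosts
# yourself once it looks right (e.g. cat cluster-hosts.txt >> /etc/hosts).
cd "$(mktemp -d)"
cat > cluster-hosts.txt <<'EOF'
192.168.209.139 master
192.168.209.140 slave1
192.168.209.141 slave2
192.168.209.142 slave3
EOF
# Sanity check: each hostname must appear exactly once.
for h in master slave1 slave2 slave3; do
    n=$(grep -c " $h\$" cluster-hosts.txt)
    echo "$h: $n entry"
done
```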
II. Connecting the Machines (passwordless SSH)
Install SSH on every node:
sudo apt-get install openssh-server
Start the service: sudo /etc/init.d/ssh start
Check that it is running: ps -e|grep ssh
On the master node:
cd /root/.ssh/
ssh-keygen -t rsa -P ""    # -P "" sets an empty passphrase (it does not specify a port)
cat id_rsa.pub >> ~/.ssh/authorized_keys    # append the public key to authorized_keys, creating the file if it does not exist
scp ~/.ssh/authorized_keys slave1:~/.ssh/authorized_keys    # copy to the slave nodes
scp ~/.ssh/authorized_keys slave2:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys slave3:~/.ssh/authorized_keys
ssh master    # test that login now works without a password
III. Installing the JDK
Download JDK 1.7 from the official site.
tar -zxvf jdk-7u7-linux-i586.tar.gz
mv jdk1.7.0_07 /usr/local/jdk7
Set the environment variables: gedit /etc/profile
# set java environment
export JAVA_HOME=/usr/local/jdk7    # path the JDK was extracted to
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
# set hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$HADOOP_HOME/bin:$PATH
Make the variables take effect: source /etc/profile
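A quick sanity check that the exports took effect in the current shell; if /etc/profile has not been sourced yet, this falls back to the paths used in this article:

```shell
# Verify the variables the later steps rely on; default to the article's paths.
export JAVA_HOME=${JAVA_HOME:-/usr/local/jdk7}
export HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
for v in JAVA_HOME HADOOP_HOME; do
    eval "val=\$$v"
    if [ -n "$val" ]; then
        echo "$v=$val"
    else
        echo "ERROR: $v is empty -- re-run 'source /etc/profile'" >&2
    fi
done
```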
IV. Installing and Configuring Hadoop
The package on the official site is 32-bit and would have to be recompiled for 64-bit systems, so it is easiest to find a pre-built 64-bit package online.
tar -zxvf hadoop-2.4.0-64bit.tar.gz
mv hadoop-2.4.0 /usr/local/hadoop
The configuration files are under /usr/local/hadoop/etc/hadoop/.
1. Add the JDK path to yarn-env.sh and hadoop-env.sh
gedit yarn-env.sh
gedit hadoop-env.sh
export JAVA_HOME=/usr/local/jdk7
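The same edit can be scripted instead of done in gedit. This sketch uses stand-in files in a scratch directory for demonstration; the real hadoop-env.sh and yarn-env.sh ship with Hadoop under etc/hadoop/:

```shell
# Demo in a scratch directory; on a real cluster, cd to
# /usr/local/hadoop/etc/hadoop and drop the touch line.
cd "$(mktemp -d)"
for f in hadoop-env.sh yarn-env.sh; do
    touch "$f"                              # stand-in for the shipped file
    # Append JAVA_HOME only if it is not already set in the file.
    grep -q '^export JAVA_HOME=' "$f" || \
        echo 'export JAVA_HOME=/usr/local/jdk7' >> "$f"
done
grep -H JAVA_HOME hadoop-env.sh yarn-env.sh
```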
2. gedit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
<final>true</final>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
3. gedit hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
4. Copy the template first (editing mapred-site.xml.template directly has no effect), then edit the copy:
cp mapred-site.xml.template mapred-site.xml
gedit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework set to Hadoop YARN.</description>
</property>
</configuration>
5. gedit yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.address</name>
<value>master:9001</value>
<description>The address of the applications manager interface in the RM.</description>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
<description>The hostname of the ResourceManager.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
<description>The address of the scheduler interface, through which the RM hands out resources.</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:18025</value>
<description>The address of the resource tracker interface for the nodeManagers</description>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:18035</value>
<description>The address of the RM admin interface.</description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:18088</value>
<description>The address of the RM web application.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6. gedit slaves
Add the following to the file:
slave1
slave2
slave3
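The slaves file can also be written in one command. The scratch directory here is for demonstration only; on the real master, run it inside etc/hadoop/:

```shell
# Demo in a scratch directory; on the real master, run this inside
# /usr/local/hadoop/etc/hadoop instead.
cd "$(mktemp -d)"
printf '%s\n' slave1 slave2 slave3 > slaves
cat slaves
```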
7. Copy the hadoop and jdk7 directories to the other three machines, and copy the environment-variable settings in /etc/profile as well (then run source /etc/profile on each node).
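The copy step above can be sketched as a loop. This version only prints the scp commands so it can be reviewed first; drop the echo to execute them (the destination paths are the ones this article assumes):

```shell
# Print a copy plan for the three slaves; remove 'echo' to run for real.
cd "$(mktemp -d)"
for node in slave1 slave2 slave3; do
    echo "scp -r /usr/local/hadoop /usr/local/jdk7 root@$node:/usr/local/"
    echo "scp /etc/profile root@$node:/etc/profile"
done | tee copy-plan.txt
```

After the real copy, remember to run source /etc/profile on each slave so the variables take effect there too.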
8. Start Hadoop on the master node
[Format the NameNode (first run only; formatting erases existing HDFS data)]
cd $HADOOP_HOME/bin
./hdfs namenode -format    # 'hadoop namenode -format' still works in 2.x but is deprecated
Start the cluster:
cd $HADOOP_HOME
sbin/start-all.sh
Stop it:
sbin/stop-all.sh
jps    # list the running Java daemon processes
hdfs dfsadmin -report    # show cluster status ('hadoop dfsadmin' is deprecated)
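On the master, jps should show the NameNode, SecondaryNameNode, and ResourceManager daemons. A small sketch that scans jps-style output for them; the sample output below is hard-coded for demonstration, so on a live master replace the heredoc with the real output of jps:

```shell
# Scan (sample) jps output for the daemons expected on the master.
expected="NameNode SecondaryNameNode ResourceManager"
jps_output=$(cat <<'EOF'
2101 NameNode
2344 SecondaryNameNode
2502 ResourceManager
2876 Jps
EOF
)
for d in $expected; do
    # -w matches whole words, so 'NameNode' does not match 'SecondaryNameNode'.
    if echo "$jps_output" | grep -qw "$d"; then
        echo "$d: running"
    else
        echo "$d: MISSING"
    fi
done
```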