    Setting up a Hadoop cluster

    1 Cluster planning

    Three virtual machines
    OS: CentOS 7 Minimal
    Networking: bridged mode (NAT or Host-only should also work)
    IP addresses:
    192.168.1.101
    192.168.1.102
    192.168.1.103

    2 Basic cluster configuration

    Edit /etc/hosts on all three machines and add the following entries:

    192.168.1.101 master
    192.168.1.102 slave1
    192.168.1.103 slave2
    

    Set /etc/hostname on each machine to master, slave1, and slave2 respectively.
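
    The /etc/hosts step can also be scripted so that it is safe to run more than once. This is only a sketch: the function name `add_cluster_hosts` and its optional file argument are assumptions made for illustration, while the three entries are the ones listed above.

```shell
#!/usr/bin/env bash
# Append each cluster entry to a hosts file unless that hostname is already present.
# The optional first argument (hypothetical) lets you try it on a copy of the file.
add_cluster_hosts() {
    local hosts_file="${1:-/etc/hosts}"
    local entry name
    for entry in "192.168.1.101 master" "192.168.1.102 slave1" "192.168.1.103 slave2"; do
        name="${entry##* }"   # text after the last space, i.e. the hostname
        # -w matches the hostname as a whole word; -q suppresses output
        if ! grep -qw -- "$name" "$hosts_file" 2>/dev/null; then
            printf '%s\n' "$entry" >> "$hosts_file"
        fi
    done
}
```

    Run `add_cluster_hosts` as root on each machine. On CentOS 7, `hostnamectl set-hostname master` (or slave1, slave2) writes /etc/hostname for you.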

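    3 Passwordless SSH

    On slave1:

    su
    vim /etc/ssh/sshd_config
    # in sshd_config, set these three options:
    StrictModes no
    RSAAuthentication yes
    PubkeyAuthentication yes
    # restart sshd to apply the changes
    /bin/systemctl restart  sshd.service
    # .ssh must end up in the home directory of the user who will log in (galaxy here)
    mkdir .ssh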
    

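    On master:

    # RSA rather than DSA: OpenSSH 7.0 and later disable DSA keys by default
    ssh-keygen -t rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    cat ~/.ssh/id_rsa.pub | ssh galaxy@slave1 'cat - >> ~/.ssh/authorized_keys'
    # this login should no longer ask for a password
    ssh slave1
    

    Repeat the same steps for slave2 and for master itself (master must also be able to ssh to itself without a password).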
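
    Once the keys are in place, a quick check of all three hosts might look like the sketch below. The function name `check_passwordless` and the `SSH_BIN` override are assumptions made for illustration; `BatchMode` and `ConnectTimeout` are standard ssh options.

```shell
#!/usr/bin/env bash
# Report which hosts accept key-based login without prompting.
# SSH_BIN is a hypothetical override (defaults to ssh) that allows a dry run.
check_passwordless() {
    local ssh_bin="${SSH_BIN:-ssh}"
    local host failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password
        if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "$host: ok"
        else
            echo "$host: key login failed (password required, unknown host key, or unreachable)"
            failed=1
        fi
    done
    return "$failed"
}
```

    On master, `check_passwordless master slave1 slave2` returns non-zero if any host still needs a password.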

    References:
    http://my.oschina.net/u/1169607/blog/175899
    http://segmentfault.com/a/1190000002911599

    4 Install Hadoop

    Omitted (the commands below use Hadoop 2.5.1 unpacked under /home/galaxy/hadoop-2.5.1).

    5 Configure Hadoop

    hadoop-env.sh

    #export JAVA_HOME=$JAVA_HOME                  # wrong: this does not work, set an explicit path instead
    export JAVA_HOME=/usr/java/jdk1.8.0_45
    

    core-site.xml

    <configuration>
    	<property>
    		<!-- fs.defaultFS is the Hadoop 2.x name of the deprecated fs.default.name -->
    		<name>fs.defaultFS</name>
    		<value>hdfs://master:9000</value>
    	</property>
    	<property>
    		<name>hadoop.tmp.dir</name>
    		<value>/tmp</value>
    	</property>
    </configuration>
    

    hdfs-site.xml

    <configuration>
    	<property>
    		<name>dfs.replication</name>
    		<value>2</value>
    	</property>
    </configuration>
    

    mapred-site.xml

    <configuration>
    	<property>
    		<name>mapred.job.tracker</name>
    		<value>master:9001</value>
    	</property>
    </configuration>
    

    masters

    master
    

    slaves

    slave1
    slave2
    

    Copy the configuration to the other two machines:

    scp etc/hadoop/* galaxy@slave1:/home/galaxy/hadoop-2.5.1/etc/hadoop/
    scp etc/hadoop/* galaxy@slave2:/home/galaxy/hadoop-2.5.1/etc/hadoop/
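
    With more than two slaves, the copy can be driven by the slaves file instead of one scp line per host. This is a sketch under assumptions: the function name `push_config` and the `SCP_BIN` override are hypothetical; the user name and remote path are the ones used in this post, and it is meant to be run from the Hadoop directory on master.

```shell
#!/usr/bin/env bash
# Copy etc/hadoop/* to every host listed in the slaves file.
# SCP_BIN is a hypothetical override (defaults to scp) that allows a dry run.
push_config() {
    local slaves_file="${1:-etc/hadoop/slaves}"
    local scp_bin="${SCP_BIN:-scp}"
    local host
    while read -r host; do
        # skip blank lines and comment lines
        case "$host" in ''|'#'*) continue ;; esac
        # stdin is redirected so the copy cannot consume the rest of the host list
        "$scp_bin" etc/hadoop/* "galaxy@${host}:/home/galaxy/hadoop-2.5.1/etc/hadoop/" </dev/null || return 1
    done < "$slaves_file"
}
```

    Calling `push_config` with no argument reads etc/hadoop/slaves and stops at the first host that fails.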
    

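    6 Start the Hadoop cluster

    Format the NameNode:

    # "hadoop namenode" is deprecated in Hadoop 2.x; "hdfs namenode" is the current form
    ./bin/hdfs namenode -format
    # on success the log contains: 15/11/09 19:25:59 INFO common.Storage: Storage directory /tmp/dfs/name has been successfully formatted.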
    

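    Start HDFS:

    ./sbin/start-dfs.sh
    

    Use jps to check that the daemons are running (start-dfs.sh starts the NameNode, the SecondaryNameNode, and the DataNodes; the ResourceManager in the listing is a YARN daemon and is started separately):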

    [galaxy@master hadoop-2.5.1]$ jps
    5924 ResourceManager
    6918 SecondaryNameNode
    7718 Jps
    6743 NameNode
    
    [galaxy@slave1 ~]$ jps
    6402 Jps
    6345 DataNode
    
    [galaxy@slave2 ~]$ jps
    25552 Jps
    25495 DataNode
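
    Checking the jps listings by eye gets tedious on a larger cluster. A minimal sketch of an automated check follows; the function name `has_daemons` is an assumption made for illustration, and it relies only on the "pid name" layout of the jps output shown above.

```shell
#!/usr/bin/env bash
# Read jps output on stdin and confirm that every expected daemon name is listed.
has_daemons() {
    local listing daemon missing=0
    listing="$(cat)"
    for daemon in "$@"; do
        # jps prints "<pid> <name>"; compare the second column exactly,
        # so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$listing" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

    On master: `jps | has_daemons NameNode SecondaryNameNode`. On each slave: `jps | has_daemons DataNode`.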
    

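    7 Check cluster status

    From the command line (all-zero capacities, as in the output below, mean that no DataNode had registered with the NameNode when the report was taken; a firewall between the nodes is one possible cause):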

    ./bin/hdfs dfsadmin -report
    Configured Capacity: 0 (0 B)
    Present Capacity: 0 (0 B)
    DFS Remaining: 0 (0 B)
    DFS Used: 0 (0 B)
    DFS Used%: NaN%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
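
    A report like the one above, with zero configured capacity, is easy to miss in a script. The sketch below turns it into an exit status; the function name `capacity_ok` is an assumption made for illustration, and it relies only on the "Configured Capacity:" line format shown in the report.

```shell
#!/usr/bin/env bash
# Read "hdfs dfsadmin -report" output on stdin; succeed only if the first
# "Configured Capacity" line (the cluster-wide summary) is greater than zero.
capacity_ok() {
    awk '/^Configured Capacity:/ { seen = 1; ok = ($3 > 0); exit }
         END { exit !(seen && ok) }'
}
```

    For example: `./bin/hdfs dfsadmin -report | capacity_ok || echo "no DataNode capacity registered"`.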
    

    From the web UI:
    http://192.168.1.101:50070
    Note: the CentOS 7 firewall has to be stopped first: systemctl stop firewalld

    8 Run a test program

    Create local test files:

    mkdir input
    vim input/f1
    vim input/f2
    

    Create the directories in HDFS:

    ./bin/hadoop fs  -mkdir /tmp
    ./bin/hadoop fs  -mkdir /tmp/input
    ./bin/hadoop fs -ls /
    

    Upload the test files:

    ./bin/hadoop fs -put input/ /tmp
    # Note: firewalld must be stopped on every node (systemctl stop firewalld), otherwise the upload fails
    ./bin/hadoop fs -ls /tmp/input
    -rw-r--r--   2 galaxy supergroup         16 2015-11-11 04:30 /tmp/input/f1
    -rw-r--r--   2 galaxy supergroup         24 2015-11-11 04:30 /tmp/input/f2
    

    Run wordcount:

    ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /tmp/input /output
    

    View the output:

    [galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -ls /output
    Found 2 items
    -rw-r--r--   2 galaxy supergroup          0 2015-11-11 04:44 /output/_SUCCESS
    -rw-r--r--   2 galaxy supergroup         31 2015-11-11 04:44 /output/part-r-00000
    [galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -cat /output/*
    bye	2
    hadoop	2
    hello	2
    world	1
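
    The post does not show what f1 and f2 contain. The sketch below writes one possible pair of files, chosen only because they match the sizes listed earlier (16 and 24 bytes) and the counts above, and then reproduces the count locally as a cross-check. The file contents and the function names `make_inputs` and `local_wordcount` are assumptions, not the author's actual data.

```shell
#!/usr/bin/env bash
# Write two sample input files. Their contents are an assumption that merely
# matches the file sizes and word counts shown in this post.
make_inputs() {
    local dir="$1"
    mkdir -p "$dir"
    printf 'hello world\nbye\n' > "$dir/f1"            # 16 bytes
    printf 'hello hadoop\nbye hadoop\n' > "$dir/f2"    # 24 bytes
}

# Local equivalent of the wordcount job: split on whitespace, sort, count,
# and print "word<TAB>count" like part-r-00000.
local_wordcount() {
    cat "$1"/* | tr -s '[:space:]' '\n' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}
```

    Running `make_inputs input` followed by `local_wordcount input` gives a result to compare against `./bin/hadoop fs -cat /output/*`.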
    

    Author: galaxy

    Created: 2015-11-11 Wed 18:00

  • Original post: https://www.cnblogs.com/galaxy-gao/p/4956742.html