  • Big Data Platform Environment Setup

    Cluster components: CentOS 7 + JDK 1.8 + Hadoop 2.6.5 + ZooKeeper 3.4.12 + HBase 1.2.1 + Hive 2.1.1

    The cluster is built on virtual machines.

    Host machine IP: 192.168.174.1
    Gateway IP (provided by the VM network): 192.168.174.2
    Three nodes:
    192.168.174.101
    192.168.174.102
    192.168.174.103
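
    The Hadoop, ZooKeeper and HBase configs below refer to the machines by hostname (node1.zzy.com, node2, node3, ...), so map those names to these IPs in /etc/hosts on every node. The exact mapping below is my assumption, inferred from the configs later in these notes:

    192.168.174.101 node1.zzy.com node1
    192.168.174.102 node2.zzy.com node2
    192.168.174.103 node3.zzy.com node3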

    Install the OS.

    Configure a static IP

    Edit /etc/sysconfig/network-scripts/ifcfg-ens33 (the file name may differ):

    BOOTPROTO=static

    ONBOOT=yes

    IPADDR=192.168.174.101
    NETMASK=255.255.255.0
    GATEWAY=192.168.174.2
    DNS1=8.8.8.8
    DNS2=8.8.8.4

    Restart the network:

    /etc/init.d/network restart

    ping www.baidu.com

    If it gets replies, the network is working.

    Install lrzsz (for rz/sz file transfers through the terminal):

    yum install lrzsz

    Check whether Linux is 32-bit or 64-bit

    Method 1: getconf LONG_BIT

    Method 2: uname -a

    On a 64-bit machine the output contains x86_64; otherwise the machine is 32-bit.
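
    For example, on one of these 64-bit VMs the output looks something like:

    $ getconf LONG_BIT
    64
    $ uname -a
    Linux node1 3.10.0-...el7.x86_64 #1 SMP ... x86_64 x86_64 x86_64 GNU/Linux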

    Install JDK 1.8

    Download the JDK from the official site: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
    Extract it into the installation directory.

    Install command (run as the hadoop user; the /usr/java directory must already exist and be writable): tar -zxvf jdk-8u171-linux-x64.tar.gz -C /usr/java/

    After installation, append these environment variables to the end of /etc/profile:

    [root@bogon software]# vi /etc/profile
    export JAVA_HOME=/usr/java/jdk1.8.0_171
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH

    Reload /etc/profile to apply it (as the hadoop user):

    [root@bogon jdk1.8.0_171]# source /etc/profile

    Verify the installation:

      java -version

    Install Hadoop 2.6.5

    (1) Download the Hadoop tarball and put it in /home/hadoop.
    (2) Extract it: tar -xzvf hadoop-2.6.5.tar.gz
    (3) Under /home/hadoop, create the data directories. Note that hdfs-site.xml below points at /home/hadoop/dfs/name and /home/hadoop/dfs/data, so create tmp, dfs/name and dfs/data to match (see the commands below).
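
    A sketch of the directory creation, with paths matching hdfs-site.xml below:

    mkdir -p /home/hadoop/tmp
    mkdir -p /home/hadoop/dfs/name
    mkdir -p /home/hadoop/dfs/data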

    Overall approach: prepare master and slave servers, configure passwordless SSH from the master to the slaves, extract and install the JDK, extract and install Hadoop, then configure the HDFS and MapReduce master/slave relationships.

    Passwordless SSH login. Hadoop logs in to each node over SSH to operate on it. I use the root user: generate a key pair on every server, then merge the public keys into authorized_keys.
    (1) CentOS does not enable passwordless SSH login by default. Uncomment the following line in /etc/ssh/sshd_config; do this on every server:
    PubkeyAuthentication yes

    (2) Run ssh-keygen -t rsa to generate the key, pressing Enter at every prompt (no passphrase); a .ssh folder appears under /root. Do this on every server.
    (3) Merge the public keys into authorized_keys. On the master, go to /root/.ssh and pull in the slaves' keys over SSH:
    cat id_rsa.pub >> authorized_keys
    ssh root@192.168.174.102 cat ~/.ssh/id_rsa.pub >> authorized_keys
    ssh root@192.168.174.103 cat ~/.ssh/id_rsa.pub >> authorized_keys
    (4) Copy the master's authorized_keys and known_hosts to /root/.ssh on each slave.
    (5) Done: ssh root@192.168.174.102 and ssh root@192.168.174.103 no longer prompt for a password. (An ssh-copy-id shortcut is sketched below.)
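
    Steps (2)-(4) can also be collapsed with ssh-copy-id, which appends your public key to the remote authorized_keys in one command (a shortcut, not what the steps above use):

    for host in 192.168.174.102 192.168.174.103; do
        ssh-copy-id root@$host
    done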

    Configure core-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://node1.zzy.com:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/tmp/hadoop-${user.name}</value>
        </property>
        <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
        </property>
    </configuration>

    Configure hdfs-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop:

    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/home/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/home/hadoop/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>node1.zzy.com:9001</value>
        </property>
        <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
        </property>
    </configuration>

    Configure mapred-site.xml under /home/hadoop/hadoop-2.6.5/etc/hadoop (if it does not exist, copy mapred-site.xml.template to mapred-site.xml):

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>192.168.174.101:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>192.168.174.101:19888</value>
        </property>
    </configuration>

    Configure yarn-site.xml in the same directory (these yarn.* properties belong in yarn-site.xml, not mapred-site.xml, or the YARN daemons will ignore them):

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>192.168.174.101:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>192.168.174.101:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>192.168.174.101:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>192.168.174.101:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>192.168.174.101:8088</value>
        </property>
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>768</value>
        </property>
    </configuration>

    Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under /home/hadoop/hadoop-2.6.5/etc/hadoop; without it, the daemons will not start:
    export JAVA_HOME=/usr/java/jdk1.8.0_171

    Configure the slaves file under /home/hadoop/hadoop-2.6.5/etc/hadoop: delete the default localhost and add the two slave nodes:

    192.168.174.102
    192.168.174.103

    Copy the configured Hadoop to the corresponding location on each node via scp:
    scp -r /home/hadoop 192.168.174.102:/home/
    scp -r /home/hadoop 192.168.174.103:/home/

    Start Hadoop on the master; the daemons on the slaves start automatically. Go to /home/hadoop/hadoop-2.6.5 and:
    (1) Format the NameNode: bin/hdfs namenode -format
    (2) Start everything: sbin/start-all.sh, or separately: sbin/start-dfs.sh and sbin/start-yarn.sh
    (3) To stop: sbin/stop-all.sh, or separately: sbin/stop-dfs.sh and sbin/stop-yarn.sh
    (4) Run jps to see the running daemons (sample output below)
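
    For reference, jps on the master should list something like the following (PIDs will differ); the slaves should show DataNode and NodeManager instead:

    [root@node1 hadoop-2.6.5]# jps
    2481 NameNode
    2672 SecondaryNameNode
    2825 ResourceManager
    3120 Jps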

    For web access, first open the ports or simply stop the firewall:
    (1) Run systemctl stop firewalld.service
    (2) Open http://192.168.174.101:8088 in a browser (YARN ResourceManager UI)
    (3) Open http://192.168.174.101:50070 in a browser (HDFS NameNode UI)

    At this point, the Hadoop installation is complete. This is only the beginning for big data work; the next step is to write programs against Hadoop's APIs, as your own needs dictate, to put HDFS and MapReduce to work.

    # Stop the firewall
    systemctl stop firewalld
    # Disable it at boot
    systemctl disable firewalld
    # Check its status
    systemctl status firewalld

    Reference: https://blog.csdn.net/sa14023053/article/details/51953836

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    On CentOS 7, running service iptables start/stop fails with: Failed to start iptables.service: Unit iptables.service failed to load: No such file or directory.

    In CentOS 7, RHEL 7 and Fedora, the firewall is managed by firewalld.

    To open a range of ports, e.g. 1000-2000, the syntax is (enable a zone/port/protocol combination):
    firewall-cmd [--zone=<zone>] --add-port=<port>[-<port>]/<protocol> [--timeout=<seconds>]
    This enables a port and protocol combination. The port can be a single port <port> or a range <port>-<port>; the protocol can be tcp or udp.
    The actual commands:

    Add:

    firewall-cmd --zone=public --add-port=80/tcp --permanent (--permanent makes the rule persistent; without it, the rule is lost after a restart)

    firewall-cmd --zone=public --add-port=1000-2000/tcp --permanent

    Reload:
    firewall-cmd --reload
    Query:
    firewall-cmd --zone=public --query-port=80/tcp
    Remove:
    firewall-cmd --zone=public --remove-port=80/tcp --permanent
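
    To list everything currently open in the zone (a standard firewall-cmd query):
    firewall-cmd --zone=public --list-ports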

    You can also revert to the traditional management method. Run the following commands:

    systemctl stop firewalld
    systemctl mask firewalld


    Then install iptables-services:

    yum install iptables-services


    Enable it at boot, and manage it with systemctl:

    systemctl enable iptables
    systemctl stop iptables
    systemctl start iptables
    systemctl restart iptables
    systemctl reload iptables


    Save the rules:

    service iptables save

    OK, trying again should now work.

    Error: Call From localhost/127.0.0.1 to 192.168.174.101:9000 failed on connection exception: java.net.ConnectException: Connection refused

    Cause: the service had not been started.

    Reference: https://www.linuxidc.com/Linux/2015-11/124800.htm

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    ZooKeeper Installation

    ---The zoo.cfg config file is created by renaming the sample file; environment variables cannot be used inside it.

    ---myid is a file you create yourself; its content is the id assigned to this machine in zoo.cfg (see the commands after these notes).

    ---The command ./bin/zkServer.sh status only produces a result once a quorum is running (a majority, i.e. at least two of the three nodes). Otherwise it reports: zookeeper Error contacting service. It is probably not running

    ---If it will not start, check the zookeeper.out log file.
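
    A sketch of creating the data directories and myid files, matching the dataDir and the server.1/2/3 ids in zoo.cfg below (run on each node with its own id):

    mkdir -p /home/hadoop/zookeeper-3.4.12/data /home/hadoop/zookeeper-3.4.12/logs
    echo 1 > /home/hadoop/zookeeper-3.4.12/data/myid    # use 2 on node2, 3 on node3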

    scp -r /home/hadoop/zookeeper-3.4.12 192.168.174.102:/home/hadoop/
    scp -r /home/hadoop/zookeeper-3.4.12 192.168.174.103:/home/hadoop/

    systemctl start firewalld
    systemctl stop firewalld
    systemctl status firewalld

    Open the ZooKeeper ports. Note that zoo.cfg below uses 3181/4181 as the peer and leader-election ports rather than the defaults 2888/3888, so those are the ones to open, along with the client port 2181:

    firewall-cmd --zone=public --add-port=2181/tcp --permanent

    firewall-cmd --zone=public --add-port=3181/tcp --permanent

    firewall-cmd --zone=public --add-port=4181/tcp --permanent

    firewall-cmd --reload

    zoo.cfg (identical on all three nodes; the server lines have the form server.N=host:peerPort:electionPort):

    tickTime=2000
    clientPort=2181
    initLimit=5
    syncLimit=2

    dataDir=/home/hadoop/zookeeper-3.4.12/data
    dataLogDir=/home/hadoop/zookeeper-3.4.12/logs


    server.1=node1.zzy.com:3181:4181
    server.2=node2.zzy.com:3181:4181
    server.3=node3.zzy.com:3181:4181

    A quick reachability test of the peer port from another node (ssh cannot complete a handshake there, but -v shows whether the port is open); use 3181 to match zoo.cfg:

    ssh -v -p 3181 hadoop@192.168.174.102


    ./bin/zkServer.sh start
    ./bin/zkServer.sh stop
    ./bin/zkServer.sh restart
    ./bin/zkServer.sh status
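
    Once a quorum is running, status prints each node's role; one node reports Mode: leader and the others Mode: follower, e.g.:

    Using config: /home/hadoop/zookeeper-3.4.12/bin/../conf/zoo.cfg
    Mode: follower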


    Add to /etc/profile:

    export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.12/
    export PATH=$ZOOKEEPER_HOME/bin:$PATH

     --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    HBase Installation

    ZK_HOME=/home/hadoop/zookeeper-3.4.12
    HBASE_HOME=/home/hadoop/hbase-1.2.1


    hbase-env.sh----

    export JAVA_HOME=/usr/java/jdk1.8.0_171
    export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
    export HBASE_HOME=/home/hadoop/hbase-1.2.1
    export HBASE_CLASSPATH=/home/hadoop/hadoop-2.6.5/etc/hadoop
    export HBASE_PID_DIR=/home/hadoop/hbase/pids
    export HBASE_MANAGES_ZK=false


    hbase-site.xml (the ZooKeeper client port must match the clientPort=2181 in zoo.cfg above, and hbase.rootdir must match fs.defaultFS in core-site.xml):

    <configuration>
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://node1.zzy.com:9000/hbase</value>
            <description>The directory shared by region servers.</description>
        </property>
        <property>
            <name>hbase.zookeeper.property.clientPort</name>
            <value>2181</value>
            <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
        </property>
        <property>
            <name>zookeeper.session.timeout</name>
            <value>120000</value>
        </property>
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>node1,node2,node3</value>
        </property>
        <property>
            <name>hbase.tmp.dir</name>
            <value>/home/hadoop/hbase/tmp</value>
        </property>
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>
    </configuration>
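
    The conf/regionservers file should also list the region server hosts, one per line. The choice of nodes below is my assumption, mirroring the Hadoop slaves file above:

    node2.zzy.com
    node3.zzy.com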


    scp -r /home/hadoop/hbase-1.2.1 192.168.174.102:/home/hadoop/
    scp -r /home/hadoop/hbase-1.2.1 192.168.174.103:/home/hadoop/


    scp -r /home/hadoop/hbase 192.168.174.102:/home/hadoop/
    scp -r /home/hadoop/hbase 192.168.174.103:/home/hadoop/

    The final environment variables (in /etc/profile) are:

    export JAVA_HOME=/usr/java/jdk1.8.0_171
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.12
    export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
    export ZK_HOME=/home/hadoop/zookeeper-3.4.12
    export HBASE_HOME=/home/hadoop/hbase-1.2.1
    export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/sbin:${HADOOP_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin
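
    To start HBase, run on the master after HDFS and ZooKeeper are up:

    cd /home/hadoop/hbase-1.2.1
    bin/start-hbase.sh
    bin/hbase shell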


    Web UI: http://192.168.174.101:16010/ (the HBase Master UI; RegionServer UIs listen on 16030)

    Reference: https://blog.csdn.net/pucao_cug/article/details/72229223

     --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    Install Hive 2.1.1

    Add to /etc/profile:

    export HIVE_HOME=/home/hadoop/hive
    export HIVE_CONF_DIR=$HIVE_HOME/conf
    PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin

    hive-site.xml

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.174.1/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>*******</value>
    </property>


    hive-env.sh:

    export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
    export HIVE_CONF_DIR=/home/hadoop/hive/conf
    export HIVE_AUX_JARS_PATH=/home/hadoop/hive/lib

    Initialize the metastore schema in MySQL (the MySQL JDBC driver jar, e.g. mysql-connector-java, must be in $HIVE_HOME/lib first):
    cd $HIVE_HOME/bin
    schematool -initSchema -dbType mysql

    Start the CLI: ./hive
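
    A quick smoke test inside the CLI:

    hive> show databases;
    hive> create database test;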

    Reference: https://blog.csdn.net/jssg_tzw/article/details/72354470

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


    Scala installation---------

    export SCALA_HOME=/home/hadoop/scala-2.10.4
    export PATH=$PATH:${SCALA_HOME}/bin

    Spark installation---------

    export SPARK_HOME=/home/hadoop/spark-1.6.0-bin-hadoop2.6
    export PATH=$PATH:${SPARK_HOME}/bin


    spark-env.sh----

    export SCALA_HOME=/home/hadoop/scala-2.10.4
    export JAVA_HOME=/usr/java/jdk1.8.0_171
    export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export SPARK_MASTER_IP=192.168.174.101
    export SPARK_LOCAL_DIRS=/home/hadoop/spark-1.6.0-bin-hadoop2.6
    export SPARK_WORKER_MEMORY=1g

    scp -r /home/hadoop/scala-2.10.4 192.168.174.102:/home/hadoop/
    scp -r /home/hadoop/scala-2.10.4 192.168.174.103:/home/hadoop/

    scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 192.168.174.102:/home/hadoop/
    scp -r /home/hadoop/spark-1.6.0-bin-hadoop2.6 192.168.174.103:/home/hadoop/

    Problem encountered: after submitting a job to YARN, it keeps printing Application report for application_1530263181961_0002 (state: ACCEPTED)

    Approach:

    The cluster thinks resources are insufficient. They may genuinely be insufficient, or a DataNode/NodeManager may be down and going undetected.

    1. Check that the NameNode and ResourceManager services are started, and that the DataNode and NodeManager on each slave are started and in a normal (RUNNING) state; see the checks after this list.

    2. Set the driver-memory submit parameter to no more than about 500 MB (that seems to be roughly the floor), and executor-memory to a few tens of MB, just enough for the job to run.
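
    Two quick checks for step 1 (standard Hadoop/YARN commands):

    yarn node -list          # every NodeManager should be listed as RUNNING
    hdfs dfsadmin -report    # every DataNode should appear as live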

    My final submit script:

    spark-submit \
    --master yarn-cluster \
    --num-executors 1 \
    --executor-memory 20m \
    --executor-cores 1 \
    --driver-memory 512m \
    --class local.test201806.YarnTest \
    sparkdemo-1.0-SNAPSHOT.jar

    ---------------------------------------------------------------------------------------------

    Telnet installation

    yum install telnet-server          # the telnet server
    yum install telnet                 # the telnet client
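
    telnet can then be used to check whether a remote port is reachable, e.g. for the connection-refused error above:

    telnet 192.168.174.101 9000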
