  • [kylin] Deploying the Kylin service

    Official site:

    http://kylin.apache.org/

    Community:

    https://github.com/KylinOLAP/Kylin/issues

    http://apache-kylin.74782.x6.nabble.com/ 

    Source code:

    https://github.com/apache/kylin

    Blog posts:

    The fast cubing algorithm in Apache Kylin

    Apache Kylin v1.5.0 released: a redesigned new generation

    The Apache Software Foundation announces Apache Kylin as a Top-Level Project

    The by-level algorithm vs. the by-split algorithm

    Kylin officially released: the ultimate OLAP engine for big data

    Apache Kylin in practice at Baidu Maps

    JD.com's Wang Xiaoyu: Apache Kylin in practice on Yunhai

    I. Tool Preparation

    ZooKeeper 3.4.6 (coordination service for Hadoop and HBase)
    Hadoop 2.7.1
    HBase 1.1.4
    Kylin 1.5.0 (HBase 1.1.3 build)
    JDK 1.7.0_80
    Hive 2.0.0

    II. Virtual Hosts

    192.168.200.165 master1
    192.168.200.166 master2
    192.168.200.167 slave1
    192.168.200.168 slave2

    III. Install MySQL

    Check whether MySQL is already installed on master1:

    [root@master1 ~]# ps -aux | grep mysql
    mysql 3632 0.0 0.0 115348 1648 ? Ss Apr01 0:00 /bin/sh /wdcloud/app/mysql/bin/mysqld_safe
    mysql 4519 0.5 19.8 13895940 1591664 ? Sl Apr01 29:55
    /wdcloud/app/mysql/bin/mysqld
    --basedir=/wdcloud/app/mysql
    --datadir=/wdcloud/data/mysql/data
    --plugin-dir=/wdcloud/app/mysql/lib/mysql/plugin
    --log-error=/wdcloud/data/mysql/data/mysql-error.log
    --open-files-limit=20000
    --pid-file=/wdcloud/data/mysql/data/localhost.localdomain.pid 
    --socket=/tmp/mysql.sock
    --port=3306

    Check the MySQL version:

    [root@master1 ~]# mysql --version
    mysql  Ver 14.14 Distrib 5.6.29-76.2, for Linux (x86_64) using  6.2

    Log in to MySQL:

    [root@master1 ~]# mysql -uroot -p
    Enter password:

    IV. Install the JDK

    Check the installed version:

    [root@master1 ~]# java -version
    java version "1.7.0_80"
    Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

    Check the installation location:

    [root@master1 ~]# which java
    /jdk1.7.0_80/bin/java

    V. Install ZooKeeper

    1. Extract ZooKeeper into the root directory, enter it, create the folders data, datalog, and logs, and configure the environment variables (a command sketch follows the exports below):

    export ZOOKEEPER_HOME=/zookeeper-3.4.6
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
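    A minimal sketch of the extract-and-create-folders part of this step (the tarball name zookeeper-3.4.6.tar.gz is an assumption; paths follow this guide):

    tar -zxf zookeeper-3.4.6.tar.gz -C /      # extract to the root directory
    cd /zookeeper-3.4.6
    mkdir -p data datalog logs                # folders referenced by zoo.cfg and log4j.properties below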

    2. Enter the conf folder and copy zoo_sample.cfg to zoo.cfg.

    3. Edit zoo.cfg, adding the entries below; then save and exit:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/zookeeper-3.4.6/data
    dataLogDir=/zookeeper-3.4.6/datalog
    clientPort=2181
    server.0=master1:2888:3888
    server.1=master2:2888:3888
    server.2=slave1:2888:3888
    server.3=slave2:2888:3888

    Edit the log4j.properties file under conf to set where the log files are written:

    zookeeper.log.dir=/zookeeper-3.4.6/logs
    zookeeper.log.file=zookeeper.log
    zookeeper.tracelog.dir=/zookeeper-3.4.6/logs
    zookeeper.tracelog.file=zookeeper_trace.log

    4. Enter the data folder, create a file named "myid", add the content 0, then save and exit.

    5. Distribute the zookeeper folder to the root directory of each virtual host:

    scp -r /zookeeper-3.4.6 hadoop@master2:/
    scp -r /zookeeper-3.4.6 hadoop@slave1:/
    scp -r /zookeeper-3.4.6 hadoop@slave2:/

    6. Edit the myid on each host in order: master1's myid is 0, master2's is 1, and so on, matching the server.N entries in zoo.cfg (see the sketch below).
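    A minimal sketch of writing the myid files from master1 (assumes passwordless SSH as user hadoop and the same /zookeeper-3.4.6 path on every host):

    echo 0 > /zookeeper-3.4.6/data/myid                           # master1 itself
    id=1
    for host in master2 slave1 slave2; do                         # ids 1, 2, 3 match server.1/2/3 in zoo.cfg
        ssh hadoop@$host "echo $id > /zookeeper-3.4.6/data/myid"
        id=$((id + 1))
    done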

    7. Start ZooKeeper

    On each virtual host, enter the ZooKeeper directory and start the ZK service:

    bin/zkServer.sh start

    8. Check the ZooKeeper status on each host:

    bin/zkServer.sh status
    [hadoop@master1 ~]$ zkServer.sh status
    JMX enabled by default
    Using config: /zookeeper-3.4.6/bin/../conf/zoo.cfg
    Mode: follower

    Mode: leader means the node is running as the ZooKeeper leader;

    Mode: follower means it is running as a follower.
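    To check all four nodes from one shell, a minimal sketch (assumes passwordless SSH and the install path used in this guide):

    for host in master1 master2 slave1 slave2; do
        echo -n "$host: "
        ssh hadoop@$host "/zookeeper-3.4.6/bin/zkServer.sh status" 2>&1 | grep Mode
    done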

    9. Stop ZooKeeper

    bin/zkServer.sh stop

    VI. Hadoop High-Availability Deployment

    1. Extract Hadoop 2.7.1 into the hadoop user's home directory, enter it, and create the folders tmp, hdfs/name, and hdfs/data.

    2. Enter ~/hadoop/etc/hadoop; this folder contains most of Hadoop's configuration files.

    3. Edit the Hadoop configuration files as follows:

    core-site.xml

    <configuration>
             <property>
                    <name>fs.default.name</name>
                    <value>hdfs://master1:9000</value>
                    <final>true</final>
            </property>
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/home/hadoop/hadoop/tmp</value>
                    <description>A base for other temporary directories</description>
            </property>
            <property>
                    <name>io.file.buffer.size</name>
                    <value>131702</value>
             </property>
            <property>
                     <name>fs.checkpoint.period</name>
                     <value>3600</value>
                 <description>How often to checkpoint the HDFS image; default is one hour</description>
            </property>
            <property>
                     <name>fs.checkpoint.size</name>
                     <value>67108864</value>
                 <description>Size threshold that triggers a checkpoint; default 64 MB</description>
            </property>
    </configuration>

    hadoop-env.sh

    # Set JAVA_HOME
    export JAVA_HOME=/jdk1.7.0_80
    
    # Set HADOOP_CONF_DIR
    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

    hbase-site.xml

    <configuration>
             <property>
                       <name>hbase.rootdir</name>
                       <value>hdfs://master1:9000/hbase</value>
             </property>
             <property>
                       <name>hbase.cluster.distributed</name>
                       <value>true</value>
             </property>
             <property>
                       <name>hbase.zookeeper.quorum</name>
                       <value>master1,master2,slave1,slave2</value>
             </property>
             <property>
                       <name>hbase.zookeeper.property.dataDir</name>
                       <value>/zookeeper-3.4.6/data</value>
             </property>
             <property>
                       <name>hbase.zookeeper.property.clientPort</name>
                       <value>2181</value>
             </property>
             <property>
                     <name>hbase.coprocessor.user.region.classes</name>
                     <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
             </property>
             <property>
                       <name>hbase.regionserver.wal.codec</name>
                       <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
             </property>
             <property>
                       <name>hbase.master.loadbalancer.class</name>
                     <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
             </property>
             <property>
                       <name>hbase.coprocessor.master.classes</name>
                       <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
             </property>
             <property>
                       <name>phoenix.query.maxServerCacheBytes</name>
                       <value>1073741824</value>
             </property>
             <property>
                       <name>hbase.client.scanner.caching</name>
                       <value>5000</value>
                       <description>HBase client scanner caching; greatly helps query performance</description>
             </property>
             <property>
                       <name>hbase.rpc.timeout</name>
                       <value>360000000</value>
             </property>
    </configuration>

    hdfs-site.xml 

    <configuration>
            <property>
                    <name>dfs.namenode.name.dir</name>
                    <value>/home/hadoop/hadoop/hdfs/name</value>
            </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/hadoop/hadoop/hdfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>4</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>master2:9001</value>
        </property>
        <property>
             <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
        <property>
             <name>dfs.client.read.shortcircuit</name>
             <value>false</value>
        </property>
    </configuration>

    mapred-site.xml 

    <configuration>
      <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
        </property>
      <property>
         <name>mapreduce.jobtracker.http.address</name>
          <value>NameNode:50030</value>
      </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>master1:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>master1:19888</value>
        </property>
             <property>
            <name>mapred.compress.map.output</name>
            <value>true</value>
             </property>
    </configuration>

    Add a masters file for the high-availability setup, making master2 the Secondary NameNode:

    masters

    master2

    slaves 

    master1
    master2
    slave1
    slave2

    yarn-env.sh 

    export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"
    export JAVA_HOME=/jdk1.7.0_80
    JAVA=$JAVA_HOME/bin/java
    JAVA_HEAP_MAX=-Xmx4096m

    yarn-site.xml 

    <configuration>
             <property>
                       <name>yarn.resourcemanager.zk-address</name>
                      <value>master1:2181,master2:2181,slave1:2181,slave2:2181</value>
             </property>
             <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
           <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>master1:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>master1:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>master1:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>master1:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>master1:8088</value>
        </property>
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>2048</value>
             </property>
    </configuration>

    4. Configure the Hadoop environment variables:

    export HADOOP_HOME=/home/hadoop/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin


    5. Verify that Hadoop is installed and configured correctly:
     

    [hadoop@master1 ~]$ hadoop version
    Hadoop 2.7.1
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git
    -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
    Compiled by jenkins on 2015-06-29T06:04Z
    Compiled with protoc 2.5.0
    From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
    This command was run using
    /home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar

    6. Distribute the hadoop folder to the hadoop home directory on each virtual host:

    scp -r  ~/hadoop/ hadoop@master2:~
    scp -r  ~/hadoop/ hadoop@slave1:~
    scp -r  ~/hadoop/ hadoop@slave2:~

    7. Format (or reformat) the HDFS filesystem

    If the cluster has just been configured and has never been started, run the format command directly.

    If the cluster has already been formatted and started before, delete the old data first and then run the format command.

    1) Delete the old data

    hdfs-site.xml configures:

    dfs.namenode.name.dir = /home/hadoop/hadoop/hdfs/name (where the NameNode stores the HDFS namespace metadata)

    dfs.datanode.data.dir = /home/hadoop/hadoop/hdfs/data (where each DataNode physically stores data blocks)

    core-site.xml configures:

    hadoop.tmp.dir = /home/hadoop/hadoop/tmp (the local Hadoop temp folder on each node)

    Delete all files and directories under these three folders on every cluster node (see the sketch below).
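    A minimal sketch of that cleanup, run from master1 (assumes passwordless SSH as user hadoop; double-check the paths first, because this permanently deletes all HDFS data):

    for host in master1 master2 slave1 slave2; do
        ssh hadoop@$host "rm -rf /home/hadoop/hadoop/tmp/* /home/hadoop/hadoop/hdfs/name/* /home/hadoop/hadoop/hdfs/data/*"
    done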

    2) Run the format command

    hadoop namenode -format

    3) Format log output

    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
    16/04/05 02:02:28 INFO namenode.NameNode: STARTUP_MSG:
    
    /***********************************************************
    STARTUP_MSG:   Starting NameNode
    STARTUP_MSG:   host = master1/192.168.200.165
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 2.7.1
    STARTUP_MSG:   classpath =
    /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/home/hadoop/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/hadoop/share/hadoop/common/lib/activation-1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-io-2.4.jar:/home/hadoop/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/zookeeper-3.4.6.jar:/home/hadoop/hadoop/share/hadoop/common/lib/hadoop-auth-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/home/hadoop/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:/home/hadoop/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/home/hadoop/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/home/hadoop/hadoop/share/hadoop/common/lib/xz-1.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/stax-api-1.0-2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jersey-json-1.9.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:/home/hadoop/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/home/hadoo
p/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:/home/hadoop/hadoop/share/hadoop/common/lib/junit-4.11.jar:/home/hadoop/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:/home/hadoop/hadoop/share/hadoop/common/lib/curator-framework-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/home/hadoop/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/home/hadoop/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.7.1-tests.jar:/home/hadoop/hadoop/share/hadoop/common/hadoop-nfs-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs:/home/hadoop/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-io-2.4.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/home/hadoop/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.1-tests.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.
jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/home/hadoop/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/xz-1.0.jar:/home/hadoop/hadoop/sha
re/hadoop/mapreduce/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/junit-4.11.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.1.jar:/home/hadoop/hadoop/contrib/capacity-scheduler/*.jar:/home/hadoop/hadoop/contrib/capacity-scheduler/*.jar
    
    STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git
    
    -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a; compiled by 'jenkins' on 2015-06-29T06:04Z
    
    STARTUP_MSG:   java = 1.7.0_80
    
    ************************************************************/
    
    16/04/05 02:02:28 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
    16/04/05 02:02:28 INFO namenode.NameNode: createNameNode [-format]
    Formatting using clusterid: CID-070fc765-1b22-4453-83ba-7635ea906e1d
    16/04/05 02:02:29 INFO namenode.FSNamesystem: No KeyProvider found.
    16/04/05 02:02:29 INFO namenode.FSNamesystem: fsLock is fair:true
    16/04/05 02:02:29 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
    16/04/05 02:02:29 INFO blockmanagement.DatanodeManager:
    dfs.namenode.datanode.registration.ip-hostname-check=true
    16/04/05 02:02:29 INFO blockmanagement.BlockManager:
    dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
    16/04/05 02:02:29 INFO blockmanagement.BlockManager: The block deletion will start around 2016 Apr 05 02:02:29
    16/04/05 02:02:29 INFO util.GSet: Computing capacity for map BlocksMap
    16/04/05 02:02:29 INFO util.GSet: VM type= 64-bit
    16/04/05 02:02:29 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
    16/04/05 02:02:29 INFO util.GSet: capacity = 2^21 = 2097152 entries
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: defaultReplication= 3
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: maxReplication= 512
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: minReplication= 1
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: maxReplicationStreams= 2
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks= false
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: encryptDataTransfer= false
    16/04/05 02:02:30 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
    16/04/05 02:02:30 INFO namenode.FSNamesystem: fsOwner= hadoop (auth:SIMPLE)
    16/04/05 02:02:30 INFO namenode.FSNamesystem: supergroup= supergroup
    16/04/05 02:02:30 INFO namenode.FSNamesystem: isPermissionEnabled = true
    16/04/05 02:02:30 INFO namenode.FSNamesystem: HA Enabled: false
    16/04/05 02:02:30 INFO namenode.FSNamesystem: Append Enabled: true
    16/04/05 02:02:30 INFO util.GSet: Computing capacity for map INodeMap
    16/04/05 02:02:30 INFO util.GSet: VM type= 64-bit
    16/04/05 02:02:30 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
    16/04/05 02:02:30 INFO util.GSet: capacity= 2^20 = 1048576 entries
    16/04/05 02:02:30 INFO namenode.FSDirectory: ACLs enabled? false
    16/04/05 02:02:30 INFO namenode.FSDirectory: XAttrs enabled? true
    16/04/05 02:02:30 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
    16/04/05 02:02:30 INFO namenode.NameNode: Caching file names occuring more than 10 times
    16/04/05 02:02:30 INFO util.GSet: Computing capacity for map cachedBlocks
    16/04/05 02:02:30 INFO util.GSet: VM type= 64-bit
    16/04/05 02:02:30 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
    16/04/05 02:02:30 INFO util.GSet: capacity= 2^18 = 262144 entries
    16/04/05 02:02:30 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
    16/04/05 02:02:30 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
    16/04/05 02:02:30 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension= 30000
    16/04/05 02:02:30 INFO metrics.TopMetrics: NNTop conf:
    dfs.namenode.top.window.num.buckets = 10
    16/04/05 02:02:30 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
    16/04/05 02:02:30 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
    16/04/05 02:02:30 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
    16/04/05 02:02:30 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
    16/04/05 02:02:30 INFO util.GSet: Computing capacity for map NameNodeRetryCache
    16/04/05 02:02:30 INFO util.GSet: VM type= 64-bit
    16/04/05 02:02:30 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
    16/04/05 02:02:30 INFO util.GSet: capacity= 2^15 = 32768 entries
    16/04/05 02:02:30 INFO namenode.FSImage: Allocated new BlockPoolId: BP-464058956-192.168.200.165-1459836150356
    16/04/05 02:02:30 INFO common.Storage: Storage directory
     /home/hadoop/hadoop/hdfs/name has been successfully formatted.
    16/04/05 02:02:30 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    16/04/05 02:02:30 INFO util.ExitUtil: Exiting with status 0
    16/04/05 02:02:30 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at master1/192.168.200.165
    
    ************************************************************/


    8. Start and stop Hadoop
     

    # Start the Hadoop cluster

    Notes:

    1) Every cluster node must be able to SSH to every other node without a password (see the sketch after these notes).

    2) Make sure each node's myid matches the server.N entries in zoo.cfg, and start the ZooKeeper cluster first.
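    A minimal sketch of setting up passwordless SSH for the hadoop user (run it on each node; ssh-copy-id appends the public key to each target's authorized_keys and prompts for the password once per host):

    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    for host in master1 master2 slave1 slave2; do
        ssh-copy-id hadoop@$host
    done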

    Start command: sbin/start-all.sh, or run start-dfs.sh and then start-yarn.sh.

    [hadoop@master1 conf]$ start-all.sh
    Starting namenodes on [master1]
    master1: starting namenode, logging to
    /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-master1.out
    slave2: starting datanode, logging to
    /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-slave2.out
    master2: starting datanode,
    logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-master2.out
    master1: starting datanode, logging to
    /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-master1.out
    slave1: starting datanode, logging to
    /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-slave1.out
    Starting secondary namenodes [master2]
    master2: starting secondarynamenode, logging to
    /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-master2.out
    starting yarn daemons
    starting resourcemanager, logging to
    /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-master1.out
    slave2: starting nodemanager, logging to
    /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
    master2: starting nodemanager, logging to
    /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-master2.out
    master1: starting nodemanager, logging to
    /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-master1.out
    slave1: starting nodemanager, logging to
    /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-slave1.out

    Check the related daemons

    NAMENODE(master1)

    [hadoop@master1 logs]$ jps
    12892 NameNode
    13003 DataNode
    13295 ResourceManager
    13408 NodeManager
    13826 QuorumPeerMain

    SECONDARY NAMENODE(master2) 

    [hadoop@master2 ~]$ jps
    10162 SecondaryNameNode
    10052 DataNode
    10245 NodeManager
    4045 QuorumPeerMain

    DATANODE(slave1/slave2) 

    [hadoop@slave1 ~]$ jps
    13902 NodeManager
    13789 DataNode
    9331 QuorumPeerMain
    
     
    [hadoop@slave2 ~]$ jps
    13697 QuorumPeerMain
    18324 DataNode
    18440 NodeManager

    # Stop the Hadoop cluster

    sbin/stop-all.sh

    After a successful start, you can view cluster information in the web consoles:

    NAMENODE:http://192.168.200.165:50070/

    SECONDARY NAMENODE:http://192.168.200.166:9001

    Nodes of Cluster (YARN job management UI): http://192.168.200.165:8088
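    A quick command-line check (run on master1) that all four DataNodes and NodeManagers have registered:

    hdfs dfsadmin -report | grep -c "^Name:"    # expect 4 live DataNodes
    yarn node -list                             # expect 4 nodes in RUNNING state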

    9. Start the JobHistory Server

    Kylin needs MapReduce job scheduling, so the JobHistory Server must be started:

    [hadoop@master1 logs]$ mr-jobhistory-daemon.sh start historyserver

    Check the JobHistory Server daemon:

    [hadoop@master1 conf]$ jps
    24419 JobHistoryServer


    VII. HBase Deployment
     

    1. Extract hbase-1.1.4 into the hadoop home directory and enter the hbase directory.

    2. Enter the conf folder.

    3. Edit the HBase configuration files as follows:

    hbase-env.sh

    export JAVA_HOME=/jdk1.7.0_80

    regionservers 

    master1
    master2
    slave1
    slave2

    hbase-site.xml (tuned)

    <configuration>
            <property>
                    <name>hbase.rootdir</name>
                    <value>hdfs://master1:9000/hbase</value>
            </property>
            <property>
                    <name>hbase.cluster.distributed</name>
                    <value>true</value>
            </property>
            <property>
                    <name>hbase.zookeeper.quorum</name>
                    <value>master1,master2,slave1,slave2</value>
            </property>
            <property>
                    <name>hbase.zookeeper.property.dataDir</name>
                    <value>/zookeeper-3.4.6/data</value>
            </property>
            <property>
                    <name>hbase.zookeeper.property.clientPort</name>
                    <value>2181</value>
            </property>
            <property>
                    <name>hbase.master.info.bindAddress</name>
                    <value>master1</value>
            </property>
            <property>
                    <name>hbase.master.info.port</name>
                    <value>60010</value>
            </property>
            <property>
                    <name>hbase.master.maxclockskew</name>
                    <value>200000</value>
                    <description>Time difference of regionserver from master</description>
            </property>
            <property>
                    <name>hbase.coprocessor.user.region.classes</name>
              <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
            </property>
            <property>
                    <name>hbase.regionserver.wal.codec</name>
             <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
            </property>
            <property>
                    <name>hbase.master.loadbalancer.class</name>
                 <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
            </property>
            <property>
                    <name>hbase.coprocessor.master.classes</name>
                 <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
            </property>
            <property>
                    <name>phoenix.query.maxServerCacheBytes</name>
                    <value>1073741824</value>
            </property>
            <property>
                    <name>phoenix.query.maxGlobalMemoryPercentage</name>
                    <value>70</value>
                    </property>
            <property>
                    <name>hbase.client.scanner.caching</name>
                    <value>5000</value>
                    <description>HBase client scanner caching; greatly helps query performance</description>
            </property>
            <property>
                    <name>hbase.rpc.timeout</name>
                    <value>360000000</value>
            </property>
            <property>
                    <name>zookeeper.session.timeout</name>
                    <value>60000</value>
                    <description>ZooKeeper session timeout</description>
            </property>
            <property>
                    <name>hbase.regionserver.handler.count</name>
                    <value>50</value>
                    <description>Number of handler threads serving requests on user tables</description>
            </property>
            <property>
                    <name>hbase.hregion.max.filesize</name>
                    <value>107374182400</value>
                   <description>Maximum region size per column family; with ConstantSizeRegionSplitPolicy, a region splits automatically once it exceeds this value (100 GB)</description>
            </property>
            <property>
                    <name>perf.hfile.block.cache.size</name>
                    <value>0.2</value>
                    <description>Tune the read/write balance</description>
            </property>
            <property>
                    <name>hbase.regionserver.global.memstore.size</name>
                    <value>0.3</value>
                    <description>Blocking/flush trigger on the RegionServer: the sum of all region memstores on the node reaches upperLimit * heapsize</description>
            </property>
            <property>
                    <name>hbase.regionserver.global.memstore.lowerLimit</name>
                    <value>0.3</value>
                    <description>Flush trigger on the RegionServer: the sum of all region memstores on the node reaches lowerLimit * heapsize</description>
            </property>
            <property>
                    <name>hbase.zookeeper.property.tickTime</name>
                    <value>6000</value>
                    <description>Heartbeat interval between the client and ZooKeeper (6 seconds)</description>
            </property>
            <property>
                    <name>hbase.hstore.blockingStoreFiles</name>
                    <value>10</value>
                    <description>Tune the read/write balance</description>
            </property>
            <property>
                    <name>hbase.hstore.blockingWaitTime</name>
                    <value>90000</value>
                    <description>Wait time while writes are blocked (90 s)</description>
            </property>
            <property>
                    <name>hbase.hregion.memstore.flush.size</name>
                    <value>104857600</value>
                    <description>Memstore flush size; when reached, the memstore is flushed to disk (100 MB)</description>
            </property>
            <property>
                    <name>hbase.hregion.memstore.mslab.enabled</name>
                    <value>true</value>
                    <description>Whether to enable MSLAB, which reduces Full GCs caused by memory fragmentation and improves overall performance</description>
            </property>
            <property>
                    <name>hbase.regionserver.region.split.policy</name>
                    <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
                    <description>Default region split policy</description>
            </property>
            <property>
                    <name>hbase.client.write.buffer</name>
                    <value>8388608</value>
                    <description>Client write buffer; with autoFlush set to false, the client flushes only once the buffer is full (8 MB)</description>
            </property>
            <property>
                    <name>hbase.hregion.memstore.block.multiplier</name>
                    <value>4</value>
                    <description>Client writes are blocked if a memstore exceeds flush.size * multiplier</description>
            </property>
            <property>
                    <name>hbase.regionserver.regionSplitLimit</name>
                    <value>150</value>
                    <description>Upper limit on the number of regions per RegionServer</description>
            </property>
            <property>
                    <name>hbase.regionserver.maxlogs</name>
                    <value>16</value>
                    <description>Maximum number of WAL log files per RegionServer</description>
            </property>
    </configuration>

    4. Configure the environment variables and tune memory:

    export HBASE_HOME=/home/hadoop/hbase
    
    export PATH=$PATH:$HBASE_HOME/bin
    
    export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xms1536m -Xmx2048m -Xmn1024m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
    -XX:PermSize=512m -XX:MaxPermSize=512m"
    
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS
    -Xms1536m -Xmx2048m -Xmn1024m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=70 -XX:PermSize=512m -XX:MaxPermSize=512m"
    
    export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xms1024m -Xmx2048m"

    5. Distribute the hbase folder to the hadoop home directory on each virtual host:

    scp -r  ~/hbase/ hadoop@master2:~
    scp -r  ~/hbase/ hadoop@slave1:~
    scp -r  ~/hbase/ hadoop@slave2:~

    6. Start and stop

    bin/start-hbase.sh
    bin/stop-hbase.sh

    Successful startup output

    [hadoop@master1 hadoop]$ start-hbase.sh
    master1: starting zookeeper, logging to
    /home/hadoop/hbase/bin/../logs/hbase-hadoop-zookeeper-master1.out
    slave2: starting zookeeper, logging to
    /home/hadoop/hbase/bin/../logs/hbase-hadoop-zookeeper-slave2.out
    master2: starting zookeeper, logging to
    /home/hadoop/hbase/bin/../logs/hbase-hadoop-zookeeper-master2.out
    slave1: starting zookeeper, logging to
    /home/hadoop/hbase/bin/../logs/hbase-hadoop-zookeeper-slave1.out
    starting master, logging to /home/hadoop/hbase/logs/hbase-hadoop-master-master1.out
    slave2: starting regionserver, logging to
    /home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-slave2.out
    master1: starting regionserver, logging to
    /home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-master1.out
    master2: starting regionserver, logging to
    /home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-master2.out
    slave1: starting regionserver, logging to
    /home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-slave1.out

    Check the related daemons

    HMASTER(master1)

    [hadoop@master1 hadoop]$ jps
    14230 HMaster
    14379 HRegionServer

    HREGIONSERVER (master2, slave1, slave2)

    [hadoop@master2 ~]$ jps
    10574 HRegionServer
    
    [hadoop@slave1 ~]$ jps
    14230 HRegionServer
    
    [hadoop@slave2 ~]$ jps
    18753 HRegionServer

    Problem encountered: the cluster nodes' clocks were out of sync

    org.apache.hadoop.hbase.ClockOutOfSyncException

    Set up time synchronization:

    #yum install ntpdate

    # ntpdate 0.asia.pool.ntp.org

    #rm -rf /etc/localtime
    #ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

    Check the time:

    date +%Y-%m-%d-%H:%M:%S
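    To keep the clocks from drifting apart again, a minimal sketch of a periodic re-sync via cron (run as root on every node; the NTP server is the one used above):

    echo '*/30 * * * * root /usr/sbin/ntpdate 0.asia.pool.ntp.org > /dev/null 2>&1' >> /etc/crontab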

    VIII. Hive Deployment

    Hive only needs to be deployed on master1.

    1. Extract hive-2.0.0 into the hadoop home directory and enter the hive directory.
    2. Enter the conf folder.
    3. Create the metastore database hive in MySQL.

    Create the hive database on master1 with the latin1 character set (other encodings cause problems); a command sketch follows below.
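    A minimal sketch of creating it from the shell (database name and latin1 encoding follow this guide):

    mysql -uroot -p -e "CREATE DATABASE hive DEFAULT CHARACTER SET latin1;"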

     

    # Grant privileges on the hive database

    grant all on hive.* to 'root'@'%' IDENTIFIED BY 'weidong' with grant option;
    
    flush privileges;

    # Allow connections to MySQL from any IP (a fuller sketch follows below)

    update user set host='%' where host='localhost';
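    A fuller sketch of that change run in one go (the user='root' filter is an extra precaution added here, not in the original; adjust to your own accounts):

    mysql -uroot -p mysql -e "UPDATE user SET host='%' WHERE host='localhost' AND user='root'; FLUSH PRIVILEGES;"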

    4. Edit the Hive configuration files as follows:

    hive-site.xml

    <configuration>
             <property>
                    <name>javax.jdo.option.ConnectionURL</name>
                   <value>jdbc:mysql://master1:3306/hive?createDatabaseIfNotExist=true</value>
             </property>
             <property>
                      <name>javax.jdo.option.ConnectionDriverName</name>
                       <value>com.mysql.jdbc.Driver</value>
                   <description>JDBC driver class name</description>
             </property>
             <property>
                      <name>javax.jdo.option.ConnectionUserName</name>
                       <value>root</value>
                   <description>Database username</description>
             </property>
             <property>
                       <name>javax.jdo.option.ConnectionPassword</name>
                       <value>weidong</value>
                   <description>Database password</description>
             </property>
             <property>
                     <name>datanucleus.schema.autoCreateTables</name>
                       <value>true</value>
             </property>
             <property>
                       <name>hive.metastore.warehouse.dir</name>
                       <value>hdfs://master1:9000/home/hadoop/hive/warehouse</value>
                   <description>Warehouse data path (on HDFS)</description>
             </property>
             <property>
                       <name>hive.exec.scratchdir</name>
                       <value>hdfs://master1:9000/home/hadoop/hive/warehouse</value>
             </property>
             <property>
                       <name>hive.querylog.location</name>
                       <value>/home/hadoop/hive/logs</value>
             </property>
             <property>
                       <name>hive.aux.jars.path</name>
                       <value>file:///home/hadoop/hbase/lib</value>
             </property>
             <property>
                       <name>hive.metastore.uris</name>
                       <value>thrift://master1:9083</value>
                   <description>Host and port where the Hive metastore runs</description>
             </property>
    </configuration>

    Enable the logging configuration:

    cp hive-log4j2.properties.template  hive-log4j2.properties
    cp hive-exec-log4j2.properties.template  hive-exec-log4j2.properties

    5. Start Hive

    1) First, start the metastore service:

    hive --service metastore &

    Successful startup output

    2016-04-06T11:46:10,157 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:main(5876)) - Starting hive metastore on port 9083
    2016-04-06T11:46:10,210 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
    2016-04-06T11:46:10,299 INFO  [main]:
    metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
    2016-04-06T11:46:12,284 INFO  [main]:
    metastore.ObjectStore (ObjectStore.java:getPMF(402)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    2016-04-06T11:46:15,345 INFO  [main]:
    metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is MYSQL
    2016-04-06T11:46:15,352 INFO  [main]:
    metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore
    2016-04-06T11:46:16,051 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(586)) - Added admin role in metastore
    2016-04-06T11:46:16,058 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(595)) - Added public role in metastore
    2016-04-06T11:46:16,239 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers_core(635)) - No user is added in admin role, since config is empty
    2016-04-06T11:46:16,715 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6020)) - Starting DB backed MetaStore Server with SetUGI enabled
    2016-04-06T11:46:16,729 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6077)) - Started the new metaserver on port [9083]...
    2016-04-06T11:46:16,729 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6079)) - Options.minWorkerThreads = 200
    2016-04-06T11:46:16,729 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6081)) - Options.maxWorkerThreads = 1000
    2016-04-06T11:46:16,730 INFO  [main]:
    metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6083)) - TCP keepalive = true

    2) Start the Hive client

    Just run the hive command:

    [hadoop@master conf]$ hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in
    [jar:file:/home/hadoop/hive/lib/hive-jdbc-2.1.0-SNAPSHOT-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in
    [jar:file:/home/hadoop/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in
    [jar:file:/home/hadoop/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
    Logging initialized using configuration in file:/home/hadoop/hive/conf/hive-log4j2.properties
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive>

    Verify that tables can be listed and created from the Hive client.

    6. Stop Hive

    Run ps -aux | grep hive to find the Hive process PIDs, then kill them.

    7. Common commands

    hive> show databases;
    OK
    default
    Time taken: 1.881 seconds, Fetched: 1 row(s)
    hive> use default;
    OK
    Time taken: 0.081 seconds
    hive> create table kylin_test(test_count int);
    OK
    Time taken: 2.9 seconds
    hive> show tables;
    OK
    kylin_test
    Time taken: 0.151 seconds, Fetched: 1 row(s)
    hive> select * from kylin_test;
    OK
    Time taken: 0.318 seconds

    The new table can also be seen by querying the hive metastore database in MySQL (see the sketch below).
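    A minimal sketch of that metastore check from the shell (TBLS is a standard table in the Hive metastore schema; credentials follow this guide):

    mysql -uroot -p hive -e "SELECT TBL_ID, DB_ID, TBL_NAME, TBL_TYPE FROM TBLS;"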

    IX. Kylin Deployment

    Kylin only needs to be deployed on master1.

    1. Understand the two kinds of Kylin binary packages

    Standard pre-packaged binary: apache-kylin-1.5.0-bin.tar.gz

    Special binary: apache-kylin-1.5.0-HBase1.1.3-bin.tar.gz


    Note: the special binary package is a Kylin snapshot compiled against HBase 1.1+. It requires HBase 1.1.3 or later; earlier versions have a known bug in the fuzzy key filter that causes Kylin query results to be missing records (HBASE-14269). Also note that this is not an official release (it is rebased onto the latest changes of the KYLIN 1.3.x branch every few weeks) and has not been fully tested.

    2. Extract apache-kylin-1.5.0-HBase1.1.3-bin.tar.gz into the hadoop home directory and enter the kylin directory.

    3. In /etc/profile, configure the KYLIN environment variables and a variable named hive_dependency:

    export KYLIN_HOME=/home/hadoop/kylin
    export PATH=$PATH:$KYLIN_HOME/bin
     
    export hive_dependency=/home/hadoop/hive/conf:/home/hadoop/hive/lib/*:/home/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.0.0.jar

    This must also be configured on the other nodes master2, slave1, and slave2 (see the sketch below): when Kylin submits a job to MR and the Hadoop cluster dispatches tasks to those nodes, they need the Hive dependencies; without them, the MR tasks fail with errors like "hcatalogXXX not found".
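    A minimal sketch of pushing the same three export lines to the other nodes (assumes root SSH access to each host; adjust if you manage /etc/profile differently):

    for host in master2 slave1 slave2; do
        ssh root@$host "echo 'export KYLIN_HOME=/home/hadoop/kylin' >> /etc/profile
                        echo 'export PATH=\$PATH:\$KYLIN_HOME/bin' >> /etc/profile
                        echo 'export hive_dependency=/home/hadoop/hive/conf:/home/hadoop/hive/lib/*:/home/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.0.0.jar' >> /etc/profile"
    done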

    4. Modify Kylin's startup script kylin.sh

    1) Explicitly declare KYLIN_HOME
    export KYLIN_HOME=/home/hadoop/kylin
     
    2) Explicitly add the $hive_dependency dependency to HBASE_CLASSPATH_PREFIX
    export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX

    5. Check that the environment is set up correctly:

    [hadoop@master1 conf]$ check-env.sh
    KYLIN_HOME is set to /home/hadoop/kylin

    6. Enter the conf folder and edit the Kylin configuration files as follows:

    kylin.properties

    kylin.owner=wdcloud@kylin.apache.org
    kylin.rest.servers=master1:7070
    
    kylin.hdfs.working.dir=/home/hadoop/kylin/kylin_hdfs_working_dir
    kylin.job.remote.cli.working.dir=/home/hadoop/kylin/kylin_job_working_dir
    
     # Define the job jar that Kylin uses for MR jobs and the HBase coprocessor jar, used to improve performance.
    
    kylin.job.jar=/home/hadoop/kylin/lib/kylin-job-1.5.0-SNAPSHOT.jar
    kylin.coprocessor.local.jar=/home/hadoop/kylin/lib/kylin-coprocessor-1.5.0-SNAPSHOT.jar

     

    Set the replication factor in kylin_hive_conf.xml and kylin_job_conf.xml to 4:

    <property>
      <name>dfs.replication</name>
      <value>4</value>
      <description>Block replication</description>
    </property>

    7. Start and stop Kylin

    # Confirm that the required services are running:

    1) Hadoop 2's HDFS/YARN/JobHistory services

        start-dfs.sh

        start-yarn.sh

        mr-jobhistory-daemon.sh start historyserver

    2) The Hive metastore: hive --service metastore &

    3) ZooKeeper

    4) HBase: start-hbase.sh

    # Check the Hive and HBase dependencies

    [hadoop@master1 kylin]$ find-hive-dependency.sh
    [hadoop@master1 kylin]$ find-hbase-dependency.sh

    # Start and stop Kylin with the following commands:

    [hadoop@master1 kylin]$ kylin.sh start
    [hadoop@master1 kylin]$ kylin.sh stop


    Web UI address
     

    http://192.168.200.165:7070/kylin/login

    The default login username/password is ADMIN/KYLIN (a quick command-line check is sketched below).
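    As a quick sanity check from the command line, the REST API can be queried with the same credentials (the /kylin/api/projects endpoint is taken from the Kylin REST API documentation; treat this as a sketch):

    curl -s --user ADMIN:KYLIN http://192.168.200.165:7070/kylin/api/projects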

    X. Kylin Testing

    1. Test the sample that ships with Kylin

    Kylin provides an automated script that creates a test cube; the script also creates the corresponding Hive tables.

    Steps to run the sample:

    ① Run the ${KYLIN_HOME}/bin/sample.sh script

    [hadoop@master1 ~]$ sample.sh

    Key output:

    KYLIN_HOME is set to /home/hadoop/kylin
    Going to create sample tables in hive...
    Sample hive tables are created successfully; Going to create sample cube...
    Sample cube is created successfully in project 'learn_kylin'; Restart Kylin server or reload the metadata from web UI to see the change.

    # Check in MySQL which tables the sample created

    # select DB_ID,OWNER,SD_ID,TBL_NAME from TBLS;

    # Check the created tables and the row count (10,000 rows) from the Hive client

    hive> show tables;
    OK
    kylin_cal_dt
    kylin_category_groupings
    kylin_sales
    Time taken: 1.835 seconds, Fetched: 3 row(s)
    
    hive> select count(*) from kylin_sales;
    OK
    10000
    Time taken: 65.351 seconds, Fetched: 1 row(s)

    ② Restart the Kylin server to refresh the metadata cache

    [hadoop@master1 ~]$ kylin.sh stop
    [hadoop@master1 ~]$ kylin.sh start

    ③ Log in to 192.168.200.165:7070/kylin with the default username/password ADMIN/KYLIN

    In the console, select the project named learn_kylin.

    ④ Select the test cube "kylin_sales_cube", click "Action" - "Build", and choose a date later than 2014-01-01 so that all 10,000 test records are included.

    Choose a build end date.

    Clicking Submit shows a message confirming that the build job was submitted successfully.

    ⑤ Watch the job's progress on the Monitor page until it reaches 100%.

    Job complete.

    Switching to the Model page shows the cube's status has become READY, meaning SQL queries can now be run against it.

    During the build a temporary table is generated in Hive; once the job is 100% complete, the table is deleted automatically:

    kylin_intermediate_kylin_sales_cube_desc_20120201000000_20120201000000

    During the build a permanent result table is generated in HBase, for example:

    KYLIN_PTQIXMC64A

    If two or more segments have been built, you can also run a merge:

    Merge job complete.

    At this point the several HBase tables backing the different segments are merged into one table, saving disk space.

    Problem encountered during the build:

    When the job reached step 5 (Create HTable), an error reported that the newly created table was not available,

    which caused the whole job to fail:

    2016-04-07 12:40:57,823 ERROR [pool-7-thread-5] steps.CubeHTableUtil:135 : Failed to create HTable

    java.lang.IllegalArgumentException: table KYLIN_9USQAHQQXC created, but is not available due to some reasons

             at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)

             at org.apache.kylin.storage.hbase.steps.CubeHTableUtil.createHTable(CubeHTableUtil.java:132)

             at org.apache.kylin.storage.hbase.steps.CreateHTableJob.run(CreateHTableJob.java:104)

             at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

             at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

             at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)

             at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)

             at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)

             at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)

             at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)

             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

             at java.lang.Thread.run(Thread.java:745)

    Question posted to the community:

    http://apache-kylin.74782.x6.nabble.com/an-error-occurred-when-build-a-sample-cube-at-step-5-create-HTable-td4102.html 

    Root cause:

    Kylin uses the Snappy compression codec by default, but Snappy was not available on this cluster.

    HBase RegionServer error log:

    2016-04-12 12:05:05,726 ERROR [RS_OPEN_REGION-slave2:16020-0]

    handler.OpenRegionHandler: Failed open of

    region=KYLIN_VKRC32OKFP,,1460433926913.73fb906719a75b2733f046e87fbe8105., starting to roll back the global memstore size.

    org.apache.hadoop.hbase.DoNotRetryIOException: Compression algorithm 'snappy' previously failed test.

      at org.apache.hadoop.hbase.util.CompressionTest.testCompression

    (CompressionTest.java:91)

             at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs

    (HRegion.java:6300)

             at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6251)

             at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6218)

             at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6189)

             at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6145)

             at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6096)

             at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion

    (OpenRegionHandler.java:362)

             at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process

    (OpenRegionHandler.java:129)

             at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)

             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

             at java.lang.Thread.run(Thread.java:745)

    2016-04-12 12:05:05,727 INFO  [RS_OPEN_REGION-slave2:16020-0]

    coordination.ZkOpenRegionCoordination: Opening of region {ENCODED =>

    73fb906719a75b2733f046e87fbe8105, NAME =>

    'KYLIN_VKRC32OKFP,,1460433926913.73fb906719a75b2733f046e87fbe8105.', STARTKEY => '', ENDKEY => 'x00x01'} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 1

    2016-04-12 12:05:05,775 INFO  [PriorityRpcServer.handler=18,queue=0,port=16020] regionserver.RSRpcServices: Open

    KYLIN_VKRC32OKFP,x00x01,1460433926913.06978b9fb1e423563a5aae7e1df044d8.

    Fix: disable compression, or use LZO as the compression codec.

    The official site describes how to disable compression as follows:

    To disable compressing MR jobs you need to modify $KYLIN_HOME/conf/kylin_job_conf.xml by removing all configuration entries related to compression (just grep the keyword "compress"). To disable compressing HBase tables you need to open $KYLIN_HOME/conf/kylin.properties and remove the line starting with kylin.hbase.default.compression.codec.
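    A minimal sketch of those two changes on this cluster (paths follow this guide; back up both files first, and remove the entries that grep finds by hand):

    cd /home/hadoop/kylin/conf
    grep -n compress kylin_job_conf.xml                                    # locate the compression-related entries, then delete them manually
    sed -i '/^kylin.hbase.default.compression.codec/d' kylin.properties    # drop the HBase default compression codec line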

    ⑥ Switch to the Insight tab and run SQL statements, for example:

    select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt;

    Running this aggregation in Kylin took only 0.46 s (average over ten runs).

    Running the same SQL in Hive took as long as 136 seconds:

    hive> select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt;

    Time taken: 136.489 seconds, Fetched: 731 row(s)

    Clearly, Kylin runs this query far more efficiently.

    Other test queries:

    ① select * from kylin_sales; (under 1 s)

    ② Sales amount and purchase volume per time period (0.39 s):

    select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers

    from kylin_sales

    group by part_dt

    order by part_dt;

    ③ Sales amount and purchase volume for a specific date (0.40 s):

    select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales

    where part_dt = '2014-01-01'

    group by part_dt;

    This raised an error:

    Error while compiling generated Java code:

    public static class Record3_0 implements java.io.Serializable {           

    public java.math.BigDecimal f0;

        public boolean f1;

    public org.apache.kylin.common.hll.HyperLogLogPlusCounter f2;         

    public Record3_0(java.math.BigDecimal f0, boolean f1, ...

    This is because part_dt is a date column and the string-to-date conversion fails; the SQL should be changed to:

    select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers

    from kylin_sales

    where part_dt between '2014-01-01' and '2014-01-01'

    group by part_dt;

    or:

    select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers

    from kylin_sales

    where part_dt = date '2014-01-01'

    group by part_dt;

    ④ The queries above only touch the fact table, not the lookup table. To query the sales amount of every level-2 product category per time period, the fact table must be inner joined with the lookup table (1.36 s):

    select fact.part_dt, lookup.CATEG_LVL2_NAME, count(distinct seller_id) as sellers

    from kylin_sales fact

    inner join KYLIN_CATEGORY_GROUPINGS lookup

    on fact.LEAF_CATEG_ID = lookup.LEAF_CATEG_ID and fact.LSTG_SITE_ID = lookup.SITE_ID

    group by fact.part_dt, lookup.CATEG_LVL2_NAME

    order by fact.part_dt desc

  • Original post: https://www.cnblogs.com/avivaye/p/5391951.html