  • scala+spark+Hbase

    • Spark: features and application scenarios

    •   Spark is a general-purpose parallel computing framework. It implements distributed computation based on the MapReduce model, but intermediate results can be kept in memory, so repeated reads and writes to HDFS are no longer needed.
    • Features:
    • Simple and convenient to use, written in Scala (which integrates well with the RDD abstraction).
    • Fast: intermediate results are cached in memory.
    • Highly fault tolerant.
    • A rich set of operations.
    • Broadcast variables: each node can keep a local copy of a small dataset.
    • Core abstraction: the RDD (Resilient Distributed Dataset).
    • Application scenarios:
    • Iterative algorithms: iterative machine learning and graph algorithms, including PageRank, K-means clustering, and logistic regression.
    • Interactive data mining: running multiple ad-hoc queries over the same subset of the data.

    • The previous post set up a ZooKeeper + Hadoop cluster; next we add Scala, Spark, and HBase to round out the cluster.
    • rz -E  # upload the Scala, Spark, and HBase packages
    • tar -zxvf scala-2.11.8.tgz
    • tar -zxf spark-2.0.1-bin-hadoop2.7.tgz
    • Spark installation guide

    • Install the JDK and Scala

    • Install a JDK: sudo apt-get install openjdk-7-jre-headless (note that the shell output later in this post shows the cluster actually running Java 1.8.0_101, so install whichever JDK version your cluster standardizes on).
    • Download Scala: http://www.scala-lang.org/
    • Extract it: tar -zxvf scala-2.11.8.tgz.
    • Open sudo vim /etc/profile (or vi ~/.bashrc) and append the following paths:
    • export SCALA_HOME=/data/app/scala-2.11.8
    • export SPARK_HOME=/data/app/spark-2.0.1-bin-hadoop2.7
    • export HBASE_HOME=/data/app/hbase-1.2.3
    • export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$ZOOKEEPER/bin:$HADOOP/bin:$HADOOP/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$HBASE_HOME/bin
    • Apply the change: source /etc/profile.
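A quick sanity check that the PATH line composes correctly (a minimal sketch using the install paths from this post; adjust them to your own layout):

```shell
# Append the tool bin directories to PATH, then split PATH one entry per
# line and confirm the Scala and Spark entries are present.
export SCALA_HOME=/data/app/scala-2.11.8
export SPARK_HOME=/data/app/spark-2.0.1-bin-hadoop2.7
export PATH="$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin"
echo "$PATH" | tr ':' '\n' | grep -E 'scala|spark'
```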
    • Test by typing scala at the command line:
    • scala
    • Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101).
    • Type in expressions for evaluation. Or try :help.
    • scala>
    • Seeing this prompt confirms that Scala is installed successfully.

     

    • Install Spark

    • Download Spark: http://spark.apache.org/downloads.html
    • Extract it: tar -zxf spark-2.0.1-bin-hadoop2.7.tgz
    • Append the same paths to /etc/profile via sudo vim /etc/profile
    • (already added above, so this step can be skipped)
    • vi /home/soft/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh
    • export JAVA_HOME=/home/soft/app/jdk1.8.0_101
      export SCALA_HOME=/home/soft/app/scala-2.11.8
      export HADOOP_HOME=/home/soft/app/hadoop-2.7.3
      export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
      export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node1:2181,node2:2181,node3:2181,node4:2181,node5:2181 -Dspark.deploy.zookeeper.dir=/spark"
      export SPARK_EXECUTOR_MEMORY=5g
      export SPARK_WORKER_MEMORY=7g
      export SPARK_LOG_DIR=/data/logs/spark_logs/
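Note that spark-env.sh is sourced as an ordinary shell script, so the long SPARK_DAEMON_JAVA_OPTS line must not be wrapped mid-quote; a stray line break leaves an unmatched `"` and produces exactly the "unexpected EOF while looking for matching" error that shows up in the spark-shell output later in this post. A quick syntax check before starting Spark (path as used in this post):

```shell
# bash -n parses the file without executing it; any unmatched quote or
# truncated line is reported immediately with a line number.
bash -n /home/soft/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh \
  && echo "spark-env.sh: syntax OK"
```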

    • mkdir -pv /data/logs/spark_logs/

    vi /home/soft/app/spark-2.0.1-bin-hadoop2.7/conf/slaves

    node1
    node2
    node3
    node4
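The conf directory (spark-env.sh and slaves) must be identical on every node. A hypothetical sync sketch, printing the scp commands it would run as a dry run (remove the echo once the host list looks right):

```shell
# Read one worker hostname per line from slaves and print the scp
# command that would copy the conf directory to that host.
SPARK_CONF=/home/soft/app/spark-2.0.1-bin-hadoop2.7/conf
while read -r host; do
  echo scp -r "$SPARK_CONF" "$host:$SPARK_CONF"
done < "$SPARK_CONF/slaves"
```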

    media@node1:~$ spark-shell
    /data/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh: line 72: unexpected EOF while looking for matching `"'
    /data/app/spark-2.0.1-bin-hadoop2.7/conf/spark-env.sh: line 76: syntax error: unexpected end of file
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel).
    17/07/11 11:41:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/07/11 11:41:47 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
    Spark context Web UI available at http://10.31.81.41:4040
    Spark context available as 'sc' (master = local[*], app id = local-1499744506831).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
          /_/

    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
    Type in expressions to have them evaluated.
    Type :help for more information.

    scala>

    Seeing this banner means Spark is installed successfully!

    Install HBase

    cd /home/soft/app/hbase-1.2.3/conf

    cat regionservers
    node1
    node2
    node3
    node4

    cat backup-masters
    node2

    cat hbase-site.xml 

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
    /**
     *
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements.  See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership.  The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License.  You may obtain a copy of the License at
     *
     *     http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    -->
    <configuration>
    <property>
       <name>dfs.ha.namenodes.ns</name>
       <value>node1,node2</value>
    </property>
    <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ns/hbase</value>
    </property>
    <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    </property>
    <property>
    <name>hbase.master</name>
    <value>16000</value>
    </property>
    <property>
    <name>hbase.master.port</name>
    <value>16000</value>
    </property>
    <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
    </property>
    <property>
    <name>hbase.regionserver.port</name>
    <value>16020</value>
    </property>
    <property>
    <name>hbase.regionserver.info.port</name>
    <value>16030</value>
    </property>
           <property>
              <name>hbase.zookeeper.property.clientPort</name>
              <value>2181</value>
              <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.
              </description>
            </property>
            <property>
              <name>hbase.zookeeper.quorum</name>
              <value>node1,node2,node3,node4,node5</value>
            </property>
            <property>
              <name>hbase.zookeeper.property.dataDir</name>
              <value>/data/zk_data</value>
            </property>
            <property>
              <name>zookeeper.session.timeout</name>
              <value>180000</value>
            </property>
            <property>
              <name>hbase.zookeeper.property.tickTime</name>
              <value>9000</value>
            </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/data/hbase/tmp</value>
    </property>
    </configuration>
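hbase-site.xml is easy to break while hand-editing, and a stray or missing tag stops HMaster at startup. A quick well-formedness check before distributing the file (this sketch uses Python's stdlib XML parser as one readily available option; `xmllint --noout` works equally well):

```shell
# Parse the file: a well-formed document parses silently, a broken one
# raises ExpatError with the offending line number.
python3 - <<'EOF'
import xml.dom.minidom
xml.dom.minidom.parse("/home/soft/app/hbase-1.2.3/conf/hbase-site.xml")
print("hbase-site.xml is well-formed")
EOF
```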

    mkdir -pv /data/hbase/tmp    # must match hbase.tmp.dir in hbase-site.xml

    vi hbase-env.sh

    export JAVA_HOME=/home/media/app/jdk1.8
    export HBASE_CLASSPATH=/home/media/app/hadoop-2.7.3/etc/hadoop
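Before starting HBase it is worth confirming that the two directories set above actually exist on the node (a small sketch using the paths from this post):

```shell
# Print OK/MISSING for each directory hbase-env.sh points at.
for d in /home/media/app/jdk1.8 /home/media/app/hadoop-2.7.3/etc/hadoop; do
  [ -d "$d" ] && echo "OK: $d" || echo "MISSING: $d"
done
```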

  • Original post: https://www.cnblogs.com/xiaoyongzhuo/p/7128332.html