  • Hadoop + Tachyon + Spark configuration on a Zybo cluster

    1. Zybo cluster architecture overview:

    1.1 The Zybo cluster consists of 5 Zybo development boards. Each board boots from the startup files provided by the Digilent Zybo reference design, with an ARM Ubuntu file system. The IP addresses are 192.168.1.1~5 from top to bottom, and the hostnames are spark1~5. Because the SD card write speed is only about 2.3 MB/s, each Zybo board is additionally fitted with a SanDisk Cruzer Blade 32GB USB drive (write speed about 4 MB/s) as expanded storage; the programs and the JDK live on the USB drive, and each node also keeps a 2GB swap file on the drive as swap space to speed up the cluster. All nodes connect to a single gigabit switch. This is a pure ARM compute cluster.
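    The node plan above can be written down as data. The sketch below is purely illustrative (the dict and helper are ours, not files that exist on the boards) and is handy when generating the /etc/hosts entries configured in section 6.2.

```python
# Hostname-to-IP plan from section 1.1: spark1..spark5 -> 192.168.1.1..5.
# Illustrative only; not a file that exists on the boards.
NODES = {"spark%d" % i: "192.168.1.%d" % i for i in range(1, 6)}

def hosts_lines(nodes):
    """Render /etc/hosts-style lines, one per node."""
    return ["%s\t%s" % (ip, host) for host, ip in sorted(nodes.items())]

for line in hosts_lines(NODES):
    print(line)
```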

    1.2 For hadoop, the version is 2.4.0 (official binary release). spark1 runs the namenode and also serves as a datanode, while spark2~4 serve only as datanodes; since USB capacity is limited, the 5 nodes together provide roughly 150 GB. Hadoop is configured with tachyon support (the tachyon dependency jar is added to its classpath).


    1.3 For tachyon, the spark1 node acts as master and the other 4 nodes as slaves. Each slave has a 1 GB ramdisk buffer, and the underlying file system (underfs) is hadoop. Tachyon's main role is to act as a data caching layer that reduces the network overhead of reading directly from hadoop, thereby speeding up big-data computation.


    1.4 For spark, the spark1 node serves only as the spark master, while the other 4 nodes act as worker slaves. Spark is configured with tachyon support (the tachyon dependency jar is added to the classpath, and an environment variable points at the tachyon master's IP).


    2. Bringing the cluster up for testing

    2.1 Power on all 5 Zybos; they configure their MAC and IP addresses automatically. Log in to the spark1 node over the serial port, cd to the root user's home directory, and run:

    ./gohadoop.sh

    ./gotachyon.sh

    ./gospark.sh

    2.2 To run multiple tachyon masters as a fault-tolerant cluster and avoid contention between them, start zookeeper with:

    ./gozookeeper.sh

    2.3 Once startup is reported successful, go to the spark installation directory and open the python shell with:

    MASTER=spark://192.168.1.1:7077 ./bin/pyspark

    2.4 If errors occur during startup, stop all spark nodes and restart with:

    ./sbin/stop-all.sh

    SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh

    The test scripts are given in section 5, Spark testing.

    3. Hadoop testing:

    3.1 Starting hadoop daemon by daemon (demo, single machine):

    cd /mnt/hadoop-2.4.0/
    sbin/hadoop-daemon.sh start namenode
    sbin/hadoop-daemon.sh start datanode
    sbin/hadoop-daemon.sh start secondarynamenode
    sbin/yarn-daemon.sh start resourcemanager
    sbin/yarn-daemon.sh start nodemanager
    sbin/mr-jobhistory-daemon.sh start historyserver

    3.2 Starting all hadoop nodes normally (cluster):

    cd /mnt/hadoop-2.4.0/
    sbin/start-dfs.sh
    sbin/start-yarn.sh

    3.3 List the currently running java processes with jps:

    jps -l | sort -k 2

    3.4 Use netstat to check that the port is open. The hadoop namenode listens on port 9000 by default; you can also check http://192.168.1.1:9000 in a browser.

    while ! netstat -ntlp | grep -q 9000
    do
    sleep 1
    done
    netstat -ntlp | grep 9000
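    The same port wait can be done portably from Python by retrying a TCP connect until it succeeds. This helper is an illustrative sketch, not part of the cluster's scripts; `wait_for_port("192.168.1.1", 9000)` would block until the namenode RPC port answers.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Retry a TCP connect until host:port accepts, like polling netstat.

    Returns True once a connection succeeds, False when timeout expires.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```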

    3.5 Once all the datanodes are up, list the hdfs root directory:

    bin/hadoop dfs -ls /

    3.6 Copy the local /mnt/in folder into hdfs as /in:

    bin/hadoop dfs -copyFromLocal /mnt/in /in

    3.7 Running wordcount with hadoop (the hdfs directory /in holds the files to count; /out is the output directory):

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount /in /out
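    What the wordcount job computes can be sketched without a cluster: the map phase emits a (word, 1) pair for every whitespace-separated token, and the reduce phase sums the counts per key. A minimal plain-Python illustration of those semantics (our own sketch, not the Hadoop example's actual source):

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reducer: sum the emitted counts for each distinct word."""
    totals = defaultdict(int)
    for word, n in pairs:
        totals[word] += n
    return dict(totals)

counts = reduce_phase(map_phase(["to be or not to be"]))
```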

    3.8 Read the wordcount output files generated by hadoop:

    bin/hadoop fs -ls /out

    bin/hadoop dfs -cat /out/part-r-00000

    3.9 To add a new node to the hadoop cluster, the hadoop namenode must be reformatted, and all the datanodes will be reformatted along with it:

    rm -rf /mnt/namenode
    rm -rf /mnt/datanode
    rm -rf /mnt/hadoop/tmp/*

    Also run the following on every node spark2~4 to remove the stored namenode and datanode IDs:

    rm -rf /mnt/datanode

    Then run the format commands to format the namenode:

    cd /mnt/hadoop-2.4.0/
    bin/hadoop namenode -format
    bin/hadoop datanode -format

    Finally, start hadoop.

    3.10 Stopping hadoop (demo, single machine):

    sbin/hadoop-daemon.sh stop namenode

    sbin/hadoop-daemon.sh stop datanode

    sbin/hadoop-daemon.sh stop secondarynamenode

    sbin/yarn-daemon.sh stop resourcemanager

    sbin/yarn-daemon.sh stop nodemanager

    sbin/mr-jobhistory-daemon.sh stop historyserver

    3.11 Stopping hadoop (cluster):

    sbin/stop-yarn.sh
    sbin/stop-dfs.sh

    4. Tachyon testing:

    4.1 First format the tachyon cache layer, then start all nodes, mounting the ramdisks:

    cd /mnt/tachyon-0.4.1

    ./bin/tachyon format
    ./bin/tachyon-stop.sh
    ./bin/tachyon-start.sh all Mount

    4.2 Wait until the tachyon master and slaves are ready; you can also visit http://192.168.1.1:19999 to confirm that all nodes are up.

    while ! netstat -ntlp | grep -q 19998
    do
    sleep 1
    done

    jps -l | sort -k 2

    4.3 Load the under file system into tachyon so that tachyon learns about the directories and files already in hadoop. If this command is skipped, you may hit an "Unknown under file system scheme" error (java.lang.IllegalArgumentException).

    ./bin/tachyon loadufs tachyon://192.168.1.1:19998 hdfs://192.168.1.1:9000 /
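    loadufs takes two URIs: the tachyon master in front (port 19998) and the HDFS namenode underneath (port 9000). Splitting them with the standard library makes that pairing explicit; this snippet is illustrative only:

```python
from urllib.parse import urlparse

# The two endpoints passed to `tachyon loadufs` above:
tachyon_uri = urlparse("tachyon://192.168.1.1:19998")
hdfs_uri = urlparse("hdfs://192.168.1.1:9000")
# Both live on the master node; only the service port differs.
```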

    4.4 Testing tachyon:

    Basic test:

    ./bin/tachyon runTest Basic CACHE_THROUGH

    Full test suite:

    ./bin/tachyon runTests

    4.5 Run wordcount through the tachyon layer on top of hadoop; from the hadoop installation directory run:

    ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar \
    wordcount -libjars /root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar \
    tachyon://192.168.1.1:19998/in/file /out/file

    4.6 Stopping tachyon:

    ./bin/tachyon-stop.sh

    5. Spark testing:

    5.1 Start the spark cluster:

    cd /mnt/spark-0.9.1-bin-hadoop2

    SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh

    5.2 Check startup status with the commands below, or visit http://192.168.1.1:8080

    jps -l | sort -k 2

    echo "please wait..."
    while ! netstat -ntlp | grep -q 7077
    do
    sleep 1
    done
    netstat -ntlp | grep 7077

    5.3 Open the python spark shell:

    cd /mnt/spark-0.9.1-bin-hadoop2

    MASTER=spark://192.168.1.1:7077 ./bin/pyspark

    5.4 Pi test script (1000 is the number of samples):

    from random import random
    def sample(p):
        x, y = random(), random()
        return 1 if x*x + y*y < 1 else 0
    
    count = sc.parallelize(xrange(0, 1000)).map(sample) \
                 .reduce(lambda a, b: a + b)
    print "Pi is roughly %f" % (4.0 * count / 1000)
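    The estimator works because the fraction of uniform points falling inside the unit quarter-circle approaches pi/4. The same logic runs locally without Spark; this standalone Python 3 sanity check is our own sketch (seeded so the result is repeatable):

```python
import random

def estimate_pi(n, seed=42):
    """Monte Carlo pi: 4 * (points inside the unit quarter-circle) / n,
    mirroring the sample() function in the pyspark script above."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:
            inside += 1
    return 4.0 * inside / n
```

    With n = 100000 the estimate lands within a few hundredths of pi.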

    5.5 wordcount, hadoop version:

    SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh
    MASTER=spark://192.168.1.1:7077 ./bin/pyspark
    # each line rebinds `file`; keep only the input size you want to test
    file = sc.textFile("hdfs://192.168.1.1:9000/test/file1M")
    file = sc.textFile("hdfs://192.168.1.1:9000/test/file10M")
    file = sc.textFile("hdfs://192.168.1.1:9000/test/file100M")
    file = sc.textFile("hdfs://192.168.1.1:9000/test/file1G")
    file = sc.textFile("hdfs://192.168.1.1:9000/test/file10G")
    file = sc.textFile("hdfs://192.168.1.1:9000/test/file100G")
    counts = file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
    counts.collect()
    counts.saveAsTextFile("hdfs://192.168.1.1:9000/out/outfile") 

    5.6 wordcount, tachyon version:

    SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh
    MASTER=spark://192.168.1.1:7077 ./bin/pyspark
    # each line rebinds `file`; keep only the input size you want to test
    file = sc.textFile("tachyon://192.168.1.1:19998/test/file1M")
    file = sc.textFile("tachyon://192.168.1.1:19998/test/file10M")
    file = sc.textFile("tachyon://192.168.1.1:19998/test/file100M")
    file = sc.textFile("tachyon://192.168.1.1:19998/test/file1G")
    file = sc.textFile("tachyon://192.168.1.1:19998/test/file10G")
    file = sc.textFile("tachyon://192.168.1.1:19998/test/file100G")
    counts = file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
    counts.collect()
    counts.saveAsTextFile("tachyon://192.168.1.1:19998/out/outfile") 

    For other tests, see reference 5.

    5.7 Stopping spark:

    ./sbin/stop-all.sh

    6. Common node configuration

    6.1 Set the hostname (e.g. the second node is configured as spark2):

    vi /etc/hostname

    spark2

    6.2 Configure the local hosts file with the cluster's node information (spark1 shown as an example):

    vi /etc/hosts

    #127.0.0.1      localhost       zynq
    192.168.1.1     spark1          localhost
    192.168.1.2     spark2
    192.168.1.3     spark3
    192.168.1.4     spark4
    192.168.1.5     spark5
    #::1            localhost ip6-localhost ip6-loopback

    6.3 Disable ipv6 (takes effect after a reboot):

    vi /etc/sysctl.conf

    net.ipv6.conf.all.disable_ipv6 = 1

    net.ipv6.conf.default.disable_ipv6 = 1

    net.ipv6.conf.lo.disable_ipv6 = 1

    6.4 PATH and environment variables plus boot-time network setup (spark1 shown as an example):

    vi /etc/profile

    export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH
    export JAVA_HOME=/mnt/jdk1.7.0_55
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
    export PATH=$JAVA_HOME/bin:$PATH
    export HADOOP_HOME=/mnt/hadoop-2.4.0

    export PATH=$PATH:$HADOOP_HOME/bin
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

    ifconfig eth2 hw ether 00:0a:35:00:01:01
    ifconfig eth2 192.168.1.1/24 up

    6.5 ssh configuration:

    Generate the public key id_rsa.pub (press Enter through all prompts):

    ssh-keygen -t rsa

    Authorize localhost:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    Distribute the public key to every node:

    ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark1

    ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark2

    ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark3

    ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark4

    ssh-copy-id -i ~/.ssh/id_rsa.pub root@spark5

    6.6 Configure java:

    cd /usr/bin/

    ln -s /usr/lib/jdk1.7.0_55/bin/java java

    ln -s /usr/lib/jdk1.7.0_55/bin/javac javac

    ln -s /usr/lib/jdk1.7.0_55/bin/jar jar

    6.7 Configure swap

    Show current memory usage:

    free -m

    Create a swap file:

    cd /mnt
    mkdir swap
    cd swap/

    dd if=/dev/zero of=swapfile bs=1024 count=1000000

    Format the generated file as swap:

    mkswap swapfile

    Activate the swap file:

    swapon swapfile
    free -m
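    dd writes bs * count bytes, so the invocation above produces just under 1 GiB, a bit less than the 2 GB swap file mentioned in section 1.1 (count=2000000 would match that figure). The arithmetic:

```python
# dd if=/dev/zero of=swapfile bs=1024 count=1000000 writes bs * count bytes.
bs, count = 1024, 1000000
size_bytes = bs * count        # 1,024,000,000 bytes
size_gib = size_bytes / 2**30  # just under 1 GiB
```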

    7. Hadoop configuration

    cd /mnt/hadoop-2.4.0

    7.1 Configure the hadoop runtime environment:

    vi etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/mnt/jdk1.7.0_55
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/mnt/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar

    7.2 Configure yarn-site

    vi etc/hadoop/yarn-site.xml

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>

      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>

    </configuration>

    7.3 Configure core-site

    First create the /mnt/hadoop/tmp directory.

    vi etc/hadoop/core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://192.168.1.1:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/mnt/hadoop/tmp</value>
        </property>
        <property>
            <name>fs.tachyon.impl</name>
            <value>tachyon.hadoop.TFS</value>
        </property>
    </configuration>

    7.4 Configure hdfs-site

    vi etc/hadoop/hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>

        <property>
            <name>dfs.namenode.rpc-address</name>
            <value>192.168.1.1:9000</value>
        </property>

        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/mnt/datanode</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/mnt/namenode</value>
        </property>
    </configuration>

    7.5 Configure mapred-site

    vi etc/hadoop/mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

    7.6 Set the masters file to 192.168.1.1 and the slaves file to the IP addresses of the five nodes.

    8. Tachyon configuration

    cd /mnt/tachyon-0.4.1

    8.1 Configure the tachyon environment:

    vi conf/tachyon-env.sh

    if [[ `uname -a` == Darwin* ]]; then
      # Assuming Mac OS X
      export JAVA_HOME=$(/usr/libexec/java_home)
      export TACHYON_RAM_FOLDER=/Volumes/ramdisk
      export TACHYON_JAVA_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
    else
      # Assuming Linux
      if [ -z "$JAVA_HOME" ]; then
        export JAVA_HOME=/mnt/jdk1.7.0_55
      fi
      export TACHYON_RAM_FOLDER=/mnt/ramdisk
    fi

    export JAVA="$JAVA_HOME/bin/java"
    export TACHYON_MASTER_ADDRESS=192.168.1.1
    #export TACHYON_UNDERFS_ADDRESS=/mnt/underfs
    #export TACHYON_UNDERFS_ADDRESS=/mnt/underfs
    export TACHYON_UNDERFS_ADDRESS=hdfs://192.168.1.1:9000
    export TACHYON_WORKER_MEMORY_SIZE=1GB
    export TACHYON_UNDERFS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem

    CONF_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

    export TACHYON_JAVA_OPTS+="
      -Dlog4j.configuration=file:$CONF_DIR/log4j.properties
      -Dtachyon.debug=false
      -Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
      -Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
      -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
      -Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/workers
      -Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
      -Dtachyon.worker.data.folder=$TACHYON_RAM_FOLDER/tachyonworker/
      -Dtachyon.master.worker.timeout.ms=60000
      -Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
      -Dtachyon.master.journal.folder=/mnt/journal/
      -Dtachyon.master.pinlist=/pinfiles;/pindata
      -Dorg.apache.jasper.compiler.disablejsr199=true
    "

    8.2 If using zookeeper, the configuration is as follows:

    if [[ `uname -a` == Darwin* ]]; then
      # Assuming Mac OS X
      export JAVA_HOME=$(/usr/libexec/java_home)
      export TACHYON_RAM_FOLDER=/Volumes/ramdisk
      export TACHYON_JAVA_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
    else
      # Assuming Linux
      if [ -z "$JAVA_HOME" ]; then
        export JAVA_HOME=/usr/lib/jdk1.7.0_55
      fi
      export TACHYON_RAM_FOLDER=/mnt/ramdisk
    fi

    export JAVA="$JAVA_HOME/bin/java"
    export TACHYON_MASTER_ADDRESS=192.168.1.1
    #export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underfs
    #export TACHYON_UNDERFS_ADDRESS=/mnt/underfs
    export TACHYON_UNDERFS_ADDRESS=hdfs://192.168.1.1:9000
    export TACHYON_WORKER_MEMORY_SIZE=1GB
    export TACHYON_UNDERFS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem
    #export TACHYON_UNDERFS_HDFS_IMPL=fs.defaultFS
    export TACHYON_ZOOKEEPER_ADDRESS=192.168.1.1:2181

    CONF_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

    export TACHYON_JAVA_OPTS+="
      -Dlog4j.configuration=file:$CONF_DIR/log4j.properties
      -Dtachyon.debug=false
      -Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
      -Dtachyon.usezookeeper=true
      -Dtachyon.zookeeper.address=$TACHYON_ZOOKEEPER_ADDRESS
      -Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
      -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
      -Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/workers
      -Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
      -Dtachyon.worker.data.folder=$TACHYON_RAM_FOLDER/tachyonworker/
      -Dtachyon.master.worker.timeout.ms=60000
      -Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
      -Dtachyon.master.journal.folder=hdfs://192.168.1.1:9000/tachyon/journal/
      -Dtachyon.master.pinlist=/pinfiles;/pindata
      -Dorg.apache.jasper.compiler.disablejsr199=true
    "

    8.3 Set slaves to 192.168.1.2~5


    9. Spark configuration

    cd /mnt/spark-0.9.1-bin-hadoop2/

    9.1 Configure core-site

    vi conf/core-site.xml

    <configuration>
      <property>
        <name>fs.tachyon.impl</name>
        <value>tachyon.hadoop.TFS</value>
      </property>
    </configuration>

    9.2 Configure spark-env

    vi conf/spark-env.sh

    JAVA_HOME=/mnt/jdk1.7.0_55
    SPARK_MASTER_IP=192.168.1.1
    SPARK_CLASSPATH=/mnt/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar:$SPARK_CLASSPATH
    export SPARK_CLASSPATH

    9.3 Set slaves to 192.168.1.2~5


    10. Zookeeper configuration:

    cd /mnt/zookeeper-3.3.6

    vi conf/zoo.cfg

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    dataDir=/mnt/zookeeper
    # the port at which the clients will connect
    clientPort=2181
    #server.1=192.168.1.1:2888:3888
    #server.2=192.168.1.2:2888:3888

    11. Issues

    (hadoop node identification): When a new datanode needs to join the cluster, we clone an existing node's SD card onto the new node. This causes the cloned node and the new node to compete for the same entry on hadoop's datanode monitoring page, because hadoop does not identify a node by its IP address, MAC address, or hostname. Instead, the namespaceID is the unique identifier of a hadoop cluster, and the namenode uses this ID to recognize the datanodes belonging to its cluster.

    Reference: http://blog.csdn.net/xiaojiafei/article/details/10152395

    Fix: on the new node, clear the namenode folder defined in etc/hadoop/hdfs-site.xml.
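    In stock Hadoop the ID in question is stored in a current/VERSION file (plain key=value properties) under the directories set by dfs.namenode.name.dir and dfs.datanode.data.dir, so comparing namespaceID fields is a quick way to confirm the mismatch. The parser below is our own sketch, and the sample IDs are made up:

```python
def parse_version_file(text):
    """Parse the key=value lines of an HDFS current/VERSION file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key] = value
    return props

# Made-up contents standing in for /mnt/namenode/current/VERSION etc.:
namenode_v = parse_version_file("namespaceID=123456\nlayoutVersion=-56\n")
datanode_v = parse_version_file("namespaceID=654321\nlayoutVersion=-56\n")
mismatch = namenode_v["namespaceID"] != datanode_v["namespaceID"]
```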

    12. References:

    1.Digilent zybo Ref Design

    http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,1198&Prod=ZYBO

    2.Oracle JDK7 for ARM

    http://www.oracle.com/technetwork/java/javase/downloads/jdk7-arm-downloads-2187468.html

    3.What is hadoop:

    http://hadoop.apache.org/

    4.What is spark:

    http://spark.apache.org/

    5.Spark example code:

    http://spark.apache.org/examples.html

    6.What is hdfs:

    http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

    7.What is tachyon:

    http://tachyon-project.org/

    8.Tachyon github:

    https://github.com/amplab/tachyon/releases

    9.What is ZooKeeper:

    http://zookeeper.apache.org/

  • Original post: https://www.cnblogs.com/shenerguang/p/3855121.html