  • Tachyon on a ZYBO cluster

    Add a Tachyon layer between Spark and Hadoop to speed up the cluster.

    Official site: http://tachyon-project.org/

    GitHub: https://github.com/amplab/tachyon/releases

    (1) Preparation:

    wget http://tachyon-project.org/downloads/tachyon-0.4.1-bin.tar.gz
    tar xvfz tachyon-0.4.1-bin.tar.gz
    cd tachyon-0.4.1

    cp conf/tachyon-env.sh.template conf/tachyon-env.sh

    (2) Local test:

    vi conf/tachyon-env.sh

    ./bin/tachyon format
    ./bin/tachyon-start.sh local
    ./bin/tachyon runTest Basic CACHE_THROUGH

    (3) Integrating with Hadoop: Set HDFS as Tachyon's under filesystem

    Tachyon has to be rebuilt against Hadoop 2.4.0. Installing Maven on the ARM platform fails, so the build is done on an x64 PC instead:

    apt-get install maven

    vi pom.xml

    mvn -Dhadoop.version=2.4.0 clean package

    cp -r /root/tachyon-0.4.1 /media/fs/root/

    cd /root/tachyon-0.4.1

    cd ..

    cd hadoop-2.4.0/

    vi etc/hadoop/core-site.xml

    <property>
      <name>fs.tachyon.impl</name>
      <value>tachyon.hadoop.TFS</value>
    </property>

    vi etc/hadoop/hadoop-env.sh

    Add one line:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar

    cd /root

    ./gohadoop.sh

    cd tachyon-0.4.1

    ./bin/tachyon format

    ./bin/tachyon-start.sh local
    ./bin/tachyon runTest Basic CACHE_THROUGH

    cd $HADOOP_HOME
    Run the following command:
    ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar \
        wordcount -libjars /root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar \
        tachyon://192.168.1.1:19998/in/file /out/file

    (4) Integrating with Spark: Running Spark on Tachyon

    cd spark-0.9.1-bin-hadoop2

    vi conf/spark-env.sh

    SPARK_CLASSPATH=/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar:$SPARK_CLASSPATH
    export SPARK_CLASSPATH

    export TACHYON_MASTER="192.168.1.1:19998"
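The master address set here is the same host:port that appears in every tachyon:// URI later on. As a rough sketch (the helper names tachyon_uri and master_reachable are hypothetical, not part of Tachyon or Spark), the URI can be derived from TACHYON_MASTER and the port probed before a job is started. The fallback address is the one used in this post.

```python
import os
import socket

def tachyon_uri(master, path):
    """Build a tachyon:// URI from a "host:port" master address and an absolute path."""
    host, port = master.rsplit(":", 1)
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return "tachyon://{}:{}{}".format(host, int(port), path)

def master_reachable(master, timeout=3.0):
    """Return True if a TCP connection to the master's host:port succeeds."""
    host, port = master.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False

# Assumption: TACHYON_MASTER is exported as in spark-env.sh; otherwise use the post's address.
master = os.environ.get("TACHYON_MASTER", "192.168.1.1:19998")
print(tachyon_uri(master, "/in/file"))
```

Calling master_reachable(master) only confirms that something is listening on the port. It does not verify that the listener is actually a Tachyon master.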

    Create a new configuration file:

    vi conf/core-site.xml

    <configuration>
      <property>
        <name>fs.tachyon.impl</name>
        <value>tachyon.hadoop.TFS</value>
      </property>
    </configuration>
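The file above is short enough to type by hand, but the same fs.tachyon.impl property is needed in both Hadoop's and Spark's core-site.xml, so it can be convenient to generate it. A minimal sketch using only Python's standard library; the helper name build_core_site is hypothetical, and the output is unindented XML rather than the pretty-printed form shown above.

```python
import xml.etree.ElementTree as ET

def build_core_site(properties):
    """Return a Hadoop-style <configuration> element from a dict of name -> value."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root

# The single property used in this post to register the Tachyon file system class.
config = build_core_site({"fs.tachyon.impl": "tachyon.hadoop.TFS"})
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

Writing xml_text to conf/core-site.xml would overwrite any existing file there, so for Hadoop's own etc/hadoop/core-site.xml the property should be merged into the existing configuration instead.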

    Run:

    MASTER=spark://192.168.1.1:7077 ./bin/pyspark
    file = sc.textFile("tachyon://192.168.1.1:19998/in/file")
    counts = (file.flatMap(lambda line: line.split(" "))
                  .map(lambda word: (word, 1))
                  .reduceByKey(lambda a, b: a + b))
    counts.collect()

    counts.saveAsTextFile("tachyon://192.168.1.1:19998/out/mycount")

    counts.saveAsTextFile("hdfs://192.168.1.1:9000/out/mycount1")
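To sanity-check what the job should produce without a cluster, the same three steps can be mimicked in plain Python on a few lines of text. This is only an illustrative sketch: word_count is a hypothetical helper, it runs on one machine, and it stands in for the RDD operations rather than calling Spark.

```python
from collections import defaultdict

def word_count(lines):
    """Mimic flatMap -> map -> reduceByKey from the pyspark session on a list of lines."""
    # flatMap: split each line on single spaces, exactly like line.split(" ")
    words = [word for line in lines for word in line.split(" ")]
    # map: pair every word with a count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: sum the counts for each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

sample = ["a b a", "b c"]
result = word_count(sample)
print(sorted(result.items()))  # [('a', 2), ('b', 2), ('c', 1)]
```

Because the split is on a single space, runs of consecutive spaces produce empty-string "words", and the Spark version behaves the same way.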

    collect() runs correctly and saving to HDFS succeeds, but saving to Tachyon fails with an error.

    Reference: http://tachyon-project.org/Syncing-the-Underlying-Filesystem.html

    Not resolved yet.

    For now, only reading through Tachyon is tested, using a 1 GB text file:

    Reading the file with Hadoop took 16 minutes.

    scp tachyon-0.4.1.bak2.tar.gz root@spark4:/root/

  • Original post: https://www.cnblogs.com/shenerguang/p/3836313.html