  • Tachyon on the Zybo cluster

    Add a Tachyon layer between Spark and Hadoop to speed up the cluster.

    Official site: http://tachyon-project.org/

    GitHub: https://github.com/amplab/tachyon/releases

    (1) Preparation:

    wget http://tachyon-project.org/downloads/tachyon-0.4.1-bin.tar.gz
    tar xvfz tachyon-0.4.1-bin.tar.gz
    cd tachyon-0.4.1

    cp conf/tachyon-env.sh.template conf/tachyon-env.sh

    (2) Local test:

    vi conf/tachyon-env.sh

    ./bin/tachyon format
    ./bin/tachyon-start.sh local
    ./bin/tachyon runTest Basic CACHE_THROUGH
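If the test hangs or fails, it can help to confirm first that the master is actually listening. The sketch below assumes the master RPC port is 19998 (the port used in the tachyon:// URIs later in this post) and that the local master is reachable on localhost. `port_is_open` is a hypothetical helper name, not part of Tachyon.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: local master on localhost, RPC port 19998.
    print("master reachable:", port_is_open("localhost", 19998))
```

This only shows that something accepts connections on the port; it does not verify that the process behind it is a healthy Tachyon master.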

    (3) Integrating with Hadoop: Set HDFS as Tachyon's under filesystem

    Tachyon has to be rebuilt against Hadoop 2.4.0. Installing Maven on the ARM platform fails, so the build was moved to an x64 PC:

    apt-get install maven

    vi pom.xml

    mvn -Dhadoop.version=2.4.0 clean package

    cp -r /root/tachyon-0.4.1 /media/fs/root/

    cd /root/tachyon-0.4.1

    cd ..

    cd hadoop-2.4.0/

    vi etc/hadoop/core-site.xml

    <property>
      <name>fs.tachyon.impl</name>
      <value>tachyon.hadoop.TFS</value>
    </property>
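A typo in core-site.xml is easy to miss. As a rough check, the sketch below parses a Hadoop-style configuration and looks up `fs.tachyon.impl`. `get_property` is a hypothetical helper, and the inline `SAMPLE` string (with an assumed enclosing `<configuration>` element) stands in for the real file.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Stand-in for the contents of etc/hadoop/core-site.xml.
SAMPLE = """
<configuration>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>
"""

print(get_property(SAMPLE, "fs.tachyon.impl"))
```

To check the real file, read it with `open("etc/hadoop/core-site.xml").read()` and pass the text in place of `SAMPLE`.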

    vi etc/hadoop/hadoop-env.sh

    Add one line:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar

    cd /root

    ./gohadoop.sh

    cd tachyon-0.4.1

    ./bin/tachyon format

    ./bin/tachyon-start.sh local
    ./bin/tachyon runTest Basic CACHE_THROUGH

    cd $HADOOP_HOME
    Run the following command:
    ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar \
      wordcount -libjars /root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar \
      tachyon://192.168.1.1:19998/in/file /out/file
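The input path in the job above is a full Tachyon URI: scheme, master host, master port, then the path inside Tachyon. A quick sketch using Python's standard `urllib.parse` to pull the parts apart (the `split_tachyon_uri` name is hypothetical and has nothing to do with Tachyon's own client):

```python
from urllib.parse import urlparse


def split_tachyon_uri(uri):
    """Return (host, port, path) for a tachyon://host:port/path URI."""
    parts = urlparse(uri)
    if parts.scheme != "tachyon":
        raise ValueError("not a tachyon:// URI: %r" % uri)
    return parts.hostname, parts.port, parts.path


print(split_tachyon_uri("tachyon://192.168.1.1:19998/in/file"))
```

The output path `/out/file` in the same command has no scheme, so Hadoop resolves it against its default filesystem rather than Tachyon.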

    (4) Integrating with Spark: Running Spark on Tachyon

    cd spark-0.9.1-bin-hadoop2

    vi conf/spark-env.sh

    SPARK_CLASSPATH=/root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar:$SPARK_CLASSPATH
    export SPARK_CLASSPATH

    export TACHYON_MASTER="192.168.1.1:19998"

    Create a new configuration file:

    vi conf/core-site.xml

    <configuration>
      <property>
        <name>fs.tachyon.impl</name>
        <value>tachyon.hadoop.TFS</value>
      </property>
    </configuration>

    Run:

    MASTER=spark://192.168.1.1:7077 ./bin/pyspark
    file = sc.textFile("tachyon://192.168.1.1:19998/in/file")
    counts = file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
    counts.collect()

    counts.saveAsTextFile("tachyon://192.168.1.1:19998/out/mycount")

    counts.saveAsTextFile("hdfs://192.168.1.1:9000/out/mycount1")
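For readers unfamiliar with the RDD operations, the chain above is a word count. Here is a plain-Python sketch of the same logic, with no Spark or Tachyon involved (the `word_count` name and the sample lines are illustrative only):

```python
from collections import Counter


def word_count(lines):
    """Split each line on single spaces and count how often each token occurs."""
    counts = Counter()
    for line in lines:
        # Mirrors flatMap(line.split(" ")), then map to (word, 1), then reduceByKey(+).
        counts.update(line.split(" "))
    return dict(counts)


print(word_count(["a b a", "b c"]))
```

Like the PySpark version, splitting on a single space means consecutive spaces produce empty-string tokens, which are counted as words too.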

    collect() runs correctly.

    Saving to Hadoop runs correctly.

    Saving to Tachyon fails with an error:

    Reference: http://tachyon-project.org/Syncing-the-Underlying-Filesystem.html

    Not yet resolved.

    For now, only reading through Tachyon was tested, using a 1 GB text file:

    Reading it with Hadoop took 16 minutes.
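To put that figure in perspective, 1 GB in 16 minutes is roughly 1 MB/s. A small sketch of the arithmetic, assuming 1 GB means 1024 MB and that the 16 minutes covers the whole job (`throughput_mb_per_s` is an illustrative name):

```python
def throughput_mb_per_s(size_mb, minutes):
    """Average throughput in MB/s for processing size_mb in the given minutes."""
    return size_mb / (minutes * 60.0)


# 1024 MB over 16 minutes (960 seconds).
print(round(throughput_mb_per_s(1024, 16), 2))
```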

    scp tachyon-0.4.1.bak2.tar.gz root@spark4:/root/

  • Original article: https://www.cnblogs.com/shenerguang/p/3836313.html