  • Installation of CarbonData 1.1.0 with Spark 1.6.2

    Keywords: carbondata, spark, thrift, data warehouse

    【Install thrift 0.9.3】

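    Note: ant must be installed before building thrift's Java library.

    Some people say boost must be installed too. I ran without it on CentOS 6, so my guess is that boost is only needed for the C/C++ libraries, not for Java or Python.

    The thrift source package can be downloaded from the official thrift site; pay attention to the version. Manual download page: http://www.apache.org/dyn/closer.cgi?path=/thrift/0.9.3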

    sudo yum -y install ant libevent-devel zlib-devel openssl-devel
    
    # Install bison
    wget http://ftp.gnu.org/gnu/bison/bison-2.5.1.tar.gz
    tar xvf bison-2.5.1.tar.gz
    cd bison-2.5.1
    ./configure --prefix=/usr
    make
    sudo make install
    cd ..
    
    # Install libevent
    wget --no-check-certificate https://github.com/libevent/libevent/releases/download/release-2.0.22-stable/libevent-2.0.22-stable.tar.gz -O libevent-2.0.22-stable.tar.gz
    tar -xzvf libevent-2.0.22-stable.tar.gz
    cd libevent-2.0.22-stable
    ./configure --prefix=/usr
    make
    sudo make install
    cd ..
    
    # Install thrift
    wget http://apache.parentingamerica.com/thrift/0.9.3/thrift-0.9.3.tar.gz
    tar -xzvf thrift-0.9.3.tar.gz
    cd thrift-0.9.3
    ./configure --prefix=/usr --with-libevent=/usr --with-java
    sudo make
    sudo make install
    cd ..
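
    Once the install finishes, it is worth confirming that the thrift compiler on the PATH is the 0.9.3 build and not an older system copy. The sketch below assumes that `thrift --version` prints a line of the form "Thrift version 0.9.3"; the helper names `thrift_version_of` and `thrift_version_is` are made up for this example.

```shell
# Hypothetical helper: pull the version number out of `thrift --version` output.
thrift_version_of() {
  # Keep only the last space-separated word,
  # e.g. "Thrift version 0.9.3" -> "0.9.3".
  printf '%s\n' "${1##* }"
}

# Hypothetical helper: succeed only when the reported version equals the expected one.
# $1: output of `thrift --version`, $2: expected version string
thrift_version_is() {
  [ "$(thrift_version_of "$1")" = "$2" ]
}
```

    Usage: `thrift_version_is "$(thrift --version)" 0.9.3 && echo "thrift 0.9.3 found"`.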

    For other languages, first install that language's toolchain and the related libraries. For Java, that means a JDK and ant.
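
    A quick way to catch a missing prerequisite before running ./configure is to check that the tools are on the PATH. This is only a sketch: `missing_tools` is a made-up helper name, and the tool list passed to it (java, javac, ant) is my reading of "JDK and ant".

```shell
# Hypothetical helper: print the names of the given commands that are NOT on PATH.
# Prints an empty line when every tool is available.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the command name.
    command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
  done
  printf '%s\n' "$missing"
}
```

    Usage: `missing_tools java javac ant` prints nothing but a blank line when the Java prerequisites are in place, otherwise the names that still need installing.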

    【Package and Install CarbonData】

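    Reference: https://github.com/apache/carbondata/tree/master/build

    Download CarbonData 1.1.0, unpack it, and run the following in the CarbonData source directory (for other Spark versions, change the profile and the spark.version parameter accordingly):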

    mvn -DskipTests -Pspark-1.6 -Dspark.version=1.6.2 clean package
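
    To make the "change the profile and spark.version" step less error-prone, the command line can be derived from the Spark version alone. This is a sketch under one assumption: that the Maven profile is named spark-<major>.<minor>, as it is for spark-1.6 above. Whether a profile exists for a given Spark version should be checked in the build README referenced above. `carbon_build_cmd` is a made-up name.

```shell
# Hypothetical helper: print the mvn command for a given Spark version.
# Assumes the profile is called spark-<major>.<minor> (true for spark-1.6 above).
carbon_build_cmd() {
  spark_version="$1"
  # Strip the last ".patch" component: 1.6.2 -> 1.6
  profile="spark-${spark_version%.*}"
  echo "mvn -DskipTests -P${profile} -Dspark.version=${spark_version} clean package"
}
```

    Usage: `carbon_build_cmd 1.6.2` prints exactly the command shown above; the printed line can be reviewed first and then run with `eval "$(carbon_build_cmd 1.6.2)"`.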

    If Maven downloads are slow, the aliyun mirror can be used in place of the central repository by editing ~/.m2/settings.xml:

    <settings>
      ...
      <mirrors>
        <mirror>
          <id>alimaven</id>
          <name>aliyun maven</name>
          <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
          <mirrorOf>central</mirrorOf>
        </mirror>
      </mirrors>
      ...
    </settings>
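
    If there is no settings.xml yet, the mirror entry can be written from the shell. The sketch below refuses to overwrite an existing file, because a real settings.xml often holds other configuration (the elided "..." parts above). `write_mirror_settings` is a made-up helper name, and the file it writes contains only the mirror block.

```shell
# Hypothetical helper: create a minimal settings.xml with the aliyun mirror.
# $1: target path, e.g. "$HOME/.m2/settings.xml". Fails if the file already exists.
write_mirror_settings() {
  target="$1"
  if [ -e "$target" ]; then
    echo "refusing to overwrite existing $target" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  # Quoted 'EOF' so nothing inside the XML is expanded by the shell.
  cat > "$target" << 'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

    Usage: `write_mirror_settings "$HOME/.m2/settings.xml"`. When the file already exists, merge the <mirror> element into it by hand instead.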

    【Run carbondata in spark-shell】

    Reference: http://carbondata.apache.org/quick-start-guide.html

    Prepare the data file

    # On Linux, prepare a sample data file
    cd carbondata
    cat > sample.csv << EOF
    id,name,city,age
    1,david,shenzhen,31
    2,eason,shenzhen,27
    3,jarry,wuhan,35
    EOF
    
    hdfs dfs -put sample.csv /tmp/
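
    Before uploading a larger CSV of your own, a quick local check that every row has the same number of fields as the header can save a failed load later. This is a sketch only: it splits on plain commas, so it does not handle quoted fields that contain commas, and `csv_is_rectangular` is a made-up name.

```shell
# Hypothetical helper: succeed only if every line has as many
# comma-separated fields as the header line.
csv_is_rectangular() {
  awk -F, '
    NR == 1    { want = NF }   # remember the header width
    NF != want { printf "line %d has %d fields, expected %d\n", NR, NF, want; bad = 1 }
    END        { exit bad }    # exit status 0 when no mismatch was seen
  ' "$1"
}
```

    Usage: `csv_is_rectangular sample.csv && hdfs dfs -put sample.csv /tmp/`.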

    Prepare the assembly jar

    # On Linux, copy the assembly jar to a lib directory
    cd $CARBONDATA_HOME
    mkdir -p lib
    cp assembly/target/scala-2.10/carbondata_2.10-1.1.0-shade-hadoop2.2.0.jar lib/
    cp integration/spark/target/carbondata-spark-1.1.0.jar lib/

    Run Spark in shell mode

    spark-shell --jars $CARBONDATA_HOME/lib/carbondata_2.10-1.1.0-shade-hadoop2.2.0.jar,$CARBONDATA_HOME/lib/carbondata-spark-1.1.0.jar
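
    The --jars value is a comma-separated list, and a mistyped or missing jar path usually only shows up later as a class-not-found error inside the shell. The sketch below builds that list and fails early if a file is absent. `join_jars` is a made-up helper name.

```shell
# Hypothetical helper: print the given jar paths joined by commas.
# Fails without printing anything if one of them is not a regular file.
join_jars() {
  out=""
  for jar in "$@"; do
    if [ ! -f "$jar" ]; then
      echo "missing jar: $jar" >&2
      return 1
    fi
    # Add a comma separator only when out is already non-empty.
    out="${out:+$out,}$jar"
  done
  printf '%s\n' "$out"
}
```

    Usage: `spark-shell --jars "$(join_jars "$CARBONDATA_HOME"/lib/*.jar)"`.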

    In the Spark shell:

    // in spark shell, cluster mode
    import org.apache.spark.sql.CarbonContext
    
    // remember to add hdfs:// if you want to use hdfs mode.
    val cc = new CarbonContext(sc, "hdfs:///tmp/carbon/data/")
    cc.sql("CREATE TABLE IF NOT EXISTS hdfs_sample ( id string, name string, city string, age Int) STORED BY 'carbondata'")
    cc.sql("LOAD DATA INPATH 'hdfs:///tmp/sample.csv' INTO TABLE hdfs_sample")
    cc.sql("SELECT * FROM hdfs_sample").show()
    cc.sql("SELECT city, avg(age), sum(age) FROM hdfs_sample GROUP BY city").show()
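
    To cross-check the GROUP BY result, the same aggregation can be computed directly from the local sample.csv with awk. This is a sketch for comparison only, independent of CarbonData; `expected_by_city` is a made-up name, and the column positions (city in field 3, age in field 4) come from the sample file above.

```shell
# Hypothetical helper: print "city avg(age) sum(age)" per city, sorted by city.
expected_by_city() {
  awk -F, '
    NR > 1 { sum[$3] += $4; cnt[$3]++ }   # skip the header line
    END {
      for (c in sum) printf "%s %.1f %d\n", c, sum[c] / cnt[c], sum[c]
    }
  ' "$1" | sort
}
```

    For the three sample rows, `expected_by_city sample.csv` prints "shenzhen 29.0 58" and "wuhan 35.0 35", which should match the averages and sums from the last query.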
  • Original article: https://www.cnblogs.com/lhfcws/p/7161490.html