  • Reading and Writing HBase Data with Spark

    HBase Installation

    Installation steps are covered in the earlier HBase learning post.

    Creating an HBase Table

    Start Hadoop and HBase

    # start Hadoop (HDFS)
    ./sbin/start-dfs.sh
    # start HBase, then open its shell
    ./bin/start-hbase.sh
    ./bin/hbase shell
    

    Create the table

    # drop the student table first if it already exists
    disable 'student'
    drop 'student'
    
    # create table 'student' with column family 'info'
    # (column qualifiers: name, gender, age)
    create 'student','info'
    

    Insert data

    # insert rows; '1' and '2' are the row keys
    put 'student','1','info:name','Xueqian'
    put 'student','1','info:gender','F'
    put 'student','1','info:age','23'
    
    put 'student','2','info:name','Weiliang'
    put 'student','2','info:gender','M'
    put 'student','2','info:age','24'
    
    You can confirm the inserts with scan 'student' in the HBase shell.

    Spark Configuration

    Download the jar files

    Copy the required jars from hbase/lib into the spark/jars directory. If you place them in a spark/jars/hbase subdirectory instead, the SPARK_DIST_CLASSPATH entry below will pick them up.

    The jars to copy are: hbase*.jar, guava-12.0.1.jar, htrace-core-3.1.0-incubating.jar, and protobuf-java-2.5.0.jar.

    You also need to download spark-examples_2.11-1.6.0-typesafe-001.jar from [https://mvnrepository.com/artifact/org.apache.spark/spark-examples_2.11/1.6.0-typesafe-001] and save it in spark/jars; it provides the org.apache.spark.examples.pythonconverters classes used below.

    Configure the spark-env.sh file
    cd /usr/local/spark/conf
    sudo gedit spark-env.sh
    
    # add the following line
    export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath):$(/usr/local/hbase/bin/hbase classpath):/usr/local/spark/jars/hbase/*
    

    Writing Programs to Read and Write HBase Data

    Reading data

    Use the newAPIHadoopRDD API provided by SparkContext to load the table's contents into Spark as an RDD.

    • Code to read HBase data
    from pyspark import SparkConf, SparkContext
    
    conf = SparkConf().setMaster("local").setAppName("ReadHBase")
    sc = SparkContext(conf=conf)
    host = 'localhost'
    table = 'student'
    # tell TableInputFormat which ZooKeeper quorum and table to read
    conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
    # converters that render HBase's key/value types as Python strings
    keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter=keyConv, valueConverter=valueConv, conf=conf)
    hbase_rdd.cache()
    count = hbase_rdd.count()
    print(count)
    output = hbase_rdd.collect()
    for (k, v) in output:
        print(k, v)
    
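    Each value that HBaseResultToStringConverter hands back is a string with one JSON object per cell, joined by newlines. Below is a minimal parsing sketch, assuming that cell layout; the sample line and field names reflect typical converter output and may differ in your version.
    
    import json
    
    # Assumed shape of one cell line from HBaseResultToStringConverter;
    # a real row value contains one such JSON object per cell, newline-joined.
    sample = '{"qualifier":"name","columnFamily":"info","row":"1","value":"Xueqian"}'
    
    def parse_cells(row_value):
        # map each cell's qualifier to its stored value
        cells = (json.loads(line) for line in row_value.split('\n'))
        return {c["qualifier"]: c["value"] for c in cells}
    
    print(parse_cells(sample))   # {'name': 'Xueqian'}
    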
    • Submit the code with spark-submit
    /usr/local/spark/bin/spark-submit SparkOperateHBase.py
    
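    TableInputFormat also reads scan-narrowing settings from the same conf dict. A hedged sketch, assuming the standard TableInputFormat property names (hbase.mapreduce.scan.columns takes a space-separated family:qualifier list; hbase.mapreduce.scan.row.start/row.stop bound the row-key range); verify them against your HBase version:
    
    # read only info:name for row keys in ['1', '3')
    conf = {"hbase.zookeeper.quorum": host,
            "hbase.mapreduce.inputtable": table,
            "hbase.mapreduce.scan.columns": "info:name",
            "hbase.mapreduce.scan.row.start": "1",
            "hbase.mapreduce.scan.row.stop": "3"}
    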
    Writing data
    • Code to write HBase data
    from pyspark import SparkConf, SparkContext
    
    conf = SparkConf().setMaster("local").setAppName("WriteHBase")
    sc = SparkContext(conf=conf)
    host = "localhost"
    table = "student"
    # converters that turn Python strings into HBase key/Put types
    keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"
    conf = {"hbase.zookeeper.quorum": host,
            "hbase.mapred.outputtable": table,
            "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
            "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
            "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}
    # each record is "rowkey,family,qualifier,value"
    rawData = ['3,info,name,Rongcheng', '3,info,gender,M', '3,info,age,26',
               '4,info,name,Guanhua', '4,info,gender,M', '4,info,age,27']
    # map to (rowkey, [rowkey, family, qualifier, value]) pairs; splitting
    # for the key also handles row keys longer than one character
    sc.parallelize(rawData)\
      .map(lambda x: (x.split(',')[0], x.split(',')))\
      .saveAsNewAPIHadoopDataset(conf=conf, keyConverter=keyConv, valueConverter=valueConv)
    
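    StringListToPutConverter turns each (key, list) pair into an HBase Put, with the list laid out as [row, family, qualifier, value], which is exactly what the CSV split above produces. A minimal sketch of building those pairs from plain tuples instead (the row-5 data is illustrative only):
    
    # construct the (rowkey, [row, family, qualifier, value]) pairs
    # that StringListToPutConverter expects, starting from tuples
    rows = [('5', 'info', 'name', 'Sample'), ('5', 'info', 'age', '30')]
    pairs = [(r[0], list(r)) for r in rows]
    # sc.parallelize(pairs).saveAsNewAPIHadoopDataset(
    #     conf=conf, keyConverter=keyConv, valueConverter=valueConv)
    
    Rerunning the read program should now show rows 3 and 4 (and any others you add) alongside rows 1 and 2.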