  • HBase Summary

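    (I). HBase basics

    1. HBase is a database system built on top of HDFS that provides high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes.

    2. HBase features:
      Everything stored in HBase is a byte array
      RowKeys are kept sorted in byte order and are indexed
      HBase automatically splits a table into Regions as it grows (splits are triggered by Region size rather than by a fixed row count), which keeps load balanced across RegionServers; redundancy comes from HDFS replication of the underlying files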
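Because RowKeys are compared as raw bytes, string keys that look numeric do not sort numerically. The following is a minimal sketch in plain Scala (no HBase dependency) of that ordering. `RowKeyOrder`, `compareBytes`, `sortKeys` and `padded` are illustrative names that do not come from the article or from the HBase API, and zero-padding is only one common way to make byte order agree with numeric order.

```scala
object RowKeyOrder {
  // Compare two byte arrays byte by byte, treating each byte as unsigned.
  // If one array is a prefix of the other, the shorter one sorts first.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val diff = (a(i) & 0xff) - (b(i) & 0xff)
      if (diff != 0) return diff
      i += 1
    }
    a.length - b.length
  }

  // Sort string rowkeys by the byte order of their UTF-8 encoding.
  def sortKeys(keys: Seq[String]): Seq[String] =
    keys.sortWith((x, y) => compareBytes(x.getBytes("UTF-8"), y.getBytes("UTF-8")) < 0)

  // Hypothetical helper: zero-pad a numeric id so byte order matches numeric order.
  def padded(id: Int, width: Int = 5): String =
    ("row%0" + width + "d").format(id)
}
```

For example, `RowKeyOrder.sortKeys(Seq("row2", "row10", "row1"))` yields `row1, row10, row2`, while the padded keys `row00002` and `row00010` sort in numeric order.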

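    3. HBase storage structure:
      RowKey: a byte array that acts as the "primary key" of each record in the table and enables fast lookups; RowKey design is very important;
      Column Family: has a name (string) and contains one or more related columns; columns in the same family share the same storage properties
      Column: belongs to a column family and is written familyName:columnName; columns can be added dynamically per record;
      Cell: the value stored at a given rowkey and column; timestamp is the cell's version timestamp and value is the stored data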

      hbase(main):009:0> scan 'User'

      ROW                                       COLUMN+CELL

      id001 column=personInfo:name, timestamp=1502368030841, value=xiaoming
      id001 column=personInfo:age, timestamp=1502368069926, value=18
      id001 column=personInfo:sex, timestamp=1502368093636, value=man
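One way to picture this structure is as nested sorted maps: rowkey, then column (family:qualifier), then timestamp, then value. The sketch below models that in plain Scala. It is an assumption-laden simplification and not how HBase stores data internally: `CellModel` and its members are hypothetical names, values are strings instead of byte arrays, and keys are ordered by Scala's default `String` ordering, which matches byte order only for ASCII keys.

```scala
import scala.collection.immutable.TreeMap

object CellModel {
  // timestamp -> value for one column of one row
  type Versions = TreeMap[Long, String]
  // "family:qualifier" -> versions
  type Row = TreeMap[String, Versions]
  // rowkey -> row, kept sorted by rowkey
  type Table = TreeMap[String, Row]

  val empty: Table = TreeMap.empty[String, Row]

  // Write one cell version; earlier versions of the same cell are kept.
  def put(t: Table, rowKey: String, column: String, ts: Long, value: String): Table = {
    val row = t.getOrElse(rowKey, TreeMap.empty[String, Versions])
    val versions = row.getOrElse(column, TreeMap.empty[Long, String])
    t.updated(rowKey, row.updated(column, versions.updated(ts, value)))
  }

  // Read the value with the largest timestamp for a cell, if the cell exists.
  def get(t: Table, rowKey: String, column: String): Option[String] =
    t.get(rowKey).flatMap(_.get(column)).flatMap(_.lastOption).map(_._2)
}
```

With the first row of the scan output above, `CellModel.put(CellModel.empty, "id001", "personInfo:name", 1502368030841L, "xiaoming")` followed by `CellModel.get(_, "id001", "personInfo:name")` returns `Some("xiaoming")`; writing the same cell again with a later timestamp makes `get` return the newer value.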

    (II). Common HBase shell commands

    1. Enter the shell: hbase shell

    [hadoop@indb-3-136-hzifc bin]$ echo $HBASE_HOME

    /data/program/hbase

    [hadoop@indb-3-136-hzifc bin]$ /data/program/hbase/bin/hbase shell


    2. List all tables: list

    hbase(main):003:0> list
    TABLE
    SYSTEM.CATALOG
    SYSTEM.FUNCTION
    SYSTEM.SEQUENCE
    SYSTEM.STATS
    TEST.USER
    User

    6 row(s) in 0.0340 seconds


    3. Show the details of a table: describe

    hbase(main):004:0> describe 'User'
    Table User is ENABLED
    User
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
    DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

    1 row(s) in 0.1410 seconds



    4. Create a table: create

    Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
    Create a User table; one or more column families can be given (here a single family, info1)

    hbase(main):002:0> create 'User','info1'
    0 row(s) in 1.5890 seconds


    5. Delete a column family: alter

    Syntax: alter <table>, 'delete' => '<family>'

    hbase(main):002:0> alter 'User', 'delete' => 'info'
    Updating all regions with the new schema...
    1/1 regions updated.
    Done.

    0 row(s) in 2.5340 seconds



    6. Insert data: put

    Syntax: put <table>,<rowkey>,<family:column>,<value>

    hbase(main):005:0> put 'User', 'row1', 'info:name', 'xiaoming'
    0 row(s) in 0.1200 seconds

    hbase(main):006:0> put 'User', 'row2', 'info:age', '18'
    0 row(s) in 0.0170 seconds

    hbase(main):007:0> put 'User', 'row3', 'info:sex', 'man'
    0 row(s) in 0.0030 seconds




    7. Look up a record by rowkey: get

    Syntax: get <table>,<rowkey>,[<family:column>,....]

    hbase(main):008:0> get 'User', 'row2'

    COLUMN CELL

    info:age timestamp=1502368069926, value=18
    1 row(s) in 0.0280 seconds

    hbase(main):028:0> get 'User', 'row3', 'info:sex'

    COLUMN CELL

    info:sex timestamp=1502368093636, value=man

    hbase(main):036:0> get 'User', 'row1', {COLUMN => 'info:name'}

    COLUMN CELL

    info:name timestamp=1502368030841, value=xiaoming

    1 row(s) in 0.0120 seconds



    8. Query all records: scan

    Syntax: scan <table>, {COLUMNS => [ <family:column>,.... ], LIMIT => num}

    Scan all records
    hbase(main):009:0> scan 'User'

    ROW COLUMN+CELL

    row1 column=info:name, timestamp=1502368030841, value=xiaoming

    row2 column=info:age, timestamp=1502368069926, value=18
    row3 column=info:sex, timestamp=1502368093636, value=man

    3 row(s) in 0.0380 seconds

    Scan the first 2 rows
    hbase(main):037:0> scan 'User', {LIMIT => 2}
    ROW COLUMN+CELL

    row1 column=info:name, timestamp=1502368030841, value=xiaoming

    row2 column=info:age, timestamp=1502368069926, value=18
    2 row(s) in 0.0170 seconds

    Range queries
    hbase(main):011:0> scan 'User', {STARTROW => 'row2'}
    ROW COLUMN+CELL

    row2 column=info:age, timestamp=1502368069926, value=18
    row3 column=info:sex, timestamp=1502368093636, value=man

    2 row(s) in 0.0170 seconds

    hbase(main):012:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row2'}
    ROW COLUMN+CELL

    row2 column=info:age, timestamp=1502368069926, value=18
    1 row(s) in 0.0110 seconds

    hbase(main):013:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
    ROW COLUMN+CELL

    row2 column=info:age, timestamp=1502368069926, value=18
    1 row(s) in 0.0120 seconds

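    Advanced options such as TIMERANGE and FILTER can also be added.
    STARTROW and ENDROW must be written in uppercase, otherwise the shell reports an error. The row equal to ENDROW is excluded from the results, except when STARTROW and ENDROW are the same key, in which case that single row is returned (as in the 'row2' to 'row2' scan above).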
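The range behaviour seen in the scans above (inclusive start, exclusive end, a single row when both bounds are equal, and LIMIT applied last) can be sketched without a cluster. This is a plain-Scala simulation over an already sorted list of rowkeys, not a call into the HBase client; `ScanRange.scan` and its parameter names are made up for illustration, and the equal-bounds case is modelled purely from the shell output shown in this article.

```scala
object ScanRange {
  // Simulate which rowkeys a scan returns from a list sorted by rowkey.
  // start is inclusive, stop is exclusive; None means unbounded on that side.
  // When start and stop are the same key, only that row is returned.
  def scan(sortedKeys: Seq[String],
           start: Option[String] = None,
           stop: Option[String] = None,
           limit: Int = Int.MaxValue): Seq[String] = {
    val singleRow = start.isDefined && start == stop
    sortedKeys.filter { k =>
      if (singleRow) start.contains(k)
      else start.forall(s => k.compareTo(s) >= 0) && stop.forall(e => k.compareTo(e) < 0)
    }.take(limit)
  }
}
```

With `val keys = Seq("row1", "row2", "row3")`, `ScanRange.scan(keys, Some("row2"), Some("row3"))` returns only `row2`, and `ScanRange.scan(keys, limit = 2)` returns `row1, row2`, matching the transcripts above.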

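    9. Count the rows in a table: count

    Syntax: count <table>, {INTERVAL => intervalNum, CACHE => cacheNum}

    INTERVAL sets how many rows are counted between progress lines (each shows the running count and the current rowkey), default 1000; CACHE is the number of rows fetched into the scanner cache per request, default 10, and raising it can speed up the count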
    hbase(main):020:0> count 'User'
    3 row(s) in 0.0360 seconds




    10. Delete data: delete

    Delete a column
    hbase(main):008:0> delete 'User', 'row1', 'info:age'
    0 row(s) in 0.0290 seconds

    Delete an entire row
    hbase(main):014:0> deleteall 'User', 'row2'
    0 row(s) in 0.0090 seconds

    Remove all data from the table
    hbase(main):016:0> truncate 'User'
    Truncating 'User' table (it may take a while):

    - Disabling table...

    - Truncating table...

    0 row(s) in 3.6610 seconds


    11. Check whether a table exists: exists

    hbase(main):022:0> exists 'User'
    Table User does exist

    0 row(s) in 0.0150 seconds


    12. Disable a table: disable

    hbase(main):014:0> disable 'User'
    0 row(s) in 2.2660 seconds



    13. Enable a table: enable

    hbase(main):017:0> enable 'User'
    0 row(s) in 1.3470 seconds



    14. Drop a table: drop

    A table must be disabled before it can be dropped

    hbase(main):031:0> disable 'TEST.USER'
    0 row(s) in 2.2640 seconds
    hbase(main):033:0> drop 'TEST.USER'
    0 row(s) in 1.2490 seconds

    (III). Using the HBase API from Scala

    import org.apache.hadoop.hbase.{CellUtil,HTableDescriptor,HColumnDescriptor,HBaseConfiguration,TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory,Put,Get,Delete,Scan}
    import org.apache.hadoop.hbase.util.Bytes
    import java.util

    val conf=HBaseConfiguration.create()
    //Creating a Connection is heavyweight; it is thread-safe and is the entry point for working with HBase
    val conn=ConnectionFactory.createConnection(conf)
    //Get an Admin object from the Connection (replaces the old HBaseAdmin)
    val admin=conn.getAdmin
    //Name of the table used in this example
    val userTable=TableName.valueOf("user_score_table")

    //Column family and column names
    val cf1="scoreInfo"
    val cf2="addressInfo"
    val cn1="math"
    val cn2="physics"
    val cn3="Addr"

    //Create the table with both column families if it does not exist yet
    if(admin.tableExists(userTable)){
      println("Table exists!")
      //admin.disableTable(userTable)
      //admin.deleteTable(userTable)
      //exit()
    }else{
      val tableDesc=new HTableDescriptor(userTable)
      tableDesc.addFamily(new HColumnDescriptor(cf1.getBytes))
      tableDesc.addFamily(new HColumnDescriptor(cf2.getBytes))
      admin.createTable(tableDesc)
      println("Create table success!")
    }

    //Get a Table handle from the Connection; all reads and writes below go through it
    val table=conn.getTable(userTable)

    //Insert one row whose rowkey is IromMan
    val p=new Put("IromMan".getBytes())
    //Set column and value on the Put (the older put.add method is deprecated)
    p.addColumn(cf1.getBytes,cn1.getBytes,"98".getBytes) // scoreInfo:math  98
    p.addColumn(cf1.getBytes,cn2.getBytes,"87".getBytes) // scoreInfo:physics  87
    p.addColumn(cf2.getBytes,cn3.getBytes,"Beijing".getBytes) // addressInfo:Addr  Beijing
    table.put(p)

    //Look up several rows by rowkey in one batched get
    val listGet=new util.ArrayList[Get]
    val get=new Get(Bytes.toBytes("id002_Thor"))
    val get2=new Get(Bytes.toBytes("id003_jack"))
    listGet.add(get)
    listGet.add(get2)
    //Skip rowkeys that were not found, then turn each cell into (rowkey,(qualifier,value))
    val resultArr=table.get(listGet).filterNot(_.isEmpty).flatMap(z=>{
      val cellArr=z.rawCells()
      val valueArr=cellArr.map(n=>(Bytes.toString(z.getRow()),(Bytes.toString(CellUtil.cloneQualifier(n)),Bytes.toString(CellUtil.cloneValue(n)))))
      valueArr
    })

    //Release resources: close the Table and Admin before the Connection
    table.close()
    admin.close()
    conn.close()
  • Original article: https://www.cnblogs.com/ShyPeanut/p/11265075.html