zoukankan      html  css  js  c++  java
  • 《HBase 权威指南》学习笔记一 引言

    解决数据库多写问题,同事推荐使用hbase,并做了HBase培训,也看到老大tim参会说淘宝用hbase替代部分mysql核心应用,学习研究下看是否适用

    分布式计算的谬论.:

    1 The network is reliable.
    2 Latency is zero.
    3 Bandwidth is infinite.
    4 The network is secure.
    5 Topology doesn't change.
    6 There is one administrator.
    7 Transport cost is zero.
    8 The network is homogeneous.

     下载版本0.92.1  889个文件  285749 行java代码(find . -name '*.java'|wc -l)

    《HBase 权威指南》目录摘要:

    1. hbase演进

      November 2006
      Google releases paper on BigTable
      February 2007
      Initial HBase prototype created as Hadoop contrib§
      October 2007
      First “usable” HBase (Hadoop 0.15.0)
      January 2008
      Hadoop becomes an Apache top-level project, HBase becomes subproject
      October 2008
      HBase 0.18.1 released
      January 2009
      HBase 0.19.0 released
      September 2009
      HBase 0.20.0 released, the performance release
      May 2010
      HBase becomes an Apache top-level project
      June 2010
      HBase 0.89.20100621, first developer release
      January 2011
      HBase 0.90.0 released, the durability and stability release
      Mid 2011
      HBase 0.92.0 released, tagged as coprocessor and security release

    2. rdbms的局限性
      举例“Hush, the HBase URL Shortener”这个应用,随访问量增大要加slave,加cache,只能做简单查询,考虑读写的不断优化和扩展,分表分库,在应用层面改程序,做sharding,买好的硬件,以及随后的不尽噩梦。
    3. HBase的面向column的表

      the most basic unit is a column. One or more columns form a
      row that is addressed uniquely by a row key. A number of rows, in turn, form a table,
      and there can be many of them. Each column may have multiple versions, with each
      distinct value contained in a separate cell.
      (Table, RowKey, Family, Column, Timestamp) → Value  可在编程语言中表达为:

      SortedMap<RowKey, List<SortedMap<Column, List<Value, Timestamp>>>>   (p19)
      相同rowkey会有不同时间戳的数据,对应不同的版本,数据存储在HFiles中,索引保存在内存中,默认64KB,HFiles又被保存在Hadoop Distributed File System(hdfs)中,确保在跨服务器的数据写入不会丢失。索引存储在文件块的最后面.

    4. HBase的anto-sharding

      region去管理监控做sharding。“Each region is served by exactly one region server, and each of these servers can serve

      many regions at any time"

    5. 数据写入流程

      When data is updated it is first written to a commit log, called a write-ahead log (WAL)
      in HBase, and then stored in the in-memory memstore. Once the data in memory has
      exceeded a given maximum value, it is flushed as an HFile to disk. After the flush, the
      commit logs can be discarded up to the last unflushed modification. While the system
      is flushing the memstore to disk, it can continue to serve readers and writers without
      having to block them.  

      Since flushing memstores to disk causes more and more HFiles to be created, HBase
      has a housekeeping mechanism that merges the files into larger ones using compaction.
      There are two types of compaction: minor compactions and major compactions.(p24)

    6. HBase组成部分
      the client library, one master server, and many region servers.HBase master server 使用zookeeper管理region servers,负载均衡,去掉繁忙服务器。hbase相比google bigtable,增加了" push-down predicates, that is, filters,

      reducing data transferred over the network"


  • 相关阅读:
    Java变量以及内存分配
    在ORACLE存储过程中创建临时表
    CREATE OR REPLACE FUNCTION
    DECLARE
    CURSOR
    STM32WB SRAM2
    git版本控制
    STM32WB HSE校准
    STM32 HSE模式配(旁路模式、非旁路模式)
    STM32WB 信息块之OTP
  • 原文地址:https://www.cnblogs.com/shenguanpu/p/2521259.html
Copyright © 2011-2022 走看看