HBase
- https://wiki.mozilla.org/Socorro:HBase
- http://blog.cloudera.com/blog/2011/02/log-event-processing-with-hbase/
- Column families
  
  Example: A common column family Socorro uses is "ids:", and a common column qualifier in that family is "ids:ooid". Another column in the same family is "ids:hang".
  
  A table's schema enumerates its column families. Each column family carries metadata about compression, the number of value versions retained, and caching.
  
  A column family can store tens of thousands of values with different column qualifier names.
  
  Retrieving data from multiple column families requires at least one block access (disk or memory) per column family. Accessing multiple columns in the same family requires only one block access.
  
  If you specify just the column family name when retrieving data, the values for all columns in that column family will be returned.
  
  If a record does not contain a value for a particular column in the set of columns you query for, there is no "null". The returned row simply has no entry for that column.
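The retrieval rules above can be sketched with a small in-memory model. This is only an illustration in plain Python, not the HBase client API: the `get_columns` helper, the `meta:` family, and all cell values are hypothetical.

```python
# Toy model of a single row: keys are "family:qualifier", values are cells.
# The "meta:" family and every value here are made up for illustration.
row = {
    "ids:ooid": "ooid-value",
    "ids:hang": "hang-value",
    "meta:submitted": "2011-02-01",
}


def get_columns(row, requested):
    """Return only the requested cells that actually exist in the row.

    A name ending in ":" selects every column in that family; any other
    name selects one "family:qualifier" column.
    """
    result = {}
    for name in requested:
        if name.endswith(":"):
            # Family-only request: return all columns in the family.
            result.update({c: v for c, v in row.items() if c.startswith(name)})
        elif name in row:
            # Single column: include it only if the row has a value for it.
            result[name] = row[name]
    return result


family_cells = get_columns(row, ["ids:"])
sparse_cells = get_columns(row, ["ids:ooid", "ids:missing"])
```

Here `family_cells` holds both `ids:` columns, while `sparse_cells` has no key at all for `ids:missing`, mirroring the "no null" behavior described above.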
- Manipulating a row
  
  All manipulations are performed using a rowkey.
  
  Setting a column to a value will create the row if it doesn't exist, or update the column if it already exists.
  
  Deleting a non-existent row or column is a no-op.
  
  Counter column increments are atomic and very fast. StumbleUpon has some counters that they increment hundreds of times per second.
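A minimal sketch of these row semantics, again as a toy in-memory model rather than real HBase client code. The `ToyTable` class, its method names, and the `counts:hits` column are hypothetical, and the lock only stands in for the atomicity that HBase provides on the server side.

```python
import threading


class ToyTable:
    """In-memory stand-in for a table; all names here are hypothetical."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, rowkey, column, value):
        # Creates the row if it is missing, otherwise updates the column.
        with self._lock:
            self._rows.setdefault(rowkey, {})[column] = value

    def delete(self, rowkey, column=None):
        # Deleting a missing row or column silently does nothing.
        with self._lock:
            if column is None:
                self._rows.pop(rowkey, None)
                return
            cells = self._rows.get(rowkey)
            if cells is not None:
                cells.pop(column, None)
                if not cells:
                    del self._rows[rowkey]

    def increment(self, rowkey, column, amount=1):
        # Read-modify-write under one lock, so concurrent calls never lose updates.
        with self._lock:
            cells = self._rows.setdefault(rowkey, {})
            cells[column] = cells.get(column, 0) + amount
            return cells[column]

    def get(self, rowkey):
        with self._lock:
            return dict(self._rows.get(rowkey, {}))


table = ToyTable()
table.put("row1", "ids:ooid", "abc")     # creates the row
table.put("row1", "ids:ooid", "def")     # updates the existing column
table.delete("no-such-row")              # no-op
table.delete("row1", "ids:no-such-col")  # no-op


def hit_counter(times):
    for _ in range(times):
        table.increment("stats", "counts:hits")


threads = [threading.Thread(target=hit_counter, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total_hits = table.get("stats")["counts:hits"]
```

Because every increment is a single locked step, the four threads together leave the counter at exactly the number of increments issued.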
- Tables are always ordered by their rowkeys
  
  Scanning a range of a table based on a rowkey prefix or a start and end range is fast.
  
  Retrieving a row by its key is fast.
  
  Searching for a row requires a rowkey structure that you can easily do a range scan on, or a reverse index table.
  
  A full scan on a table that contains billions of items is slow (although, unlike in an RDBMS, it isn't likely to cause performance problems).
  
  If you are continually inserting rows that have similar rowkey prefixes, every one of those writes lands on a single RegionServer and you are beating up on it. In excess, this turns that server into a hotspot.
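The rowkey ordering points above can be illustrated with a sorted list and binary search. This is a sketch under stated assumptions, not HBase code: the rowkeys, the `SPLITS` region boundaries, and all helper names are hypothetical. The `salted_key` helper shows one commonly used mitigation for the single-RegionServer problem (prefixing keys with a bucket character); it is not described in the notes above, and its byte-sum checksum is a deliberately simple toy.

```python
import bisect

# A sorted list of rowkeys stands in for a table ordered by rowkey.
rowkeys = sorted([
    "20110201-aaa", "20110201-bbb", "20110202-ccc", "20110203-ddd",
])

# Hypothetical region boundaries: region i holds keys in [SPLITS[i-1], SPLITS[i]).
SPLITS = ["2", "4", "6", "8", "a", "c", "e"]


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix, seeking to the range start."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order means no later key can match
        result.append(key)
    return result


def range_scan(sorted_keys, start_key, stop_key):
    """Return keys in the half-open range [start_key, stop_key)."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


def region_for(rowkey, splits):
    """Index of the region whose key range contains rowkey."""
    return bisect.bisect_right(splits, rowkey)


def salted_key(rowkey, buckets=16):
    """Prefix the key with a hex bucket derived from a toy byte-sum checksum."""
    bucket = sum(rowkey.encode("utf-8")) % buckets
    return f"{bucket:x}{rowkey}"


# Ten sequential keys sharing one prefix, as in a time-ordered insert stream.
sequential = [f"20110201-{i:04d}" for i in range(10)]
plain_regions = {region_for(k, SPLITS) for k in sequential}
salted_regions = {region_for(salted_key(k), SPLITS) for k in sequential}
```

In this model every key in `sequential` maps to the same region, while the salted variants spread across several. The trade-off of salting is that a prefix scan must then be issued once per bucket.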

Original article: https://www.cnblogs.com/diyunpeng/p/3910979.html