HBase aggregation operations

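HBase itself provides aggregation methods, so aggregation can be performed on the server side.

This is done through HBase's CoprocessorProtocol mechanism.

The principle behind CoprocessorProtocol is fairly simple and resembles a MapReduce framework. The client splits the scan into requests that target multiple regions and sends them to those regions in parallel. The client then performs a reduce step over the per-region results to obtain the final answer.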
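
The scatter/gather flow described above can be sketched without any HBase dependency. This is only an illustration of the idea, not HBase code: the class name ScatterGatherMax, the methods regionMax and clientMax, and the use of in-memory lists to stand in for regions are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the coprocessor flow: each inner list plays the role of one region.
public class ScatterGatherMax {

    // "Server side": a region computes the max of its own values (null if it has none).
    static Long regionMax(List<Long> regionValues) {
        Long max = null;
        for (Long v : regionValues) {
            if (max == null || v > max) {
                max = v;
            }
        }
        return max;
    }

    // "Client side": send one request per region in parallel, then reduce the partial results.
    static Long clientMax(List<List<Long>> regions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Long> region : regions) {
                Callable<Long> task = () -> regionMax(region);
                partials.add(pool.submit(task));
            }
            Long max = null;
            for (Future<Long> partial : partials) {
                Long regionResult = partial.get(); // wait for this region's answer
                if (regionResult != null && (max == null || regionResult > max)) {
                    max = regionResult;
                }
            }
            return max;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(20L, 35L),
                Arrays.asList(100L),
                new ArrayList<Long>()); // an empty region contributes nothing
        System.out.println(clientMax(regions)); // prints 100
    }
}
```

Only one small value per region travels back to the client, which is the main benefit over scanning every row on the client.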

Let's start with an example. HBase's AggregationClient can compute simple statistics over a single column.
Java code

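// Imports needed by this snippet (HBase client API and JUnit 4).
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.junit.Assert;
import org.junit.Test;

// CommonConfig, TableNameBytes, ColumnFamilyName and QName1 are not shown in
// the original; they are assumed to be defined elsewhere in the test class.
@Test
public void testAggregationClient() throws Throwable {
    // Interprets each cell value as a long.
    LongColumnInterpreter columnInterpreter = new LongColumnInterpreter();
    AggregationClient aggregationClient = new AggregationClient(
            CommonConfig.getConfiguration());

    // Restrict the scan to a single column (family + qualifier).
    Scan scan = new Scan();
    scan.addColumn(ColumnFamilyName, QName1);

    Long max = aggregationClient.max(TableNameBytes, columnInterpreter, scan);
    Assert.assertEquals(100L, max.longValue());

    Long min = aggregationClient.min(TableNameBytes, columnInterpreter, scan);
    Assert.assertEquals(20L, min.longValue());

    Long sum = aggregationClient.sum(TableNameBytes, columnInterpreter, scan);
    Assert.assertEquals(120L, sum.longValue());

    Long count = aggregationClient.rowCount(TableNameBytes, columnInterpreter, scan);
    Assert.assertEquals(4L, count.longValue());
}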
Now look at the HBase source code, in AggregateImplementation.
Java code

  @Override
  public <T, S> T getMax(ColumnInterpreter<T, S> ci, Scan scan)
      throws IOException {
    T temp;
    T max = null;
    InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
        .getRegion().getScanner(scan);
    List<KeyValue> results = new ArrayList<KeyValue>();
    byte[] colFamily = scan.getFamilies()[0];
    byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();
    // qualifier can be null.
    try {
      boolean hasMoreRows = false;
      do {
        hasMoreRows = scanner.next(results);
        for (KeyValue kv : results) {
          temp = ci.getValue(colFamily, qualifier, kv);
          max = (max == null || (temp != null && ci.compare(temp, max) > 0)) ? temp : max;
        }
        results.clear();
      } while (hasMoreRows);
    } finally {
      scanner.close();
    }
    log.info("Maximum from this region is "
        + ((RegionCoprocessorEnvironment) getEnvironment()).getRegion()
            .getRegionNameAsString() + ": " + max);
    return max;
  }
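Here, because of
Java code

byte[] colFamily = scan.getFamilies()[0];
byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();
only the first column family of the scan and the first qualifier in that family are read. As a result, HBase's built-in aggregate functions can only compute statistics over a single column.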
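
To make the effect of pollFirst() concrete, here is a small sketch that uses only the JDK. It assumes the qualifiers of a family are held in a sorted NavigableSet of byte arrays, as the quoted line implies. The class FirstQualifierOnly and its helper methods are made up for this example, and compareBytes is only a stand-in for HBase's own byte comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration: only the first qualifier of the sorted set is ever used.
public class FirstQualifierOnly {

    // Lexicographic comparison of unsigned bytes (stand-in for HBase's comparator).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Mirrors the quoted line: take the first qualifier and ignore the rest.
    static String pickedQualifier(String... qualifiers) {
        Comparator<byte[]> comparator = FirstQualifierOnly::compareBytes;
        NavigableSet<byte[]> set = new TreeSet<>(comparator);
        for (String q : qualifiers) {
            set.add(q.getBytes(StandardCharsets.UTF_8));
        }
        byte[] first = set.pollFirst(); // returns null when the set is empty
        return first == null ? null : new String(first, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two qualifiers were requested, but only the smallest one in byte order is picked.
        System.out.println(pickedQualifier("q2", "q1")); // prints q1
    }
}
```

Note that "first" means first in sort order, not first added, so adding more columns to the scan does not widen the aggregation.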

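When we want to aggregate over multiple columns and count rows at the same time, we have the following options.
1 Scan all rows and perform the aggregation and count in the application itself.
2 Call AggregationClient several times, once per result. Because these are separate calls, the results may be inconsistent with each other if the data changes between calls.
3 Write our own extension of CoprocessorProtocol.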
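
The logic that a custom extension (option 3) would need can be sketched in plain Java: each region makes one pass over its rows, collecting a row count and per-column statistics together, and the client merges the partial results. This is a simulation under assumptions, not an HBase coprocessor. The names MultiColumnAggregate, ColumnStats and Partial, the column names q1 and q2, and the in-memory maps standing in for rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-pass, multi-column aggregation with a mergeable partial result.
public class MultiColumnAggregate {

    // Running sum/max/min for one column.
    static final class ColumnStats {
        long sum;
        Long max; // null until the first value is seen
        Long min;

        void accept(long v) {
            sum += v;
            if (max == null || v > max) max = v;
            if (min == null || v < min) min = v;
        }

        void merge(ColumnStats other) {
            sum += other.sum;
            if (other.max != null && (max == null || other.max > max)) max = other.max;
            if (other.min != null && (min == null || other.min < min)) min = other.min;
        }
    }

    // What one region would return: a row count plus statistics per column.
    static final class Partial {
        long rowCount;
        final Map<String, ColumnStats> columns = new HashMap<>();

        void acceptRow(Map<String, Long> row) {
            rowCount++;
            for (Map.Entry<String, Long> cell : row.entrySet()) {
                columns.computeIfAbsent(cell.getKey(), k -> new ColumnStats())
                        .accept(cell.getValue());
            }
        }

        void merge(Partial other) {
            rowCount += other.rowCount;
            for (Map.Entry<String, ColumnStats> e : other.columns.entrySet()) {
                columns.computeIfAbsent(e.getKey(), k -> new ColumnStats())
                        .merge(e.getValue());
            }
        }
    }

    // "Server side": one pass over the rows of a single region.
    static Partial aggregateRegion(List<Map<String, Long>> rows) {
        Partial partial = new Partial();
        for (Map<String, Long> row : rows) {
            partial.acceptRow(row);
        }
        return partial;
    }

    // "Client side": merge the partial results of all regions.
    static Partial aggregate(List<List<Map<String, Long>>> regions) {
        Partial total = new Partial();
        for (List<Map<String, Long>> region : regions) {
            total.merge(aggregateRegion(region));
        }
        return total;
    }

    // Builds a row with two hypothetical columns, q1 and q2.
    static Map<String, Long> row(long q1, long q2) {
        Map<String, Long> row = new HashMap<>();
        row.put("q1", q1);
        row.put("q2", q2);
        return row;
    }

    public static void main(String[] args) {
        List<List<Map<String, Long>>> regions = new ArrayList<>();
        regions.add(Arrays.asList(row(100, 1), row(20, 2)));
        regions.add(Arrays.asList(row(50, 3)));
        Partial total = aggregate(regions);
        ColumnStats q1 = total.columns.get("q1");
        System.out.println(total.rowCount + " " + q1.sum + " " + q1.max + " " + q1.min);
        // prints: 3 170 100 20
    }
}
```

Because the count and every column's statistics come from the same pass over each row, they describe the same set of rows, which is what option 2 cannot guarantee.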

This feature has been integrated into simplehbase, an HBase integration plugin hosted on GitHub:
https://github.com/zhang-xzhi/simplehbase
Original article: https://www.cnblogs.com/yaohaitao/p/6789113.html