zoukankan html css js c++ java

Hive查询优化～布隆过滤器使用

技术说明：http://lxw1234.com/archives/2016/04/632.htm

hive表是orc 存储

本文优化方法：使用 bloom filter 和二级动态分区

实操：

　　1，建表：

CREATE TABLE test(
    mall_id bigint COMMENT '店铺id',
    mall_collection_id bigint COMMENT '商家包id',
    city_id bigint COMMENT '城市id', 
    city_name string COMMENT '城市名称',
    province_id bigint COMMENT '省份id',
    province_name string COMMENT '省份',
    is_illegal bigint COMMENT '是否违规',
    stat_day string COMMENT '统计时间'
)
COMMENT 'XXXX'
PARTITIONED BY ( 
  pt string COMMENT '分区日期',
  mall_col_id bigint COMMENT 'id')
    STORED AS ORC
TBLPROPERTIES
('orc.compress'='SNAPPY',
'orc.create.index'='true',
"orc.bloom.filter.columns"="mall_collection_id,stat_day", -- 这样建索引原因是接口用这两个查询数据
'orc.bloom.filter.fpp'='0.05',
'orc.stripe.size'='10485760',
'orc.row.index.stride'='10000') 
;

2，数据插入结果表：

INSERT OVERWRITE TABLE test PARTITION(pt = '${env.YYYYMMDD}', mall_col_id)
SELECT
    mall_id,
    mall_collection_id,
    city_id,
    city_name,
    province_id,
    province_name,
    is_illegal,
    stat_day,
    mall_collection_id % 1000 as mall_col_id
from
    A
DISTRIBUTE BY mall_collection_id SORT BY mall_collection_id,stat_day -- 这里和索引保持一致
;

因为bloom filter 可以过滤无效的数据，减少数据的扫描

查看全文

相关阅读:
Js事件触发列表与解说(转)
HTTP 头部解释,HTTP 头部详细分析,最全HTTP头部信息
 【转】php 数组编码转换
 Cookie 的规范介绍
 PHP底层的运行机制与原理
 PHP为什么会被认为是草根语言？
asp.net中读取带有加号（+）的Cookie，会自动把加号替换为空格
 git克隆某个分支到本地指定的目录中
 yq（json,yaml）格式转换工具安装和使用
 k8s中configmap的作用及使用方式

原文地址：https://www.cnblogs.com/pengpenghuhu/p/14465830.html