Hive: a caveat when counting several metrics in one query and one of them uses distinct!
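For example, take a SQL statement like this:
select organid, ppi, count(id1) as num1, count(distinct id2) as num2 from table group by organid, ppi
When it is executed in Hive, the value of num1 may come out inaccurate!
In production, computing several count metrics in the same select when one of them uses distinct is not recommended. Do not write the SQL this way.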
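A better way to write the SQL is with two levels of group by. The inner query also groups by id2, so each distinct id2 within an (organid, ppi) pair becomes one row carrying its own count(id1). The outer query then sums those counts to get num1 and counts the id2 rows to get num2: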
select t.organid, t.ppi, sum(t.num) as num1, count(t.id2) as num2
from (
    select organid, ppi, id2, count(id1) as num
    from table
    group by organid, id2, ppi
) t
group by t.organid, t.ppi
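
As a sanity check that the two-level rewrite is logically equivalent to the single-pass query, here is a minimal Python sketch using the standard-library sqlite3 module. SQLite is not Hive, so this does not reproduce the Hive discrepancy described above. It only shows that, on an engine that evaluates both forms correctly, they return the same num1 and num2, including for rows where id1 or id2 is NULL. The table name t and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (organid, ppi, id1, id2). Not from the original post.
ROWS = [
    ("o1", "p1", 1, "a"),
    ("o1", "p1", 2, "a"),
    ("o1", "p1", 3, "b"),
    ("o1", "p1", None, "b"),   # NULL id1 is ignored by count(id1)
    ("o1", "p2", 4, None),     # NULL id2 is ignored by count(distinct id2)
    ("o2", "p1", 5, "c"),
    ("o2", "p1", 6, "c"),
]

# The form the post advises against in Hive.
SINGLE_PASS = """
select organid, ppi, count(id1) as num1, count(distinct id2) as num2
from t
group by organid, ppi
order by organid, ppi
"""

# The two-level group by rewrite from the post (table renamed to t).
TWO_STAGE = """
select s.organid, s.ppi, sum(s.num) as num1, count(s.id2) as num2
from (
    select organid, ppi, id2, count(id1) as num
    from t
    group by organid, id2, ppi
) s
group by s.organid, s.ppi
order by s.organid, s.ppi
"""


def run(sql):
    """Load the sample rows into an in-memory SQLite table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (organid text, ppi text, id1 integer, id2 text)")
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


single_pass_result = run(SINGLE_PASS)
two_stage_result = run(TWO_STAGE)

for row in two_stage_result:
    print(row)
print("same result:", single_pass_result == two_stage_result)
```

The same comparison can be run in Hive itself against a real table: if the two queries disagree on num1 there, that is the discrepancy this note warns about.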