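Today I kept working on identifying anomalous companies. I thought about it for a long time without finding a way to tell them apart, then it occurred to me to pick out the companies with a large gap between incoming and outgoing amounts, so that is the direction I took.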
Create table xf(xf_id string, je double,xnum int) Row format delimited fields terminated by ',';
Create table gf(gf_id string, je double,gnum int) Row format delimited fields terminated by ',';
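The two tables above cover the sales side (xf) and the purchase side (gf) separately, each holding the total amount and the total number of transactions per company.
The data comes from the VAT invoice table (zzsfp):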
insert into table xf select xf_id,sum(je),count(xf_id) from zzsfp where zfbz='N' group by xf_id;
insert into table gf select gf_id,sum(je),count(gf_id) from zzsfp where zfbz='N' group by gf_id;
- 1. Build the difference between sales and purchase amounts
Create table xc(xf_id string, profit double) Row format delimited fields terminated by ',';
insert into table xc select xf.xf_id, (xf.je - gf.je) as profit from xf join gf on xf.xf_id = gf.gf_id;
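One thing to keep in mind: the join above is an inner join, so a company that appears only as a seller or only as a buyer never reaches xc. If those one-sided companies should be included too, a sketch along the following lines might work. It assumes a missing side can be treated as an amount of 0, and xc_all is a hypothetical table name that is not part of the original workflow.

```sql
-- Hypothetical variant of xc that also keeps companies found on only one side.
CREATE TABLE xc_all (id string, profit double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

INSERT INTO TABLE xc_all
SELECT COALESCE(xf.xf_id, gf.gf_id)                AS id,      -- whichever side has the company
       COALESCE(xf.je, 0.0) - COALESCE(gf.je, 0.0) AS profit   -- assumption: missing side counts as 0
FROM xf
FULL OUTER JOIN gf ON xf.xf_id = gf.gf_id;
```

Whether treating a missing side as 0 is appropriate depends on what counts as anomalous here, so this is only one possible choice.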
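That produces the difference table. There is a lot of data, and Hive needs more than ten minutes for each run of this job. Next I looked for a range of difference values that would bring the result down to a few hundred rows. I kept trying different thresholds, and only at 20 million did the count shrink to three digits.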
select * from xc where profit>20000000 or profit<-20000000;
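Instead of guessing the threshold by hand, the cutoff could probably be derived from the data itself. The following is a rough sketch against the same xc table: the first query takes a fixed number of companies with the largest absolute gap, the second estimates a cutoff with Hive's built-in percentile_approx. The row count of 500 and the 0.99 percentile are arbitrary example values, not numbers from the analysis above.

```sql
-- The 500 companies with the largest absolute gap (500 is an example value).
SELECT xf_id, profit, ABS(profit) AS gap
FROM xc
ORDER BY gap DESC
LIMIT 500;

-- Approximate 99th percentile of the absolute gap, usable as a threshold
-- in place of the hand-tuned 20000000 (0.99 is an example value).
SELECT percentile_approx(ABS(profit), 0.99) AS cutoff
FROM xc;
```

The first form returns a fixed number of rows directly, while the second yields a number that can be plugged into the where clause used above.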