SAS 中的多表使用一般为多表之间的连接查询
连接查询分为:
(1)表内连接
(2)表与表之间的连接
这里我们只讨论表与表之间的连接,而表与表之间的连接分为:
(1)全连接
这里的全连接就是基本的笛卡尔积,笛卡尔积通俗一点说就是每条数据的交叉相乘,使用full join
1 data temp; 2 input visit $ visit_dat $ age type $; 3 cards; 4 v1 20190201 18 a 5 v2 20200304 21 f 6 v3 20190825 34 e 7 v1 20180431 58 c 8 v2 20170902 23 d 9 v4 20160826 25 r 10 ; 11 run; 12 13 data temp2; 14 input visit $ visit_dat $ age type $; 15 cards; 16 v3 20190725 25 d 17 v4 20200431 35 e 18 v5 20190921 38 g 19 ; 20 run; 21 22 /* 创建表,含条件语句 */ 23 proc sql; 24 create table join_visit as 25 select 26 table1.visit as visit1, 27 table1.visit_dat as visit_dat1, 28 table1.age as age1, 29 table1.type as type1, 30 table2.visit as visit1, 31 table2.visit_dat as visit_dat2, 32 table2.age as age2, 33 table2.type as type2 34 from 35 work.temp as table1 36 full join 37 work.temp2 as table2 38 on 39 table1.visit = table2.visit; 40 run; 41 42 /* 打印创建的表 */ 43 proc print data=join_visit; 44 run;
结果为:
(2)左外连接
保留左边的所有数据,对于右边的,只保留符合条件的数据,使用 left join
1 data temp; 2 input visit $ visit_dat $ age type $; 3 cards; 4 v1 20190201 18 a 5 v2 20200304 21 f 6 v3 20190825 34 e 7 v1 20180431 58 c 8 v2 20170902 23 d 9 v4 20160826 25 r 10 ; 11 run; 12 13 data temp2; 14 input visit $ visit_dat $ age type $; 15 cards; 16 v3 20190725 25 d 17 v4 20200431 35 e 18 v5 20190921 38 g 19 ; 20 run; 21 22 /* 创建表,含条件语句 */ 23 proc sql; 24 create table join_visit as 25 select 26 table1.visit as visit1, 27 table1.visit_dat as visit_dat1, 28 table1.age as age1, 29 table1.type as type1, 30 table2.visit as visit1, 31 table2.visit_dat as visit_dat2, 32 table2.age as age2, 33 table2.type as type2 34 from 35 work.temp as table1 36 left join 37 work.temp2 as table2 38 on 39 table1.visit = table2.visit; 40 run; 41 42 /* 打印创建的表 */ 43 proc print data=join_visit; 44 run;
(3)右外连接
保留右边的所有数据,对于左边的,只保留符合条件的数据,使用 right join
data temp; input visit $ visit_dat $ age type $; cards; v1 20190201 18 a v2 20200304 21 f v3 20190825 34 e v1 20180431 58 c v2 20170902 23 d v4 20160826 25 r ; run; data temp2; input visit $ visit_dat $ age type $; cards; v3 20190725 25 d v4 20200431 35 e v5 20190921 38 g ; run; /* 创建表,含条件语句 */ proc sql; create table join_visit as select table1.visit as visit1, table1.visit_dat as visit_dat1, table1.age as age1, table1.type as type1, table2.visit as visit1, table2.visit_dat as visit_dat2, table2.age as age2, table2.type as type2 from work.temp as table1 right join work.temp2 as table2 on table1.visit = table2.visit; run; /* 打印创建的表 */ proc print data=join_visit; run;
结果为:
注意:
要注意的是,多表连接使用的时候,重要的不是语句的使用,而是选择哪一个连接,才是最重要的。