首先我们先明确一下sql语句的执行顺序,如下有前至后执行:
(1)from (2) on (3) join (4) where (5)group by (6) avg,sum... (7)having (8) select (9) distinct (10) order by
1、独立子查询
独立子查询:顾名思义:就是子查询和外层查询不存在任何联系,是独立于外层查询的:
下面就看一个例子:
有一张订单表 Sales.Order 和一张 客户表
Sales.Customer
下面的sql 语句是为了查询出Sales.Customer里 custid(用户id)不在
Sales.Order 的custid
select custid from [Sales.Customers] where custid not in ( select custid from [Sales.Orders] )
2、相关子查询
顾名思义:就是子查询里面的条件依赖于外层查询的数据
下面我再来举一个例子:
业务要求:查询出每个客户的订单的数量:
select distinct custid, ( select COUNT(*) from [Sales.Orders] --相关子查询:依赖于外层查询结果;;是外层和内层相互结合的操作 where [Sales.Orders].custid=[Sales.Customers].custid ) as orderNum from [Sales.Customers]
查询的结果:
所以我们不难看出:相关子查询比独立子查询实现的功能强大的多
但是复杂度也会有提升,相当于笛卡尔积。
3、案例演示
一个我之前开发时候写过的错误案例
SELECT DISTINCT m.id AS PolicyId, m.PolicyNo, m.PolicyType, m.PolicyStartTime, m.PolicyEndTime, m.BelogOrganizationName, CASE ( SELECT count( 0 ) FROM policy_people AS b WHERE b.ProposalNo = m.ProposalNo AND b.Relation = 2 ) WHEN 1 THEN ( SELECT GROUP_CONCAT( b.`Name` ) FROM policy_people AS b WHERE b.ProposalNo = m.ProposalNo AND b.Relation = 2 ) ELSE concat( ( SELECT b.NAME FROM policy_people AS b WHERE b.ProposalNo = m.ProposalNo AND b.Relation = 2 LIMIT 1 ),",",(SELECT b.NAME FROM policy_people AS b WHERE b.ProposalNo = m.ProposalNo AND b.Relation = 2 LIMIT 1,1 ),'等' ) END AS InsuredPeople, ( SELECT count( 0 ) FROM report WHERE report.ProposalNo = m.ProposalNo ) AS CompenCount FROM policy_people AS p INNER JOIN policy_index AS m ON p.ProposalNo = m.ProposalNo WHERE p.Relation = 1 AND p.CertificateNo = "57016086-6
注意这个语句中的子查询,CASE嵌套了一层子查询,之后还用后面的逻辑继续执行子查询。再加上使用了distinct ,很容易造成选出过多的数据,执行过多的无效子查询,最后被distinct筛选掉。
暴力测试数据量上w之后,上面的语句执行时间长达几十秒。
优化之后的修改,首先时使用group by代替了distinct,因为sql执行顺序的问题。distinct的级别很低,会造成先去选取数据,然后再去distinct,所以尽量去用group by而不是distinct。
然后将子查询中过多的逻辑放到逻辑层去解决,而不是再sql中强行使用。
优化后的sql
SELECT m.ProposalNo, ANY_VALUE ( m.PolicyNo ) AS PolicyNo, ANY_VALUE ( m.PolicyType ) AS PolicyType, ANY_VALUE ( m.PolicyStartTime ) AS PolicyStartTime, ANY_VALUE ( m.PolicyEndTime ) AS PolicyEndTime, ANY_VALUE ( m.BelogOrganizationName ) AS BelogOrganizationName, d.InsuredPeople, d.InsuredPeolpleCount, ( SELECT count( 0 ) FROM report WHERE report.ProposalNo = p.ProposalNo ) AS CompenCount FROM policy_people AS p INNER JOIN policy_index AS m ON p.ProposalNo = m.ProposalNo INNER JOIN (SELECT GROUP_CONCAT( b.NAME ) AS InsuredPeople,count( 1 ) AS InsuredPeolpleCount,b.ProposalNo FROM policy_people AS b WHERE b.Relation = 2 GROUP BY b.ProposalNo ) AS d ON p.ProposalNo = d.ProposalNo WHERE p.Relation = 1 AND p.CertificateNo = ? certificateNo AND m.PolicyStartTime >? YEAR GROUP BY p.ProposalNo LIMIT ? pagebegin,? pageend
使用了三表联查,减少了子查询,效率大幅提升,目前已经可以满足几十万级别的数据量,等项目运作一段时间后,再继续考虑是否进一步优化。