  • SQL: An unusual string-splitting problem

    This is not a straightforward string-splitting problem.

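    The scenario: a table has a column that holds the set of countries affected by an event, separated by commas. Unfortunately, it turns out that some country names themselves contain a comma. When normalizing the data, how do we identify those countries correctly?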

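    The code below has some limitations (for instance, it only looks one piece ahead, so it handles names containing at most one comma), but it is good enough for most cases.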

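    The code below uses the analytic functions LAG and LEAD together with CTEs. SQL Server 2012 and later support all of these. Oracle has had LAG and LEAD for much longer, but recursive CTEs require 11g Release 2 or later, and the T-SQL functions used here (CHARINDEX, SUBSTRING, + for concatenation) would have to be ported to INSTR, SUBSTR and ||.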

    Main idea:

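    A string can be split with an auxiliary numbers table: cross join it to get one copy of the row per position, then cut the string at the positions where the delimiter occurs to get the pieces we need. (In this solution I use a recursive CTE to find those positions instead.)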
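
    For comparison, the numbers-table approach might look like the sketch below. It is not what the solution in this post uses: NUMS is a hypothetical inline helper (built from sys.all_objects purely as a convenient row source), and the query assumes SQL Server and the my_countries sample table from the setup script. It only splits on every comma and does not yet deal with names that contain commas.

    ```sql
    -- Sketch only: split on every comma with a numbers table instead of recursion.
    -- NUMS is a hypothetical helper, not part of the original solution.
    WITH NUMS AS (
        -- 1..200 is enough for a varchar(200) column
        SELECT TOP (200) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
        FROM sys.all_objects
    )
    SELECT MC.RID,
           N.N AS STARTPOS,
           CHARINDEX(',', MC.COUNTRY_NAME_CC + ',', N.N) - 1 AS ENDPOS,
           SUBSTRING(MC.COUNTRY_NAME_CC, N.N,
                     CHARINDEX(',', MC.COUNTRY_NAME_CC + ',', N.N) - N.N) AS PIECE
    FROM MY_COUNTRIES MC
    CROSS JOIN NUMS N
    WHERE N.N <= LEN(MC.COUNTRY_NAME_CC)
      -- a piece starts at position 1 or right after a comma
      AND SUBSTRING(',' + MC.COUNTRY_NAME_CC, N.N, 1) = ','
    ORDER BY MC.RID, N.N;
    ```

    The cross join produces one row per character position, and the WHERE clause keeps only the positions where a piece begins.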

    Here we need one extra step: validate the pieces and discard the ones that do not qualify.

    The advantage of LAG and LEAD is that no self-join is needed to find the next row.
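
    As a small illustration, the sketch below runs both variants on inline values. The LVL/ENDPOS pairs are the piece boundaries of the sample string 'china,test, public of'; the table aliases are made up for the example, and SQL Server 2012 or later is assumed.

    ```sql
    -- With LEAD: one pass over the rows, no join
    SELECT LVL, ENDPOS,
           LEAD(ENDPOS, 1) OVER (ORDER BY LVL) AS NEXTENDPOS
    FROM (VALUES (1, 5), (2, 10), (3, 21)) AS T(LVL, ENDPOS)
    ORDER BY LVL;

    -- Equivalent self-join: a second copy of the rows is needed
    SELECT A.LVL, A.ENDPOS, B.ENDPOS AS NEXTENDPOS
    FROM (VALUES (1, 5), (2, 10), (3, 21)) AS A(LVL, ENDPOS)
    LEFT JOIN (VALUES (1, 5), (2, 10), (3, 21)) AS B(LVL, ENDPOS)
           ON B.LVL = A.LVL + 1
    ORDER BY A.LVL;
    -- Both return NEXTENDPOS = 10, 21, NULL for LVL 1, 2, 3
    ```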

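    The rule for this problem: if the longer candidate (a piece joined with the piece that follows it) matches a valid country, take the longer one; otherwise take the shorter one.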
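
    The rule can be tried in isolation with a sketch like the one below. CANDIDATES and its column names are hypothetical inline data, and the query assumes the valid_country sample table from the setup script has already been created and filled.

    ```sql
    -- SHORT_PIECE is one comma-delimited piece; LONG_PIECE is that piece
    -- joined with the piece that follows it (hypothetical inline data).
    WITH CANDIDATES(SHORT_PIECE, LONG_PIECE) AS (
        SELECT * FROM (VALUES
            ('china', 'china,test'),
            ('test',  'test, public of')
        ) AS V(SHORT_PIECE, LONG_PIECE)
    )
    SELECT C.SHORT_PIECE,
           C.LONG_PIECE,
           CASE WHEN EXISTS (SELECT 1 FROM VALID_COUNTRY VC
                             WHERE VC.COUNTRY_NAME = C.LONG_PIECE)
                THEN C.LONG_PIECE   -- the longer candidate is a known country
                ELSE C.SHORT_PIECE  -- otherwise fall back to the shorter one
           END AS CHOSEN
    FROM CANDIDATES C;
    -- CHOSEN is 'china' for the first row and 'test, public of' for the second
    ```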

    The code is as follows:

    -- Prepare the sample tables and data
    -- (drop only if the table exists, so the script also runs the first time)
    if object_id('my_countries') is not null drop table my_countries;
    if object_id('valid_country') is not null drop table valid_country;

    create table my_countries(rid int,country_name_cc varchar(200));
    insert into my_countries(rid,country_name_cc) values(1,'china,test, public of');
    insert into my_countries(rid,country_name_cc) values(2,'us, public of,china,Evan, public of');

    create table valid_country(cid int, country_name varchar(30));
    insert into valid_country(cid,country_name) values(1,'china');
    insert into valid_country(cid,country_name) values(2,'test, public of');
    insert into valid_country(cid,country_name) values(3,'Evan, public of');
    insert into valid_country(cid,country_name) values(4,'us, public of');
    insert into valid_country(cid,country_name) values(5,'Evan');

    --select * from my_countries;
    --select * from valid_country;

    The following query returns the correct result:

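    WITH SPLIT_COUNTRY AS
    (
        -- Anchor: the first piece starts at 1 and ends just before the first comma
        -- (a trailing ',' is appended so the last piece is found as well)
        SELECT
            RID,
            1 AS LVL,
            1 AS STARTPOS,
            CHARINDEX(',', COUNTRY_NAME_CC + ',') - 1 AS ENDPOS
        FROM MY_COUNTRIES
        UNION ALL
        -- Recursive step: the next piece starts right after the previous comma
        SELECT
            SC.RID,
            SC.LVL + 1 AS LVL,
            SC.ENDPOS + 2,
            CHARINDEX(',', CC.COUNTRY_NAME_CC + ',', SC.ENDPOS + 2) - 1
        FROM MY_COUNTRIES CC
        JOIN SPLIT_COUNTRY SC ON CC.RID = SC.RID
        WHERE CHARINDEX(',', CC.COUNTRY_NAME_CC + ',', SC.ENDPOS + 2) > 0
    )
    ,CTE_COUNTRY AS (
        -- NEXTENDPOS is the end of the following piece (NULL for the last piece)
        SELECT RID, LVL, STARTPOS, ENDPOS,
               LEAD(ENDPOS, 1) OVER (PARTITION BY RID ORDER BY LVL) AS NEXTENDPOS
        FROM SPLIT_COUNTRY
    )
    ,CTE AS (
        -- Take the longer candidate (this piece plus the next) if it is a valid
        -- country, otherwise the shorter one (this piece alone)
        SELECT MC.RID, SC.LVL,
               CASE WHEN SC.NEXTENDPOS IS NOT NULL
                         AND EXISTS (SELECT * FROM VALID_COUNTRY VC
                                     WHERE VC.COUNTRY_NAME = SUBSTRING(MC.COUNTRY_NAME_CC, SC.STARTPOS, SC.NEXTENDPOS - SC.STARTPOS + 1))
                    THEN SUBSTRING(MC.COUNTRY_NAME_CC, SC.STARTPOS, SC.NEXTENDPOS - SC.STARTPOS + 1)
                    ELSE SUBSTRING(MC.COUNTRY_NAME_CC, SC.STARTPOS, SC.ENDPOS - SC.STARTPOS + 1)
               END AS COUNTRY
        FROM MY_COUNTRIES MC
        JOIN CTE_COUNTRY SC ON MC.RID = SC.RID
    )
    ,CHECK_VALID AS (
        -- A piece is invalid if the previous row already consumed it,
        -- i.e. the previous COUNTRY contains a comma
        SELECT CASE WHEN CHARINDEX(',', LAG(COUNTRY, 1) OVER (PARTITION BY RID ORDER BY LVL)) > 0
                    THEN 0 ELSE 1 END AS ISVALID,
               *
        FROM CTE
    )
    SELECT CV.RID, CV.COUNTRY, VC.CID
    FROM CHECK_VALID CV
    JOIN VALID_COUNTRY VC
      ON CV.COUNTRY = VC.COUNTRY_NAME
     AND CV.ISVALID = 1
    ORDER BY CV.RID, CV.LVL;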

    Another approach, a slight modification of the first:

    WITH SPLIT_COUNTRY AS
    (
        -- Same position-finding recursion as in the first solution
        SELECT
            RID,
            1 AS LVL,
            1 AS STARTPOS,
            CHARINDEX(',', COUNTRY_NAME_CC + ',') - 1 AS ENDPOS
        FROM MY_COUNTRIES
        UNION ALL
        SELECT
            SC.RID,
            SC.LVL + 1 AS LVL,
            SC.ENDPOS + 2,
            CHARINDEX(',', CC.COUNTRY_NAME_CC + ',', SC.ENDPOS + 2) - 1
        FROM MY_COUNTRIES CC
        JOIN SPLIT_COUNTRY SC ON CC.RID = SC.RID
        WHERE CHARINDEX(',', CC.COUNTRY_NAME_CC + ',', SC.ENDPOS + 2) > 0
    )
    ,CTE_COUNTRY AS (
        SELECT RID, LVL, STARTPOS, ENDPOS,
               LEAD(ENDPOS, 1) OVER (PARTITION BY RID ORDER BY LVL) AS NEXTENDPOS
        FROM SPLIT_COUNTRY
    )
    ,CTE AS (
        -- COUNTRY is the short candidate, COUNTRY2 the long one
        -- (COUNTRY2 is NULL for the last piece, because NEXTENDPOS is NULL)
        SELECT MC.RID, SC.LVL,
               SUBSTRING(MC.COUNTRY_NAME_CC, SC.STARTPOS, SC.ENDPOS - SC.STARTPOS + 1) AS COUNTRY,
               SUBSTRING(MC.COUNTRY_NAME_CC, SC.STARTPOS, SC.NEXTENDPOS - SC.STARTPOS + 1) AS COUNTRY2
        FROM MY_COUNTRIES MC
        JOIN CTE_COUNTRY SC ON MC.RID = SC.RID
    )
    -- Join on the long candidate when it is a valid country, else on the short one
    SELECT CTE.RID, VC.COUNTRY_NAME, VC.CID
    FROM CTE
    JOIN VALID_COUNTRY VC
      ON (CASE WHEN EXISTS (SELECT * FROM VALID_COUNTRY X WHERE X.COUNTRY_NAME = CTE.COUNTRY2)
               THEN CTE.COUNTRY2
               ELSE CTE.COUNTRY END) = VC.COUNTRY_NAME
    ;

       

       

       

  • Original article: https://www.cnblogs.com/huaxiaoyao/p/4114761.html