  • Longest Common Substring

    Problem Statement

    Given two strings $s_1$ and $s_2$, find their longest common substring (LCS). For example, for X = [111001] and Y = [11011], the longest common substring is [110], with length 3.

    One concise approach is Dynamic Programming (DP).

    Instead of considering arbitrary substrings, we first consider only the common substrings that end at a given pair of positions, i.e. substrings indexed by their last character.

    Define $dp[i][j]$ as the length of the longest common substring of $s_1[0 \ldots i]$ and $s_2[0 \ldots j]$ that ends with $s_1[i]$ and $s_2[j]$.

    Then the LCS length is the maximum value in the array $dp$.
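
    As a worked illustration of this definition, filling in the table by hand for the example above ($s_1 = 111001$, $s_2 = 11011$) gives the following values, with rows indexed by $i$ and columns by $j$:

    $$
    \begin{array}{c|ccccc}
    dp[i][j] & j=0 & j=1 & j=2 & j=3 & j=4 \\ \hline
    i=0 & 1 & 1 & 0 & 1 & 1 \\
    i=1 & 1 & 2 & 0 & 1 & 2 \\
    i=2 & 1 & 2 & 0 & 1 & 2 \\
    i=3 & 0 & 0 & 3 & 0 & 0 \\
    i=4 & 0 & 0 & 1 & 0 & 0 \\
    i=5 & 1 & 1 & 0 & 2 & 1
    \end{array}
    $$

    The largest entry is $dp[3][2] = 3$, which corresponds to the common substring 110 ending at $s_1[3]$ and $s_2[2]$.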

    To compute $dp[i][j]$, we check whether $s_1[i]$ equals $s_2[j]$. If it does, then $dp[i][j] = dp[i-1][j-1] + 1$; otherwise it is zero. Thus:

    dp[i][j] = (s1[i] == s2[j] ? (dp[i-1][j-1] + 1) : 0);
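
    As a minimal sketch of this recurrence, the function below fills the table and returns only the LCS length. The name lcsLength is an assumption made for illustration, and the first row and column are treated as a match of length 1 because $dp[i-1][j-1]$ does not exist there.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (the name is not from the original post): returns only
// the length of the longest common substring of s1 and s2.
int lcsLength(const std::string &s1, const std::string &s2)
{
    const int n1 = static_cast<int>(s1.length());
    const int n2 = static_cast<int>(s2.length());

    // dp[i][j]: length of the longest common substring ending at s1[i] and s2[j]
    std::vector<std::vector<int> > dp(n1, std::vector<int>(n2, 0));

    int best = 0;
    for (int i = 0; i < n1; ++i) {
        for (int j = 0; j < n2; ++j) {
            if (s1[i] == s2[j]) {
                // In the first row or column there is no dp[i-1][j-1],
                // so a match there has length 1.
                dp[i][j] = (i == 0 || j == 0) ? 1 : dp[i-1][j-1] + 1;
            }
            best = std::max(best, dp[i][j]);
        }
    }
    return best;
}
```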
    

    To recover the longest common substring itself, and not just its length, only a few modifications are needed.

    Whenever we find a $dp[i][j]$ larger than the current maxLen, we update maxLen to $dp[i][j]$.

    if(dp[i][j] > maxLen)
        maxLen = dp[i][j];
    

    At the same time, we can record the starting index of the new, longer substring. In $s_1$, the LCS starts at the current index $i$ plus 1 minus the length of the LCS, i.e.

    if(dp[i][j] > maxLen){
        maxLen = dp[i][j];
        maxIndex = i + 1 - maxLen; 
    }
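
    Putting the two updates together, one possible sketch returns the substring itself through s1.substr. The function name longestCommonSubstring is hypothetical; maxLen and maxIndex follow the snippets above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the longest common substring of s1 and s2,
// or an empty string when the two strings share no character.
std::string longestCommonSubstring(const std::string &s1, const std::string &s2)
{
    const int n1 = static_cast<int>(s1.length());
    const int n2 = static_cast<int>(s2.length());

    // dp[i][j]: length of the longest common substring ending at s1[i] and s2[j]
    std::vector<std::vector<int> > dp(n1, std::vector<int>(n2, 0));

    int maxLen = 0;    // best length seen so far
    int maxIndex = 0;  // where that best substring starts in s1
    for (int i = 0; i < n1; ++i) {
        for (int j = 0; j < n2; ++j) {
            if (s1[i] == s2[j])
                dp[i][j] = (i == 0 || j == 0) ? 1 : dp[i-1][j-1] + 1;

            if (dp[i][j] > maxLen) {
                maxLen = dp[i][j];
                maxIndex = i + 1 - maxLen;  // the match ends at i and has maxLen characters
            }
        }
    }
    return s1.substr(maxIndex, maxLen);
}
```

    Because the comparison is a strict >, ties are resolved in favor of the first substring found when scanning $s_1$ from left to right.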
    

    Finally, we need to initialize $dp$. In the first row and the first column $dp[i-1][j-1]$ does not exist, so a match there simply has length 1:

    for(int i = 0; i < s1.length(); ++i)
        dp[i][0] = (s1[i] == s2[0] ? 1 : 0);
    
    for(int j = 0; j < s2.length(); ++j)
        dp[0][j] = (s1[0] == s2[j] ? 1 : 0);
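
    The two loops can be wrapped into a small helper that builds the table with its borders filled in. This is only a sketch: the name initTable is an assumption, and it expects both strings to be non-empty because it reads s1[0] and s2[0].

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: allocates dp and fills in only the first column and
// the first row; every interior cell is left at 0.
// Assumes s1 and s2 are both non-empty.
std::vector<std::vector<int> > initTable(const std::string &s1, const std::string &s2)
{
    std::vector<std::vector<int> > dp(s1.length(), std::vector<int>(s2.length(), 0));

    for (std::size_t i = 0; i < s1.length(); ++i)
        dp[i][0] = (s1[i] == s2[0] ? 1 : 0);  // first column: compare with s2[0]

    for (std::size_t j = 0; j < s2.length(); ++j)
        dp[0][j] = (s1[0] == s2[j] ? 1 : 0);  // first row: compare with s1[0]

    return dp;
}
```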
    

    The complete code is:

    #include <string>
    #include <vector>
    using namespace std;

    // Finds the longest common substring of s1 and s2.
    // On return, length holds its length and sIndex its starting index in s1
    // (sIndex is -1 when there is no common substring).
    void LCM(const string &s1, const string &s2, int &sIndex, int &length)
    {
        int n1 = static_cast<int>(s1.length());
        int n2 = static_cast<int>(s2.length());

        sIndex = -1;
        length = 0;
        if(0 == n1 || 0 == n2)
            return;

        // dp[i][j]: length of the longest common substring ending at s1[i] and s2[j]
        vector<vector<int> > dp(n1, vector<int>(n2, 0));

        // compute max length and index; the first row and column are included
        // so that matches on the border are not missed
        for(int i = 0; i < n1; ++i){
            for(int j = 0; j < n2; ++j){
                if(s1[i] == s2[j])
                    dp[i][j] = (0 == i || 0 == j) ? 1 : dp[i-1][j-1] + 1;

                if(dp[i][j] > length){
                    length = dp[i][j];
                    sIndex = i + 1 - length;
                }
            }
        }
    }
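
    Because $dp[i][j]$ depends only on $dp[i-1][j-1]$, the full table is not strictly required. The following variant is a sketch of one possible refinement, not part of the original code: it keeps two rows padded with a leading zero, which also removes the separate initialization step. The name lcsTwoRows is hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant: same recurrence and same outputs as the full-table
// version, but only the previous and the current row are stored.
void lcsTwoRows(const std::string &s1, const std::string &s2, int &sIndex, int &length)
{
    sIndex = -1;
    length = 0;

    const int n1 = static_cast<int>(s1.length());
    const int n2 = static_cast<int>(s2.length());

    // Both rows have an extra cell at index 0 that always stays 0; it stands
    // in for the missing dp[i-1][j-1] on the border.
    std::vector<int> prev(n2 + 1, 0), cur(n2 + 1, 0);

    for (int i = 0; i < n1; ++i) {
        for (int j = 0; j < n2; ++j) {
            // cur[j + 1] plays the role of dp[i][j]; prev[j] is dp[i-1][j-1].
            cur[j + 1] = (s1[i] == s2[j]) ? prev[j] + 1 : 0;

            if (cur[j + 1] > length) {
                length = cur[j + 1];
                sIndex = i + 1 - length;
            }
        }
        std::swap(prev, cur);  // the current row becomes the previous one
    }
}
```

    This reduces the extra memory from one cell per pair of positions to two rows of length n2 + 1, at the cost of no longer having the whole table available for inspection.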
    
  • Original article: https://www.cnblogs.com/kid551/p/4321392.html