zoukankan      html  css  js  c++  java
  • KMP Demo

    The key of Kmp is to build a look up table that records the match result of prefix and postfix. Value in the table means the max len of matching substring that exists in both prefix and postfix. In the prefix this substring should start from 0, while in the postfix this substring should ends at current index.

    For example, now we have a string "ababc"

    The KMP table will look liks this:

     a  b  a  b  c
    -1  0  1  2  
    

    (Note: we will not match substring with itself, so we will skip index 0)

    So how does this table help us search string match faster?

    Well, the answer is if we are trying to match a char after posfix with target string and failed, then we can smartly shift the string, so that the matching string in perfix will replace postfix and now we can try to match the char after prefix with this char in target.

    Take above string as an example.

    Now we try to match string "ababc" with "abababc".

    We will initially have match as below

     0 1 2 3 4 5 6
     a b a b a b c (string x)
     a b a b c (string y)
    -1 0 1 2 

    We found char at index 4 does not match, then we can use look up table and shift the string y wisely. 

    We found table[3] = 2, which means we can shift the string y rightward by 2, and still have same but shorter prefix before index 4, like this:

     0 1 2 3 4 5 6
     a b a b a b c (string x)
         a b a b c (string y)
        -1 0 1 2 
    

    If there is a long gap between prefix and postfix, this shift can help us save a lot of time. In the brute force way, we cannot do that because we have no information of the string. We have to compare each possible pair of chars. While in KMP, we know the information of string y so we can move smartly. We can directly jump to the next possible match pairwhile discard useless pair of chars.

    We are almost done with KMP, but we still have one special case that needs to be taken care of.

    Say now we have a input like this:

     0 1 2 3 4 5
     a a b a a a
    -1 1 0 2 3 
    

    How should we build the KMP table for this string?

    Say the pointer in prefix is 'x', which is at index 2 now and the pointer in postfix is 'y' which is at index 5 now. We need to match 'b' pointed by x with 'a' pointed by y. It is an unmatched pair, how should we update the cell? 

    Well, we really don't need to reset it to 0, that will make us skip a valid shorter matching substring "aa".

    What we do now is just to shorten the length of substring by 1 unit and try to match a shorter substring "aa". This can be done by moving pointer x to the index recorded in [indexOf(x) - 1] while keep pointer y stay still. This is because by following the value in KMP table we can always make sure previous part of prefix and postfix is matched even we have shorten their length, so wo only need to care about the char after matched part in prefix and posfix.

    Code: [Java]

    // JAVA program for implementation of KMP pattern 
    // searching algorithm 
    
    class KMP_String_Matching { 
    	void KMPSearch(String pat, String txt) 
    	{ 
    		int M = pat.length(); 
    		int N = txt.length(); 
    
    		// create lps[] that will hold the longest 
    		// prefix suffix values for pattern 
    		int lps[] = new int[M]; 
    		int j = 0; // index for pat[] 
    
    		// Preprocess the pattern (calculate lps[] 
    		// array) 
    		computeLPSArray(pat, M, lps); 
    
    		int i = 0; // index for txt[] 
    		while (i < N) { 
    			if (pat.charAt(j) == txt.charAt(i)) { 
    				j++; 
    				i++; 
    			} 
    			if (j == M) { 
    				System.out.println("Found pattern "
    								+ "at index " + (i - j)); 
    				j = lps[j - 1]; 
    			} 
    
    			// mismatch after j matches 
    			else if (i < N && pat.charAt(j) != txt.charAt(i)) { 
    				// Do not match lps[0..lps[j-1]] characters, 
    				// they will match anyway 
    				if (j != 0) 
    					j = lps[j - 1]; 
    				else
    					i = i + 1; 
    			} 
    		} 
    	} 
    
    	void computeLPSArray(String pat, int M, int lps[]) 
    	{ 
    		// length of the previous longest prefix suffix 
    		int len = 0; 
    		int i = 1; 
    		lps[0] = 0; // lps[0] is always 0 
    
    		// the loop calculates lps[i] for i = 1 to M-1 
    		while (i < M) { 
    			if (pat.charAt(i) == pat.charAt(len)) { 
    				len++; 
    				lps[i] = len; 
    				i++; 
    			} 
    			else // (pat[i] != pat[len]) 
    			{ 
    				// This is tricky. Consider the example. 
    				// AAACAAAA and i = 7. The idea is similar 
    				// to search step. 
    				if (len != 0) { 
    					len = lps[len - 1]; 
    
    					// Also, note that we do not increment 
    					// i here 
    				} 
    				else // if (len == 0) 
    				{ 
    					lps[i] = len; 
    					i++; 
    				} 
    			} 
    		} 
    	} 
    
    	// Driver program to test above function 
    	public static void main(String args[]) 
    	{ 
    		String txt = "ABABDABACDABABCABAB"; 
    		String pat = "ABABCABAB"; 
    		new KMP_String_Matching().KMPSearch(pat, txt); 
    	} 
    } 
    // This code has been contributed by Amit Khandelwal. 
    

      

    Reference:

    https://www.geeksforgeeks.org/java-program-for-kmp-algorithm-for-pattern-searching-2/

    https://leetcode.com/problems/shortest-palindrome/discuss/60113/Clean-KMP-solution-with-super-detailed-explanation

    永远渴望,大智若愚(stay hungry, stay foolish)
  • 相关阅读:
    二分多重匹配(HDU5093)
    2-sat(and,or,xor)poj3678
    某个点到其他点的曼哈顿距离之和最小(HDU4311)
    第k最短路A*启发式搜索
    求树的直径和中心(ZOJ3820)
    并查集hdu4424
    map容器结构体离散化
    二维坐标系极角排序的应用(POJ1696)
    【进阶3-3期】深度广度解析 call 和 apply 原理、使用场景及实现(转)
    判断js数据类型的四种方法,以及各自的优缺点(转)
  • 原文地址:https://www.cnblogs.com/h-hkai/p/10711911.html
Copyright © 2011-2022 走看看