zoukankan      html  css  js  c++  java
  • 『Knuth-Morris-Pratt Algorithm with its Automaton』

    Let's define (pi) function to descripe the longest common prefix and suffix ( also called "border" ) of substring (s_i) :

    [pi (s_i)=max_{s(1:t)=s(i-t+1:i)}{t} ]

    We appoint that (pi^{(t)}(s)) means (pi(s_{pi^{(t-1)}(s)})), and (pi^{(1)}(s)=pi(s)). Then define

    [eta = min t ext{ s.t. } s( pi^{(t)}(s_{i-1})+1)=s(i) ]

    Obviously, (pi(s_i)=pi^{(eta)}(s_{i-1})+1),so we can get (pi(s_{1sim |s|})) in (mathcal{O}(|s|)) time to solve the problem.

    Of course, if we can't find a positive integer (eta) satisfying the condition, let (pi(s_i)=0) : )

    #include <bits/stdc++.h>
    using namespace std;
    const int N = 1e6 + 20;
    int n, m, Fail[N]; char s[N], t[N];
    int main(void) {
    	scanf( "%s", s + 1 ), n = strlen( s + 1 );
    	scanf( "%s", t + 1 ), m = strlen( t + 1 );
    	for (int i = 2, j = 0; i <= m; i++) {
    		while ( j && t[j+1] != t[i] ) j = Fail[j];
    		j += t[j+1] == t[i], Fail[i] = j;
    	}
    	for (int i = 1, j = 0; i <= n; i++) {
    		while ( j && ( t[j+1] != s[i] || j == m ) ) j = Fail[j];
    		j += t[j+1] == s[i]; 
    		if ( j == m ) printf( "%d
    ", i - m + 1 );
    	}
    	for (int i = 1; i <= m; i++) 
    		printf( "%d%c", Fail[i], " 
    "[ i == m ] );
    	return 0;
    } 
    

    A deterministic finite automaton (mathcal{T}) called Knuth–Morris–Pratt Automaton when (mathcal{T}=(Q,alpha,delta,q_0,F)) refers to :

    • (|Q|=|s|+1,alpha = { exttt{a}, exttt{b}, exttt{c},cdots})
    • (delta(p,c)=pi(s_p+c))
    • (q_0=0,|F|=1,|s|in F,mathcal{T}(s)= ext{true})

    Actually, Knuth–Morris–Pratt Automaton is a simple automaton (mathcal{T}') which only accepts string (s) extended several transition accroding to our (pi) function. So we intend to construct the automaton by using the transition method of (pi(s_i)).

    Let's assume for a case where we have already constructed the automaton of (s_i),then we want to extend a character (c). How the algorithm works ? Firstly, we can get the length of the border of (s_i) ( (=pi(s_i)) ) . So it's not hard to find that the state (delta(pi(s_i),c)) is the exact state which we are finding. That is (delta(s_i,c)gets delta(pi(s_i),c)).

    Specially, let (delta(s_i,s(i+1))gets i+1) in order to recognize the whole string (s). To sum up, we construct the automaton in (mathcal O(|s| imes |alpha|)) time.

    inline void Build(char *s) {
        int len = strlen( s + 1 );
        trans[0][s[1]-'a'] = 1;
        for (int i = 1, j = 0; i < len; i++) {
            for (int k = 0; k < 26; k++) trans[i][k] = trans[j][k];
            trans[i][s[i+1]-'a'] = i+1, j = trans[j][s[i+1]-'a'];
        }
    }
    
  • 相关阅读:
    OC基础--常用类的初步介绍与简单使用之NSDate
    mysql创建自定义函数与存储过程
    Hibernate Session 获取connection
    mysql中isnull,ifnull,nullif的用法
    mysql数据库中某项其中一个值在该项排第几,百分几
    linux下的ImageMagick安装方法
    Java 判断操作系统类型(适用于各种操作系统)
    tesseract 字体训练资料篇
    structDemo1
    C语言中.h和.c文件解析(很精彩)
  • 原文地址:https://www.cnblogs.com/Parsnip/p/13985001.html
Copyright © 2011-2022 走看看