  • Suffix Automaton Study Notes

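    The algorithm roughly goes like this:

    1. Start with a suffix automaton that contains only an empty root node.
    2. Given the automaton built so far, add a node for the new character. Starting from the previous last node, walk up the parent links; every node on the way that has no outgoing edge for the new character gets such an edge, pointing to the new node.
    3. If the walk runs past the root (reaches null), the parent of the new node is the root: no non-empty suffix of the new string has appeared before.
    4. Otherwise the walk stops at some node p that already has an edge for this character, leading to a node q. If q comes directly after p, that is len(q) = len(p) + 1, the parent of the new node is q.
    5. If the second half of the condition in step 4 fails, create a node k as a copy of q. Keep following parent links from p, and for every node whose edge for this character points to q, redirect that edge to k.
    6. The len of k is len(p) + 1, and its cnt is 0.
    7. Set the parent of the new node to k, and change the parent of q to k as well.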

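    Every substring occurs at certain positions; the nodes for its end positions are chained together, from back to front, by parent links.
    Every path from the root to a later node spells a substring that occurs in the string.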
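    The steps above can be sketched with array indices instead of pointers. This is only an illustration under stated assumptions, not the code used later in this post: the names `SamState`, `SuffixAutomaton`, `extend`, `buildSam` and `isSubstring` are made up here, the alphabet is assumed to be the 26 lowercase letters, and cnt is left out. `isSubstring` relies on the fact that every path from the root spells a substring.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// One state of the automaton (illustrative layout, not the post's struct node).
struct SamState {
    int len = 0;               // length of the longest string in this state
    int link = -1;             // parent link; -1 only for the root
    std::array<int, 26> next;  // transitions, -1 means "no edge"
    SamState() { next.fill(-1); }
};

struct SuffixAutomaton {
    std::vector<SamState> st;
    int last = 0;

    SuffixAutomaton() : st(1) {}  // step 1: only the root exists

    void extend(int c) {
        assert(0 <= c && c < 26);
        int cur = static_cast<int>(st.size());
        st.emplace_back();
        st[cur].len = st[last].len + 1;
        int p = last;
        // step 2: add the missing edges while walking up the parent links
        for (; p != -1 && st[p].next[c] == -1; p = st[p].link) st[p].next[c] = cur;
        if (p == -1) {
            st[cur].link = 0;                    // step 3
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;                // step 4
            } else {
                SamState copy = st[q];           // step 5: clone q (edges and parent)
                copy.len = st[p].len + 1;        // step 6
                int k = static_cast<int>(st.size());
                st.push_back(copy);
                for (; p != -1 && st[p].next[c] == q; p = st[p].link) st[p].next[c] = k;
                st[q].link = st[cur].link = k;   // step 7
            }
        }
        last = cur;
    }
};

SuffixAutomaton buildSam(const std::string& s) {
    SuffixAutomaton sam;
    for (char ch : s) sam.extend(ch - 'a');
    return sam;
}

// t is a substring exactly when it can be read along edges starting at the root.
bool isSubstring(const SuffixAutomaton& sam, const std::string& t) {
    int v = 0;
    for (char ch : t) {
        if (ch < 'a' || ch > 'z') return false;
        v = sam.st[v].next[ch - 'a'];
        if (v == -1) return false;
    }
    return true;
}
```

    For instance, after `buildSam("aabb")` the query `isSubstring(sam, "abb")` succeeds while `isSubstring(sam, "ba")` fails.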

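    What we rely on:

    1. There are only N suffixes.

    2. Compare the sets of end positions of two substrings: if the two sets intersect, the set of the later (longer) string must be a subset of the set of the earlier (shorter) one. This is what gives us a tree.
    For example:
    BBABBAAAAAABBABBA
    BBA occurs as a substring, and its occurrences end at positions 3, 6, 14, 17.
    Now extend the second BBA (positions 4 to 6) to the left, for example to ABBA: its occurrences end at positions 6, 14, 17.
    So if the two sets intersect, the later one must be a subset of the earlier one (seems fairly obvious?).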
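    The end positions in this example can be checked by brute force. The helper names `endPositions` and `isSubset` below are hypothetical, and ABBA is just one possible longer string that ends with BBA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// 1-based end positions of every occurrence of pat in text, in increasing order.
std::vector<int> endPositions(const std::string& text, const std::string& pat) {
    assert(!pat.empty());
    std::vector<int> ends;
    for (std::size_t i = 0; i + pat.size() <= text.size(); ++i)
        if (text.compare(i, pat.size(), pat) == 0)
            ends.push_back(static_cast<int>(i + pat.size()));
    return ends;
}

// True when every element of the sorted list a also appears in the sorted list b.
bool isSubset(const std::vector<int>& a, const std::vector<int>& b) {
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

    Calling `endPositions("BBABBAAAAAABBABBA", "BBA")` gives 3, 6, 14, 17, and the same call with "ABBA" gives 6, 14, 17, which `isSubset` confirms is contained in the first list.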

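    3. Each time we look for the nearest earlier string that repeats a suffix of the new string, we create or find only one node, so the space is O(n) as well.
    What we are looking for now is an earlier string that has become a suffix of the current string.
    For example:
    Take AAB and insert a B.
    The new strings we get are AABB, ABB, BB, B.
    Clearly all the relations inside AAB have already been handled; what we need are those earlier relations with one more B appended.
    In other words, any string in AAB that is a suffix of AAB can now be extended by a B.
    Then we notice that AAB contains no string that equals a suffix of AAB and also occurs somewhere other than at its end; in fact the par pointer of the final B points to the root.
    But AABB does contain an earlier string that is a suffix of AABB!
    It is the B in the third position!
    AABB
    The B in the third position is a suffix of AABB of length 1.
    But where am I supposed to find that node...
    It has to be exactly 1 step away from the root, along a B edge from the root.
    However, after the previous update the root's B edge leads to the node of the third-position B, whose longest string is AAB.
    So do we simply point the par edge of the new node at the node of the third B?
    Obviously not, because AAB is not a suffix of AABB.
    So we create such a node. Since it did not exist as an end position of its own, its cnt is 0, and its len is the len of the node where the walk stopped (here the root) plus 1.
    The par edge of the fourth B can now point to it, the root's B edge is redirected to it, its own parent is the former parent of the third B (which is also the root), and both the third B and the fourth B take it as their parent.
    For the fourth B, this is because the suffix nearest to it is exactly this node.
    For the third B, it is because the new node was split off from it: although the new node is my parent, it essentially exists because I exist, so I am the origin of my own parent (how odd, 2333).
    You might worry that creating new nodes over and over will lead to TLE + MLE.
    Of course not, because each step creates only one extra node!!! These relations form a tree; think about it.
    Suppose we have a very long string that consists entirely of A:
    AAAAAAAAAAAA.....AAAAAAA
    We append one more A, making the length n, and look for its nearest suffix. Conveniently, the string of length n - 1 is already a suffix of the new string (n - 1 A's are of course a suffix of n A's, orz).
    What about the earlier ones??? 1 A is also a suffix of n A's, 2 A's are a suffix of n A's, ..., n - 2 A's are a suffix of n A's.
    Linking all of these relations one by one would cost O(n^2).
    Then recall the subset property from point 2: n - 2 A's are also a suffix of n - 1 A's, n - 3 A's are a suffix of n - 2 A's, ..., 1 A is a suffix of 2 A's.
    Their relations are strung together like a chain.
    And when the alphabet is not just A (of course it is not), the relations form a tree.
    So each step creates at most 2 new nodes.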

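    4. A small addendum
    What about the number of edges??? That is easy to see as well: there are only O(N) nodes and the alphabet size is a constant, so the number of edges is O(N) too; in code you allocate 2*26*N slots anyway, wasteful as that is.
    But in fact there are fewer.
    Each time we follow the suffix chain, which is a path in a tree, and once we have climbed it to the top that chain no longer exists... so the total amount of climbing does not exceed the number of nodes, and each climbing step creates one edge; hence the edges added along suffix chains number at most 2N (there are 2*N nodes).
    A newly created (copied) node also keeps an edge that the original node had; this one is extra, and the number of such extra nodes is at most N.
    So roughly 3×N in total???
    With the number of edges bounded by roughly 3×N, the running time is also about 3×N, so the time complexity of this algorithm has reached the lower bound (obviously, since merely reading a string already takes N).
    [A rather clumsy explanation from me...]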
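    The bounds from points 3 and 4 can be checked empirically on small strings. The sketch below is an assumption-laden illustration (hypothetical names `SamSize` and `samSize`, lowercase letters only): it builds the automaton with plain arrays and reports the number of states and transitions.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

struct SamSize {
    int states;
    int transitions;
};

// Builds the suffix automaton of s and reports how large it is.
SamSize samSize(const std::string& s) {
    std::vector<int> len(1, 0), link(1, -1);
    std::vector<std::array<int, 26>> next(1);
    next[0].fill(-1);
    int last = 0;
    for (char ch : s) {
        assert('a' <= ch && ch <= 'z');
        int c = ch - 'a';
        int cur = static_cast<int>(len.size());
        len.push_back(len[last] + 1);
        link.push_back(0);  // stays 0 when the walk runs past the root
        next.emplace_back();
        next[cur].fill(-1);
        int p = last;
        for (; p != -1 && next[p][c] == -1; p = link[p]) next[p][c] = cur;
        if (p != -1) {
            int q = next[p][c];
            if (len[p] + 1 == len[q]) {
                link[cur] = q;
            } else {
                int k = static_cast<int>(len.size());
                int qLink = link[q];
                std::array<int, 26> copied = next[q];  // the clone keeps q's edges
                len.push_back(len[p] + 1);
                link.push_back(qLink);
                next.push_back(copied);
                for (; p != -1 && next[p][c] == q; p = link[p]) next[p][c] = k;
                link[q] = link[cur] = k;
            }
        }
        last = cur;
    }
    int edges = 0;
    for (const auto& row : next)
        for (int to : row) edges += (to != -1);
    return {static_cast<int>(len.size()), edges};
}
```

    On the AAB plus B example (written in lowercase as "aabb") this reports 6 states, one of them the copied node, and 7 transitions, well within 2N states and 3N transitions.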

    BZOJ 3998: [TJOI2015] 弦论 (String Theory)

    Time Limit: 10 Sec Memory Limit: 256 MB

    Description

    Given a string of length N, find its K-th smallest substring.

    Input

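    The first line contains a string S consisting only of lowercase English letters.
    The second line contains two integers T and K. T = 0 means identical substrings at different positions count as one; T = 1 means identical substrings at different positions count separately. K is as described in the statement.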

    Output

    Output a single line containing the K-th smallest substring. If there are fewer than K substrings, output -1.

    Sample Input

    aabc
    0 3

    Sample Output

    aab

    HINT

    N<=5*10^5
    T<2
    K<=10^9

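    Solution

    Build a suffix automaton. For T = 0, give every node an occurrence count of 1 and accumulate these values backwards along the transition edges, so that f of a node is the number of substrings that can be read starting from it.
    For T = 1, the occurrence count of a node is the sum of the counts of the nodes hanging below it on the parent chain; adding them up from back to front gives the number of occurrences of each node, and the same accumulation along the transition edges is applied afterwards.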
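    When debugging a solution like this, a brute-force reference for tiny inputs is handy for cross-checking. The function below is such a sketch; the name `kthSubstringBrute` is hypothetical, and it simply enumerates and sorts all substrings rather than using the automaton.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reference answer by enumeration; only meant for very short strings.
// t == 0: equal substrings count once; t == 1: every occurrence counts.
// Returns "-1" when there are fewer than k substrings.
std::string kthSubstringBrute(const std::string& s, int t, long long k) {
    assert(k >= 1 && (t == 0 || t == 1));
    std::vector<std::string> all;
    for (std::size_t i = 0; i < s.size(); ++i)
        for (std::size_t len = 1; i + len <= s.size(); ++len)
            all.push_back(s.substr(i, len));
    std::sort(all.begin(), all.end());
    if (t == 0) all.erase(std::unique(all.begin(), all.end()), all.end());
    if (k > static_cast<long long>(all.size())) return "-1";
    return all[static_cast<std::size_t>(k - 1)];
}
```

    On the sample, `kthSubstringBrute("aabc", 0, 3)` returns "aab"; with T = 1 the duplicate "a" shifts the third answer to "aa".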

    Code

    #include <iostream>
    #include <cstdio>
    #include <algorithm>
    #include <cstring>
    #define MAXN 500010
    //#define ivorysi
    using namespace std;
    typedef long long ll;
    const int Z = 26;
    struct node {
        node *par,*trans[Z];
        int len;
        ll f,cnt;
    }pool[MAXN * 2],*tail = pool,*root,*last;
    node *que[MAXN * 2];
    int c[MAXN];
    char s[MAXN];
    ll K;
    int T,len,m;
    void Build_SAM(const int &e,const int &len) {
        node *nowpoi = tail++ , *p = last;
        nowpoi->len = len;
        nowpoi->cnt = 1;
        for( ; p && !p->trans[e] ; p = p->par) {
    	p->trans[e] = nowpoi;
        }
        if(!p) nowpoi->par = root;
        else {
    	node *q = p->trans[e];
    	if(p->len + 1 == q->len) nowpoi->par = q;
    	else {
    	    node *copyq = tail++;
    	    *copyq = *q;
    	    copyq->cnt = 0;copyq->len = p->len + 1;
    	    q->par = nowpoi->par = copyq;
    	    for(;p && p->trans[e] == q ; p = p->par) {
    		p->trans[e] = copyq;
    	    }
    	}
        }
        last = nowpoi;
        
    }
    void calc() {
        m = tail - pool;
        for(int i = 0 ; i < m ; ++i) c[pool[i].len]++;
        for(int i = 1 ; i <= len ; ++i) c[i] += c[i - 1];
        for(int i = 0 ; i < m ; ++i) que[--c[pool[i].len]] = &pool[i];
        if(T == 0) {
    	for(int i = 0 ; i < m ; ++i) que[i]->f = 1;
    	for(int i = m - 1; i >= 0 ; --i) {
    	    for(int j = 0 ; j < Z ; ++j) {
    		if(que[i]->trans[j]) {
    		    que[i]->f += que[i]->trans[j]->f;
    		}
    	    }
    	}
    	--root->f;
        }
        else {
    	for(int i = m - 1 ; i > 0 ; --i) que[i]->par->cnt += que[i]->cnt;
    	for(int i = 0 ; i < m ; ++i) que[i]->f = que[i]->cnt;
    	for(int i = m - 1; i >= 0 ; --i) {
    	    for(int j = 0 ; j < Z ; ++j) {
    		if(que[i]->trans[j]) {
    		    que[i]->f += que[i]->trans[j]->f;
    		}
    	    }
    	}
    	root->f -= root ->cnt;
        }
    }
    void solve() {
        if(root->f < K) {
    	puts("-1");
    	return;
        }
        node *p = root;
        while(K > 0) {
    	for(int i = 0 ; i < Z ; ++i) {
    	    if(p->trans[i]) {
    		if(K > p->trans[i]->f) K -= p->trans[i]->f;
    		else {
    		    p = p->trans[i];
    		    if(T == 0) --K;
    		    else K -= p->cnt;
    		    putchar('a' + i);
    		    break;
    		}
    	    }
    	}
        }
        putchar('\n');
    }
    int main() {
    #ifdef ivorysi
        freopen("f1.in","r",stdin);
    #endif
        scanf("%s",s + 1);
        len = strlen(s + 1);
        root = last = tail++;
        for(int i = 1 ; i <= len ; ++i) {
    	Build_SAM(s[i] - 'a' , i);
        }
        scanf("%d%lld",&T,&K);
        calc();
        solve();
    }
    

    SPOJ NSUBSTR Substrings

    You are given a string S which consists of 250000 lowercase latin letters at most. We define F(x) as the maximal number of times that some string with length x appears in S. For example for string 'ababa' F(3) will be 2 because there is a string 'aba' that occurs twice. Your task is to output F(i) for every i so that 1<=i<=|S|.

    Input

    String S consists of at most 250000 lowercase latin letters.

    Output

    Output |S| lines. On the i-th line output F(i).

    Example

    Input:

    ababa

    Output:

    3
    2
    2
    1
    1

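    Solution

    Build the SAM, propagate the cnt values from the longest nodes back to their parents, then scan all nodes once and take the maximum cnt for each len.

    Code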

    #include <iostream>
    #include <cstdio>
    #include <algorithm>
    #include <cstring>
    #define MAXN 500010
    //#define ivorysi
    using namespace std;
    typedef long long ll;
    const int Z = 26;
    struct node {
        node *par,*trans[Z];
        int len,cnt;
    }pool[MAXN * 2],*tail = pool,*last,*root;
    node *que[MAXN * 2];
    int c[MAXN * 2],ans[MAXN],len;
    char s[MAXN];
    void Build_SAM(const int &e,const int &len) {
        node *nowpoi = tail++,*p = last;
        nowpoi->len = len;nowpoi->cnt = 1;
        for( ; p && !p->trans[e] ; p = p->par) {
    	p->trans[e] = nowpoi;
        }
        if(!p) nowpoi->par = root;
        else {
    	node *q = p->trans[e];
    	if(p->len + 1 == q->len) nowpoi->par = q;
    	else {
    	    node * copyq = tail++;
    	    *copyq = *q;
    	    copyq->len = p->len + 1;copyq->cnt = 0;
    	    q->par = nowpoi->par = copyq;
    	    for( ; p && p->trans[e] == q; p = p->par) {
    		p->trans[e] = copyq;
    	    }
    	}
        }
        last = nowpoi;
    }
    void solve() {
        int m = tail - pool;
        for(int i = 0 ; i < m ; ++i) c[pool[i].len]++;
        for(int i = 1 ; i <= len ; ++i) c[i] += c[i - 1];
        for(int i = 0 ; i < m ; ++i) que[--c[pool[i].len]] = &pool[i];
        for(int i = m - 1 ; i > 0 ; --i) que[i]->par->cnt += que[i]->cnt;
        for(int i = 0 ; i < m ; ++i) ans[que[i]->len] = max(ans[que[i]->len],que[i]->cnt);
        for(int i = 1 ; i <= len ; ++i) printf("%d\n",ans[i]);
    }
    int main() {
    #ifdef ivorysi
        freopen("f1.in","r",stdin);
    #endif
        scanf("%s",s + 1);
        len = strlen(s + 1);
        root = last = tail++;
        for(int i = 1 ; i <= len ; ++i) {
    	Build_SAM(s[i] - 'a' , i);
        }
        solve();
    }
    

    BZOJ 2555: SubString

    Time Limit: 30 Sec Memory Limit: 512 MB

    Description

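    Skipping the backstory: you are given a string init and must support two operations
    (1): append a string to the end of the current string
    (2): ask how many times a string s occurs in the current string (as a contiguous substring)
    You must support these operations online.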

    Input

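    The first line contains an integer Q, the number of operations.
    The second line contains the initial string init.
    Each of the next Q lines contains two strings, Type and Str.
    If Type is ADD, the string is appended to the end.
    If Type is QUERY, the question is how many times the string occurs in the current string.
    To enforce online processing, you need to maintain a variable mask, initially 0.
    After reading Str, decode it into the actual string TrueStr using the given procedure (see decode in the code below).
    For a query, answer it for TrueStr and output one line with the answer Result,
    then set mask = mask xor Result.
    For an insertion, simply append TrueStr to the current string.
    HINT: the strings of both ADD and QUERY operations need to be decoded.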

    Output

    Sample Input

    2
    A
    QUERY B
    ADD BBABBBBAAB

    Sample Output

    0

    HINT

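    For 40% of the data: final string length <= 20000, number of queries <= 1000, total query length <= 10000
    For 100% of the data: final string length <= 600000, number of queries <= 10000, total query length <= 3000000
    One new data set added, 2015.05.20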

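    Solution

    This problem taught me that knowing how to write test data matters: only after shrinking the alphabet could I generate data that broke my solution.
    Again we build a SAM online, and maintain the parent tree with an LCT.
    However... my maintenance code seemed to be quite wrong...
    At first I did not even get the edge linking right... eventually I did... and it was still WA.
    Then I found that when removing the relation between a node and its parent, the value to take away is the sum over the node's subtree, not the node's own cnt.
    After adding that value back in, it got AC; it ran for about 20s??? (scary, I almost thought I had TLE'd, QAQ)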
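    The decoding step from the problem statement is easy to get wrong by an off-by-one, so it helps to isolate it. The sketch below is a 0-indexed rewrite of the decode routine used in the code of this section; the name `decodeWithMask` is chosen here, and the `long long` intermediate is a precaution rather than something the original needs.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Returns the decoded copy of str for the given mask (mask itself is not modified).
std::string decodeWithMask(std::string str, int mask) {
    assert(mask >= 0);
    const int n = static_cast<int>(str.size());
    for (int i = 0; i < n; ++i) {
        // Same recurrence as in the post: mask = (mask * 131 + i) % len.
        mask = static_cast<int>((static_cast<long long>(mask) * 131 + i) % n);
        std::swap(str[i], str[mask]);
    }
    return str;
}
```

    With mask = 0 a string of length 2 such as "AB" comes back unchanged, while `decodeWithMask("ABC", 1)` yields "BAC". Taking mask by value matters: the global mask must only change through the xor with each query result.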

    Code

    #include <iostream>
    #include <cstdio>
    #include <algorithm>
    #include <cstring>
    #define MAXN 600010
    //#define ivorysi
    using namespace std;
    typedef long long ll;
    const int Z = 26;
    struct node {
        node *par,*trans[Z];
        int len,cnt,id;
    }pool[MAXN * 4],*tail = pool,*last,*root;
    char s[3000005];
    int Q;
    struct lct {
        int lc,rc,fa,rev;
        int sum,add;
    #define lc(x) tr[x].lc
    #define rc(x) tr[x].rc
    #define add(x) tr[x].add
    #define fa(x) tr[x].fa
    #define rev(x) tr[x].rev
    #define sum(x) tr[x].sum
        void Reverse() {
    	swap(lc,rc);
    	rev ^= 1;
        }
        void Add(int c) {
    	sum += c;
    	add += c;
        }
    }tr[MAXN * 4];
    int que[MAXN * 4],tot;
    bool isRoot(int u) {
        if(fa(u) == 0) return 1;
        return lc(fa(u)) !=u && rc(fa(u)) != u;
    }
    void Push_down(int u) {
        if(rev(u)) {
    	if(lc(u)) tr[lc(u)].Reverse();
    	if(rc(u)) tr[rc(u)].Reverse();
    	rev(u) = 0;
        }
        if(add(u) != 0) {
    	if(lc(u)) tr[lc(u)].Add(add(u));
    	if(rc(u)) tr[rc(u)].Add(add(u));
    	add(u) = 0;
        }
    }
    int which(int u) {
        return rc(fa(u)) == u;
    }
    void Rotate(int u) {
        int v = fa(u) , w = fa(v);
        int b = u == lc(v) ? rc(u) : lc(u);
        if(w && !isRoot(v)) (lc(w) == v ? lc(w) : rc(w)) = u;
        fa(v) = u;fa(u) = w;
        if(b) fa(b) = v;
        if(u == lc(v)) {lc(v) = b;rc(u) = v;}
        else {rc(v) = b; lc(u) = v;}
    }
    void Splay(int u) {
        tot = 0;
        int x;
        for(x = u ; !isRoot(x) ; x = fa(x) ) {
    	que[++tot] = x;
        }
        que[++tot] = x;
        for(int i = tot ; i >= 1 ; --i) {
    	Push_down(que[i]);
        }
        while(!isRoot(u)) {
    	if(!isRoot(fa(u))) {
    	    if(which(fa(u)) == which(u)) Rotate(fa(u));
    	    else Rotate(u);
    	}
    	Rotate(u);
        }
    }
    void Access(int x) {
        for(int y = 0 ; x ; y = x , x = fa(x)) {
    	Splay(x);rc(x) = y;
    	if(y) fa(y) = x;
        }
    }
    void MakeRoot(int x) {
        Access(x);Splay(x);
        tr[x].Reverse();
    }
    void Link(int x,int y) {
        MakeRoot(x);fa(x) = y;
    }
    void Cut(int x,int y) {
        MakeRoot(x);Access(y);Splay(y);
        lc(y) = 0;fa(x) = 0;
    }
    int Query(int x) {
        Splay(x);
        return tr[x].sum;
    }
    void Add(int x,int c) {
        if(c == 0) return;
        MakeRoot(1);
        Access(x);
        Splay(x);
        tr[x].Add(c);
    }
    void Build_SAM(const int &e,const int &len) {
        node *nowpoi = tail++,*p = last;
        nowpoi->id = tail - pool;
        nowpoi->cnt = 1;nowpoi->len = len;
        for( ; p && !p->trans[e] ; p = p->par) {
    	p->trans[e] = nowpoi;
        }
        if(!p) {
    	nowpoi->par = root;
    	Link(1,nowpoi->id);
        }
        else {
    	node *q = p->trans[e];
    	if(p->len + 1 == q->len) {
    	    nowpoi->par = q;
    	    Link(nowpoi->id,q->id);
          	}
    	else {
    	    node *copyq = tail++;
    	    *copyq = *q;
    	    copyq->id = tail - pool;
    	    copyq->len = p->len + 1;copyq->cnt = 0;
    	    int s = Query(q->id);
    	    Add(q->id,-s);
    	    Cut(q->id,q->par->id);
    	    Link(copyq->id,q->par->id);
    	    q->par = nowpoi->par = copyq;
    	    Link(copyq->id,q->id);
    	    Link(copyq->id,nowpoi->id);
    	    Add(q->id,s);
    	    for( ; p && p->trans[e] == q ; p = p->par){
    		p->trans[e] = copyq;
    	    }
    	    
    	}
        }
        Add(nowpoi->id,1);
        last = nowpoi;
    }
    void decode(int len,int mask) {
        for(int i = 0 ; i < len ; ++i) {
    	mask = (mask * 131 + i) % len;
    	swap(s[i + 1],s[mask + 1]);
        }
    }
    int main() {
    #ifdef ivorysi
        freopen("f1.in","r",stdin);
    #endif
        scanf("%d",&Q);
        scanf("%s",s + 1);
        int len = strlen(s + 1);
        root = last = tail++;
        root->id = 1;
        for(int i = 1 ; i <= len ; ++i) {
    	Build_SAM(s[i] - 'A' , i);
        }
        int mask = 0;
        while(Q--) {
    	scanf("%s",s + 1);
    	if(s[1] == 'A') {
    	    scanf("%s",s + 1);
    	    int k = strlen(s + 1);
    	    decode(k,mask);
    	    for(int j = len + 1 ; j <= len + k ; ++j) {
    		Build_SAM(s[j - len] - 'A',j);
    	    }
    	    len += k;
    	}
    	else {
    	    scanf("%s",s + 1);
    	    node *p = root;
    	    int k = strlen(s + 1);
    	    decode(k,mask);
    	    int ans;
    	    for(int j = 1 ; j <= k ; ++j) {
    		if(p->trans[s[j] - 'A']) {
    		    p = p->trans[s[j] - 'A'];
    		}
    		else {
    		    ans = 0;
    		    goto fail;
    		}
    	    }
    	    ans = Query(p->id);
    	    fail:;
    	    printf("%d\n",ans);
    	    mask ^= ans;
    	}
        }
    }
    /*
    2
    LZ
    ADD ZZLZZ
    QUERY ZZ
    */
    

    SPOJ LCS Longest Common Substring

    A string is a finite sequence of characters over a non-empty finite set Σ.
    In this problem, Σ is the set of lowercase letters.
    A substring, also called a factor, is a consecutive sequence of characters that occurs at least once in a string.
    Now your task is simple, for two given strings, find the length of the longest common substring of them.
    Here common substring means a substring of two or more strings.

    Input

    The input contains exactly two lines, each line consists of no more than 250000 lowercase letters, representing a string.

    Output

    The length of the longest common substring. If such string doesn't exist, print "0" instead.

    Example

    Input:

    alsdfkjfjkdsal
    fdjskalajfkdsla

    Output:

    3
    Notice: new testcases added

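    Solution

    Build the suffix automaton of string A, then run string B over it. If the current node has the required transition, follow it and do ++len. Otherwise climb the parent links to the nearest ancestor that has this transition, set len to that ancestor's len + 1 and follow the transition; if there is no such ancestor, go back to the root with len = 0.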
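    For cross-checking on small inputs, a quadratic dynamic program gives the same answer. This is deliberately a different technique from the suffix automaton walk described above, and the name `longestCommonSubstringBrute` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// O(|a| * |b|) reference: cur[j] is the length of the longest common suffix
// of the first i characters of a and the first j characters of b.
int longestCommonSubstringBrute(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);
    }
    // Sanity check: the answer can never exceed the shorter string.
    assert(best <= static_cast<int>(std::min(a.size(), b.size())));
    return best;
}
```

    On the sample pair it returns 3 (the common substring "kds"), matching the expected output.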

    Code

    #include <iostream>
    #include <cstdio>
    #include <algorithm>
    #include <cstring>
    #define MAXN 250005
    //#define ivorysi
    using namespace std;
    typedef long long ll;
    const int Z = 26;
    struct node {
        node *trans[Z],*par;
        int len,cnt;
    }pool[MAXN * 2],*tail  = pool,*root,*last;
    char sa[MAXN * 2],sb[MAXN * 2];
    void Build_SAM(const int &e,const int &len) {
        node *nowpoi = tail++ , *p = last;
        nowpoi->len = len;nowpoi->cnt = 1;
        for( ; p && !p->trans[e]; p = p->par) {
    	p->trans[e] = nowpoi;
        }
        if(!p) nowpoi->par = root;
        else {
    	node *q = p->trans[e];
    	if(p->len + 1 == q->len) nowpoi->par = q;
    	else {
    	    node *copyq = tail++;
    	    *copyq = *q;
    	    copyq->len = p->len + 1;
    	    copyq->cnt = 0;
    	    q->par = nowpoi->par = copyq;
    	    for( ;p && p->trans[e] == q ; p = p->par) {
    		p->trans[e] = copyq;
    	    }
    	}
        }
        last = nowpoi;
    }
    int main() {
    #ifdef ivorysi
        freopen("f1.in","r",stdin);
    #endif
        scanf("%s",sa + 1);
        scanf("%s",sb + 1);
        int lena = strlen(sa + 1),lenb = strlen(sb + 1);
        last = root = tail++;
        for(int i = 1 ; i <= lena ; ++i) {
    	Build_SAM(sa[i] - 'a',i);
        }
        int res = 0;
        int len = 0;
        node *p = root;
        for(int i = 1 ; i <= lenb ; ++i) {
    	int k = sb[i] - 'a';
    	if(p->trans[k]) {
    	    p = p->trans[k];
    	    len = len + 1;
    	}
    	else {
    	    for( ; p && !p->trans[k] ; p = p->par) ;
    	    if(!p) p = root , len = 0;
    	    else len = p->len + 1 , p = p->trans[k];
    	}
    	res = max(res,len);
        }
        printf("%d\n",res);
    }
    

    SPOJ LCS2 Longest Common Substring II

    A string is a finite sequence of characters over a non-empty finite set Σ.
    In this problem, Σ is the set of lowercase letters.
    A substring, also called a factor, is a consecutive sequence of characters that occurs at least once in a string.
    Now your task is a bit harder, for some given strings, find the length of the longest common substring of them.
    Here common substring means a substring of two or more strings.

    Input

    The input contains at most 10 lines, each line consists of no more than 100000 lowercase letters, representing a string.

    Output

    The length of the longest common substring. If such string doesn't exist, print "0" instead.

    Example

    Input:

    alsdfkjfjkdsal
    fdjskalajfkdsla
    aaaajfaaaa

    Output:

    2

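    Solution

    Again build the SAM of one string; every state keeps a value ans, initialised to its len.
    Then run each of the other strings over it, recording in val, for every state, the longest match that reaches it.
    Then use each node's val to update its parent.
    After each string, take the min of ans and val.
    Finally, the largest ans is the answer.

    Code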

    #include <iostream>
    #include <cstdio>
    #include <algorithm>
    #include <cstring>
    #define MAXN 250005
    //#define ivorysi
    using namespace std;
    typedef long long ll;
    const int Z = 26;
    struct node {
        node *trans[Z],*par;
        int len,cnt,val,ans;
    }pool[MAXN * 2],*tail  = pool,*root,*last;
    node *que[MAXN * 2];
    char s[15][MAXN];
    int m,tot,c[MAXN];
    void Build_SAM(const int &e,const int &len) {
        node *nowpoi = tail++ , *p = last;
        nowpoi->len = len;nowpoi->cnt = 1;
        for( ; p && !p->trans[e]; p = p->par) {
    	p->trans[e] = nowpoi;
        }
        if(!p) nowpoi->par = root;
        else {
    	node *q = p->trans[e];
    	if(p->len + 1 == q->len) nowpoi->par = q;
    	else {
    	    node *copyq = tail++;
    	    *copyq = *q;
    	    copyq->len = p->len + 1;
    	    copyq->cnt = 0;
    	    q->par = nowpoi->par = copyq;
    	    for( ;p && p->trans[e] == q ; p = p->par) {
    		p->trans[e] = copyq;
    	    }
    	}
        }
        last = nowpoi;
    }
    int main() {
    #ifdef ivorysi
        freopen("f1.in","r",stdin);
    #endif
        int cnt = 0;
        while(scanf("%s",s[++cnt] + 1) != EOF);
        --cnt;
        int len = strlen(s[1] + 1);
        root = last = tail++;
        for(int i = 1 ; i <= len ; ++i) {
    	Build_SAM(s[1][i] - 'a' , i);
        }
        m = tail - pool;
        int res = 0;
        for(int i = 0 ; i < m ; ++i) {
    	pool[i].ans = pool[i].len;
    	c[pool[i].len]++;
        }
        for(int i = 1 ; i <= len ; ++i) c[i] += c[i - 1];
        for(int i = 0 ; i < m ; ++i) {
    	que[--c[pool[i].len]] = &pool[i];
        }
        for(int i = 2 ; i <= cnt ; ++i) {
    	int l = strlen(s[i] + 1);
    	
    	for(int j = 0 ; j < m ; ++j) {
    	    pool[j].val = 0;
    	}
    	node *p = root;
    	int len = 0;
    	for(int j = 1 ; j <= l ; ++j) {
    	    int z = s[i][j] - 'a';
    	    if(p->trans[z]) {
    		p = p->trans[z];
    		len = len + 1;
    		p->val = max(p->val,len);
    	    }
    	    else {
    		for( ; p && !p->trans[z] ; p = p->par);
    		if(!p) p = root , len = 0;
    		else len = p->len + 1 , p = p->trans[z];
    		p->val = max(p->val,len);
    	    }
    	}
    	for(int j = m - 1 ; j >= 1 ; --j) {
    	    que[j]->par->val = max(que[j]->par->val,que[j]->val);
    	}
    	for(int j = 0 ; j < m ; ++j) {
    	    pool[j].ans = min(pool[j].ans,pool[j].val);
    	}
        }
        for(int i = 0 ; i < m ; ++i) res = max(res,pool[i].ans);
        printf("%d\n",res);
    }
    

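    Suffix automaton on a tree

    We can in fact generalise the suffix automaton to trees:
    simply extend from the node of the parent vertex when adding a child vertex.
    But is this correct??? That is the question.
    Think of it this way: call the path between a vertex and the root its path string.
    The par pointer of a node only searches along the path string towards the root. The node it finds may belong to a vertex at the same depth, but then the two vertices must have exactly the same path string, so it makes no difference which one is used.
    Otherwise the node that par points to lies on one of its ancestors, which guarantees that the par relations still form a tree.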
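    To make the notion concrete: the substrings of such a tree are exactly the suffixes of the path strings, so on small trees they can be counted by brute force. The sketch below assumes a hypothetical input layout (1-based `parent` array with `parent[root] = 0`, and `letter[v]` for the character on vertex v, index 0 unused); it is a checker, not the automaton construction.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Number of distinct substrings of the tree, including the empty one.
long long countTreeSubstringsBrute(const std::vector<int>& parent, const std::string& letter) {
    assert(parent.size() == letter.size());
    std::set<std::string> seen;
    for (std::size_t v = 1; v < parent.size(); ++v) {
        std::string s;
        // Walk up from v; s is always the string read downwards from u to v.
        for (int u = static_cast<int>(v); u != 0; u = parent[u]) {
            s.insert(s.begin(), letter[u]);
            seen.insert(s);
        }
    }
    return static_cast<long long>(seen.size()) + 1;  // +1 for the empty string
}
```

    For the sample tree of the problem below (letters abcbbaca, parents 0, 1, 2, 1, 4, 4, 4, 1 for vertices 1 to 8) this counts 12, the first line of the expected output.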

    CodeChef TSUBSTR Substrings on a Tree

    Chefland scientists have made a new invention! They developed a new way to represent a string with N symbols: consider a tree with N vertices, rooted at the first vertex. For each vertex, a single latin letter is written. So we have obtained a "treestring". The scientists haven't decided yet how the treestring should be pronounced, but they have invented a definition of a substring for a treestring. A string is a substring of a treestring if and only if it can be obtained by moving from some vertex to its descendant and writing out all the letters from vertices that occurred on this path in the order they have appeared. For example, consider the following treestring:
    [Figure: the example treestring; image not available]
    The string "ba" is a substring of a given treestring because it can be obtained by moving from vertex 4 to vertex 6, the string "abb" is also a substring of this treestring - it can be obtained by moving from the root to vertex 5. However the string "cb" is not a substring of this treestring because there is no way from any vertex to its descendant in such a way that the sequence of letters is "cb".
    Now the Chefland researchers ask you to help them with the treestring research.
    They have given you a treestring with N vertices.
    Please output the number of distinct substrings of a given treestring (including the empty one).
    Then, Q queries will follow.
    For the i-th query, the permutation Pi of 26 latin alphabet letters and an integer Ki will be given.
    That means that if we sort all distinct substrings of the given treestring according to the alphabetical order described in Pi, you will have to output the Ki-th string.
    "According to the alphabetical order described in Pi" means that letter X is lexicographically smaller than letter Y if and only if X appears
    in Pi earlier than Y. For example if the alphabetical order is "cbadefghijklmnopqrstuvwxyz", then letter "c" is lexicographically smaller than letter "a" because "c" is the first symbol of this permutation, and "a" is the third symbol of this permutation, therefore 1<3 and for the given arrangement, "c" is alphabetically less than "a".
    Here note that the string A is smaller than the string B (that means A comes earlier than B
    after sorting) if and only if
    A is a prefix of B,
    or Ai = Bi (for all i < k) and Ak < Bk (in terms of alphabetical order)
    where Ai denotes the i-th letter of A.

    Constraints

    1<=N<=250000
    1<=Q<=50000
    1<=Ki<=9223372036854775807 (2^63-1)
    Output will not exceed 800 KB.
    It is guaranteed that the N lowercase latin letters have been generated randomly.

    Input

    The first line of input consists of two integers - N and Q.
    Then, a string composed of N lowercase latin letters follow.
    Then, N-1 lines follow. Each line is composed of two numbers - Xi and Yi. It means that there is an edge between vertex Xi and vertex Yi.
    Then, Q lines follow. Each line consists of a permutation of 26 lowercase latin letters Pi and an integer Ki.

    Output

    Output Q+1 lines. On the first line output a single integer - the number of distinct substrings of a given treestring. The following Q lines should contain answers to the queries. I-th line should contain an answer to i-th query or a string "-1" if it is impossible
    to find Ki-th string for i-th query.

    Example

    Input:

    8 4
    abcbbaca
    1 2
    2 3
    1 4
    4 5
    4 6
    4 7
    1 8
    abcdefghijklmnopqrstuvwxyz 5
    abcdefghijklmnopqrstuvwxyz 1
    bcadefghijklmnopqrstuvwxyz 5
    abcdefghijklmnopqrstuvwxyz 100

    Output:

    12
    aba

    ba
    -1

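    Solution

    Note that the empty string is the first string, so it is printed as an empty line.
    Build the suffix automaton on the tree; the counting is much the same as in 弦论 (BZOJ 3998).

    Code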

    #include <iostream>
    #include <cstdio>
    #include <algorithm>
    #include <cstring>
    #define MAXN 500005
    //#define ivorysi
    using namespace std;
    typedef long long ll;
    struct node {
    	int to,next;
    }edge[MAXN * 2];
    int head[MAXN],sumedge;
    void add(int u,int v) {
    	edge[++sumedge].to = v;
    	edge[sumedge].next = head[u];
    	head[u] = sumedge;
    }
    void addtwo(int u,int v) {
    	add(u,v);
    	add(v,u);
    }
    struct sam {
    	sam *par,*trans[26];
    	int len;
    	ll f;
    }pool[MAXN * 2],*tail = pool ,*root,*que[MAXN * 2];
    int n,Q;
    ll K;
    int ql,qr,q[MAXN * 2],fa[MAXN * 2],pos[MAXN * 2],c[MAXN * 2];
    char str[MAXN * 2],alpha[30];
    void build() {
    	q[ql = qr = 1] = 1;
    	while(ql <= qr) {
    		int now = q[ql++];
    		for(int i = head[now] ; i ; i = edge[i].next) {
    			if(edge[i].to != fa[now]) {
    				q[++qr] = edge[i].to;
    				fa[q[qr]] = now;
    			}
    		}
    		sam *p = &pool[pos[fa[now]]];
    		sam *nowpoi = tail++;
    		nowpoi->len = p->len + 1;
    		pos[now] = tail - pool - 1;
    		int e = str[now] - 'a';
    		for( ; p && !p->trans[e] ; p = p->par) {
    			p->trans[e] = nowpoi;
    		}
    		if(!p) nowpoi->par = root;
    		else {
    			sam *q = p->trans[e];
    			if(q->len == p->len + 1) nowpoi->par = q;
    			else {
    				sam *copyq = tail++;
    				*copyq = *q;
    				copyq->len = p->len + 1;
    				q->par = nowpoi->par = copyq;
    				for( ; p && p->trans[e] == q ; p = p->par) {
    					p->trans[e] = copyq;
    				}
    			}
    		}
    	}
    }
    int main() {
    #ifdef ivorysi
        freopen("f1.in","r",stdin);
    #endif
        scanf("%d%d",&n,&Q);
        scanf("%s",str + 1);
        int u,v;
        for(int i = 1 ; i < n ; ++i) {
        	scanf("%d%d",&u,&v);
        	addtwo(u,v);
        }
        root = tail++;
        build();
        int m = tail - pool;
        for(int i = 0 ; i < m ; ++i) ++c[pool[i].len];
       	for(int i = 1 ; i <= n ; ++i) c[i] += c[i - 1];
       	for(int i = 0 ; i < m ; ++i) que[--c[pool[i].len]] = &pool[i];
       	for(int i = 0 ; i < m ; ++i) que[i]->f = 1;
       	for(int i = m - 1 ; i >= 0 ; --i) {
       		for(int j = 0 ; j < 26 ; ++j) {
       			if(que[i]->trans[j])
       				que[i]->f += que[i]->trans[j]->f;
       		}
       	}
       	
       	printf("%lld\n",root->f);
       	--root->f;
       	while(Q--) {
       		scanf("%s",alpha);
       		scanf("%lld",&K);
       		sam *p = root;
       		--K;
       		if(p->f < K) {
       			puts("-1");
       			continue;
       		}
       		while(K > 0) {
       			for(int i = 0 ; i < 26 ; ++i) {
       				if(p->trans[alpha[i]-'a']) {
    	   				if(K > p->trans[alpha[i]-'a']->f) 
    	   					K -= p->trans[alpha[i]-'a']->f;
    	   				else {
    	   					p = p->trans[alpha[i] - 'a'];
    	   					--K;
    	   					putchar(alpha[i]);
    	   					break;
    	   				}
       				}
       			}
       		}
       		putchar('\n');
       	}
    }
    
  • Original post: https://www.cnblogs.com/ivorysi/p/9058130.html