zoukankan html css js c++ java

后缀数组学习与练习(入门向)

Zbq

算法概述：

首先后缀数组SA是后缀数组是后缀树的一个非常精巧的替代品，它比后缀树容易编程实现，能够实现后缀树的很多功能而时间复杂度也不太逊色，并且，它比后缀树所占用的空间小很多。可以说，在信息学竞赛中后缀数组比后缀树要更为实用。

在这之上主要学习了倍增算法来求后缀数组，下面列举几个在学习中发现的困惑点。

tmp数组储存的是第二关键字从小到大排序后的编码位置而不是编码，这点使我好久才弄明白代码的意义。
基数排序的最后一步反向来，保证在满足第一关键字的情况下，第二关键字的有序。
每次求出sa数组反向带入rank时，要有一个类似离散化的操作，老是把起点2写成1
求height数组是排名第i和I – 1的lcp，所以要还原sa进行比较

常见模型：

1.可重叠的最长重复子串问题

方法:求所有height的最大值就是答案

2.求不可重叠的最长重复子串问题

方法:首先二分length，转换成判定性问题，之后我们将所有height大于等于的划分成若干个段段，这样在每个段段里，两两之间都是lcp大于等于k的，如果其中有两个元素的sa值绝对值大于等于k那么就符合条件

3.可重叠的k次最长重复子串

方法:类似于上一个，我们要让只要有一个划分出来的段段的长度大于等于k – 1

4.求重复出现的子串数目

方法:可以考虑，每一个height值就能贡献当前值的答案，但是这样算下来，会有的子串算重复，考虑当前的和排名靠前一个的，如果当前的height大于前height的值，那么就多加上了前一个的height值，减去即可。

5.不相同子串计数问题

方法:和博客上写的不一样，这个题目我们可以先求出所有子串的个数，之后减去所有的height值，原因是每个子串必然是某个后缀的前缀。令S的长度为N，、则后缀SA[i]可以贡献出N-SA[i]个前缀。但其中有Height[i]个与之前的是重复的，减去之后正好能够得到答案。

6.字典序第k小字符串问题

方法:同上一个，我们在对不相同子串进行计数的时候，子串的字典序值是递增的，这样算到分界点然后二分或者枚举应该都可以

7.连续重复子串问题

方法:首先枚举子串的长度k,之后如果符合题目要求的话，suffix(1)和suffix(k + 1)的lcp应当是k + 1，这个就用些操作随便处理了，可以用st表，也可以考虑两个后缀的sa必然相邻。

8.多个字符串的相关问题

方法:在本来的基础上，将字符串头尾相连，中间添加一个极大的字符，就可以进行一些常规操作了.

由于笔者蒟蒻暂时就整理这些了

后缀数组练习 都是些板子题

1. P3809 【模板】后缀排序求sa数组。。只是打个板子

2. [JSOI2007]字符加密Cipher 拆环为链求sa数组

3. JZOJ 1598. 文件修复求重复出现的子串数目

4. spoj694 Distinct Substrings 统计本质不同的子串个数

5. poj1743:Musical Theme 差分，二分答案后缀数组

6. [USACO06DEC]牛奶模式Milk Patterns 后缀数组单调队列

7. [AHOI2013]差异后缀数组单调栈

8. [HAOI2016]找相同字符后缀数组单调栈

题目和解析

1. P3809 【模板】后缀排序
题目背景
这是一道模板题。
题目描述
读入一个长度为 nn 的由大小写英文字母或数字组成的字符串，请把这个字符串的所有非空后缀按字典序从小到大排序，然后按顺序输出后缀的第一个字符在原串中的位置。位置编号为 11 到 nn 。
输入输出格式
输入格式：

一行一个长度为 nn 的仅包含大小写英文字母或数字的字符串。
输出格式：

一行，共n个整数，表示答案。
方法：裸的求sa啊 -。-。-。-
代码:

#include<cstdio>
#include<algorithm>
#include<cstring>
#define M 1000010
using namespace std;
char s[M];
int tex[M], num[M], rank[M], sa[M], tmp[M], n, m, q;

void qsort()
{
    for(int i = 0; i <= m; i++) tex[i] = 0;
    for(int i = 1; i <= n; i++) tex[rank[tmp[i]]]++;
    for(int i = 1; i <= m; i++) tex[i] += tex[i - 1];
    for(int i = n; i >= 1; i--) sa[tex[rank[tmp[i]]]--] = tmp[i];
}

bool check(int l, int r, int wei)
{
    return tmp[l] == tmp[r] && tmp[l + wei] == tmp[r + wei];
}

void suffix()
{
    for(int i = 1; i <= n; i++) rank[i] = num[i], tmp[i] = i;
    m = 127; qsort();
    for(int w = 1; q < n; w <<= 1, m = q)
    {
        q = 0;
        for(int i = n - w + 1; i <= n; i++) tmp[++q] = i;
        for(int i = 1; i <= n; i++) if(sa[i] > w) tmp[++q] = sa[i] - w;
        qsort();
        swap(rank, tmp);
        q = rank[sa[1]] = 1;
        for(int i = 2; i <= n; i++) if(check(sa[i], sa[i - 1], w)) rank[sa[i]] = q; else rank[sa[i]] = ++q;
    }
}

int main()
{
    scanf("%s", s);
    n = strlen(s);
    for(int i = 1; i <= n; i++) num[i] = s[i - 1];
    suffix();
    for(int i = 1; i <= n; i++) printf("%d ", sa[i]);
    return 0;
}

View Code

***********************************************************************************

2. [JSOI2007]字符加密Cipher

Time Limit: 10 Sec Memory Limit: 162 MB
Submit: 4175 Solved: 1694
[Submit][Status][Discuss]
Description

喜欢钻研问题的JS 同学，最近又迷上了对加密方法的思考。一天，他突然想出了一种他认为是终极的加密办法：把需要加密的信息排成一圈，显然，它们有很多种不同的读法。例如下图，可以读作：

JSOI07 SOI07J OI07JS I07JSO 07JSOI 7JSOI0 把它们按照字符串的大小排序： 07JSOI 7JSOI0 I07JSO JSOI07 OI07JS SOI07J 读出最后一列字符：I0O7SJ，就是加密后的字符串（其实这个加密手段实在很容易破解，鉴于这是突然想出来的，那就^^）。但是，如果想加密的字符串实在太长，你能写一个程序完成这个任务吗？

Input

输入文件包含一行，欲加密的字符串。注意字符串的内容不一定是字母、数字，也可以是符号等。

Output

输出一行，为加密后的字符串。

方法:

安环状来读可以考虑我们以前用的一个算法叫拆环为链之后要我们求的是sa数组注意输出的时候要考虑位置只能是前n个

代码:

#include<cstdio>

#include<algorithm>

#include<cstring>

#include<iostream>

#define M 200020

using namespace std;

int tex[M], num[M], rank[M], sa[M], tmp[M], n, m, q;

char s[M];

 

void qsort() {

    for(int i = 0; i <= m; i++) tex[i] = 0;

    for(int i = 1; i <= n; i++) tex[rank[tmp[i]]]++;

    for(int i = 1; i <= m; i++) tex[i] += tex[i - 1];

    for(int i = n; i >= 1; i--) sa[tex[rank[tmp[i]]]--] = tmp[i];

}

 

bool cmp(int l, int r, int wei) {

    return tmp[l] == tmp[r] && tmp[l + wei] == tmp[r + wei];

}

 

void suffix() {

    for(int i = 1; i <= n; i++) rank[i] = num[i], tmp[i] = i;

    m = 127;

    qsort();

    for(int w = 1; q < n; w <<= 1, m = q) {

        q = 0;

        for(int i = n - w + 1; i <= n; i++) tmp[++q] = i;

        for(int i = 1; i <= n; i++) if(sa[i] > w) tmp[++q] = sa[i] - w;

        qsort();

        swap(tmp, rank);

        rank[sa[1]] = q = 1;

        for(int i = 2; i <= n; i++) if(cmp(sa[i], sa[i - 1], w)) rank[sa[i]] = q;

            else rank[sa[i]] = ++q;

    }

}

 

int main() {

    scanf("%s", s);

    n = strlen(s);

    for(int i = 1; i <= n; i++) num[i] = num[i + n] = s[i - 1], s[i + n - 1] = s[i - 1];

    n += n;

    suffix();

    for(int i = 1; i <= n; i++) {

        if(sa[i] <= (n >> 1)) putchar(s[sa[i] + (n >> 1) - 2]);

    }

    return 0;

}

View Code

***********************************************************************************

3. JZOJ 1598. 文件修复

Description

　　有一个文件被破坏了，可是值得庆幸的是，只是文件的顺序被打乱了。文件仅包含大小写的拉丁字母以及逗号，句号和叹号。为了尽快修复，请你找出有多少个至少出现两次的子串。
　　比如字符串abbabc，子串”a”，”b”,”ab”分别出现了2次，3次，2次。

Input

　　输入文件第一行包含一个整数n表示文件的长度。
　　第二行n个字符，表示被破坏的文件。

Output

　　输出一个数，表示有多少个至少出现两次的子串。

Sample Input

aabbabb

Sample Output

Data Constraint

Hint

【数据约束和评分方法】

　　对于10%的数据，1<=n<=100。
　　对于30%的数据，1<=n<=1000。
　　对于50%的数据，1<=n<=10000。　　
　　对于100%的数据，1<=n<=100000。

方法:上面模型4的裸题，但是由于我没找到提交的oj，写了就没提交

spoj694 Distinct Substrings

Description

Given a string, we need to find the total number of its distinct substrings.

Input

T- number of test cases. T<=20;
Each test case consists of one string, whose length is <= 1000

Output

For each test case output one number saying the number of distinct substrings.

Example

Sample Input:
2
CCCCC
ABABA

Sample Output:
5
9

Explanation for the testcase with string ABABA:
len=1 : A,B
len=2 : AB,BA
len=3 : ABA,BAB
len=4 : ABAB,BABA
len=5 : ABABA
Thus, total number of distinct substrings is 9.

统计本质不同的子串个数

方法:

求不同子串个数我们可以考虑求出所有子串的个数然后减去所有height的值即是答案

求出SA数组与Height数组，每个子串必然是某个后缀的前缀。令S的长度为N，、

则后缀SA[i]可以贡献出N-SA[i]个前缀。但其中有Height[i]个与之前的是重复的，因此要减去。

代码:

#include<cstdio>

#include<algorithm>

#include<cstring>

#include<iostream>

#define M   10010

using namespace std;

 

int tex[M], num[M], rank[M], sa[M], tmp[M], height[M], n, m, q, ans;

char s[M];

 

void qsort() {

    for(int i = 0; i <= m; i++) tex[i] = 0;

    for(int i = 1; i <= n; i++) tex[rank[tmp[i]]]++;

    for(int i = 1; i <= m; i++) tex[i] += tex[i - 1];

    for(int i = n; i >= 1; i--) sa[tex[rank[tmp[i]]]--] = tmp[i];

}

 

bool cmp(int l, int r, int we) {

    return tmp[l] == tmp[r] && tmp[l + we] == tmp[r + we];

}

 

void suffix() {

    for(int i = 1; i <= n; i++) rank[i] = num[i], tmp[i] = i;

    m = 127;

    qsort();

    for(int w = 1; q < n; w <<= 1, m = q) {

        q = 0;

        for(int i = n - w + 1; i <= n; i++) tmp[++q] = i;

        for(int i = 1; i <= n; i++) if(sa[i] > w) tmp[++q] = sa[i] - w;

        qsort();

        for(int i = 1; i <= n; i++) swap(rank[i], tmp[i]);

        rank[sa[1]] = q = 1;

        for(int i = 2; i <= n; i++) if(cmp(sa[i], sa[i - 1], w)) rank[sa[i]] = q;

            else rank[sa[i]] = ++ q;

    }

    int j, k = 0;

    for(int i = 1; i <= n; height[rank[i++]] = k)

        for(k = k ? k - 1 : k, j = sa[rank[i] - 1]; num[i + k] == num[j + k]; k++);

}

 

 

int main() {

    int T;

    cin >> T;

    while(T--) {

        scanf("%s", s);

        n = strlen(s);

        q = ans = 0;

        for(int i = 1; i <= n; i++) ans += i, num[i] = s[i - 1];

        suffix();

        for(int i = 1; i <= n; i++) ans -= height[i];

        cout << ans << "
";

    }

    return 0;

}

View Code

***********************************************************************************

4.mesical melody

Description

A musical melody is represented as a sequence of N (1<=N<=20000)notes that are integers in the range 1..88, each representing a key on the piano. It is unfortunate but true that this representation of melodies ignores the notion of musical timing; but, this programming task is about notes and not timings.
Many composers structure their music around a repeating &qout;theme&qout;, which, being a subsequence of an entire melody, is a sequence of integers in our representation. A subsequence of a melody is a theme if it:

is at least five notes long

appears (potentially transposed -- see below) again somewhere else in the piece of music

is disjoint from (i.e., non-overlapping with) at least one of its other appearance(s)

Transposed means that a constant positive or negative value is added to every note value in the theme subsequence.
Given a melody, compute the length (number of notes) of the longest theme.
One second time limit for this problem's solutions!

Input

The input contains several test cases. The first line of each test case contains the integer N. The following n integers represent the sequence of notes.
The last test case is followed by one zero.

Output

For each test case, the output file should contain a single line with a single integer that represents the length of the longest theme. If there are no themes, output 0.

Sample Input

25 27 30 34 39 45 52 60 69 79 69 60 52 45 39 34 30 26 22 18

82 78 74 70 66 67 64 60 65 80

题意 :

有N个音符的序列，每个音符都是1- 88范围内的整数，现在要找一个重复的主题。“主题”是整个音符序列的一个子串，它需要满足如下条件：

1.长度至少为5个音符。

2.在乐曲中重复出现。(可能经过转调，“转调”的意思是主题序列中每个音符都被加上或减去了同一个整数值)

3.重复出现的同一主题不能有公共部分。

方法:首先差分得到差分数组，将问题转化成找到一个长度大于等于4的不重叠的子串问题，之后就是经典模型了,

二分加上后缀数组

首先考虑两个子串的每位差相等相当于我们先求出一个差分数组之后求相等的长度大于等于4的子串

这样可以套用后缀数组首先构建height数组

之后发现答案具有单调性可以采用二分答案符合要求的答案必须做到两个子串的开头距离大于等于k，（但是为啥我的代码不对呢QAQ）

代码:(伪)

#include<cstdio>

#include<algorithm>

#include<cstring>

#include<iostream>

#define M 2001000

using namespace std;

 

 

int note[M], num[M], tex[M], tmp[M], rank[M], sa[M], height[M], n, m, q;

int read() {

    int nm = 0, f = 1;

    char c = getchar();

    for(; !isdigit(c); c = getchar()) if(c == '-') f = -1;

    for(; isdigit(c); c = getchar()) nm = nm * 10 + c - '0';

    return nm * f;

}

 

void qsort() {

    for(int i = 0; i <= m; i++) tex[i] = 0;

    for(int i = 1; i <= n; i++) tex[rank[tmp[i]]]++;

    for(int i = 1; i <= m; i++) tex[i] += tex[i - 1];

    for(int i = n; i >= 1; i--) sa[tex[rank[tmp[i]]]--] = tmp[i];

}

 

bool cmp(int l, int r, int we) {

    return tmp[l] == tmp[r] && tmp[l + we] == tmp[r + we];

}

 

void suffix() {

    for(int i = 1; i <= n; i++) rank[i] = num[i], tmp[i] = i;

    m = 200;

    qsort();

    for(int w = 1; q < n; w += w, m = q) {

        q = 0;

        for(int i = n - w + 1; i <= n; i++) tmp[++q] = i;

        for(int i = 1; i <= n; i++) if(sa[i] > w) tmp[++q] = sa[i] - w;

        qsort();

        swap(rank, tmp);

        rank[sa[1]] = q = 1;

        for(int i = 2; i <= n; i++) if (cmp(sa[i], sa[i - 1], w)) rank[sa[i]] = q;

            else rank[sa[i]] = ++q;

    }

    int j, k = 0;

    for(int i = 1; i <= n; height[rank[i++]] = k)

        for(k = k ? k - 1 : k, j = sa[rank[i] - 1]; num[i + k] == num[j + k]; k++);

}

 

bool check(int k) {

    int maxx,minn;

    maxx = minn = sa[1]; //

    for(int i = 2; i <= n + 1; i++) {

        if(height[i] >= k && i <= n) {

            minn = min(minn, sa[i]);

            maxx = max(maxx, sa[i]);

            continue;

        }

        if(maxx - minn >= k) return true;

        maxx = minn = sa[i];

    }

    return false;

}

 

 

 

int main() {

    while(n = read() && n) {

        for(int i = 0; i < n; i++) note[i] = read();

        for(int i = 1; i < n; i++) num[i] = note[i] - note[i - 1] + 100;

        n--;

        suffix();

        int l = 4, r = n;

        //for(int i = 0; i <= n; i++) printf("%d ", height[i]);

        while(l + 1 < r) {

            int mid = (l + r) >> 1;

            if(check(mid)) l = mid;

            else r = mid;

        }

        if(!check(l)) {

            puts("0");

        } else if(check(r)) printf("%d", r + 1);

        else printf("%d
", l + 1);

    }

    return 0;

}

View Code

***********************************************************************************

5.[USACO06DEC]牛奶模式Milk Patterns

题目描述

Farmer John has noticed that the quality of milk given by his cows varies from day to day. On further investigation, he discovered that although he can't predict the quality of milk from one day to the next, there are some regular patterns in the daily milk quality.

To perform a rigorous study, he has invented a complex classification scheme by which each milk sample is recorded as an integer between 0 and 1,000,000 inclusive, and has recorded data from a single cow over N (1 ≤ N ≤ 20,000) days. He wishes to find the longest pattern of samples which repeats identically at least K (2 ≤ K ≤ N) times. This may include overlapping patterns -- 1 2 3 2 3 2 3 1 repeats 2 3 2 3 twice, for example.

Help Farmer John by finding the longest repeating subsequence in the sequence of samples. It is guaranteed that at least one subsequence is repeated at least K times.

农夫John发现他的奶牛产奶的质量一直在变动。经过细致的调查，他发现：虽然他不能预见明天产奶的质量，但连续的若干天的质量有很多重叠。我们称之为一个“模式”。 John的牛奶按质量可以被赋予一个0到1000000之间的数。并且John记录了N(1<=N<=20000)天的牛奶质量值。他想知道最长的出现了至少K(2<=K<=N)次的模式的长度。比如1 2 3 2 3 2 3 1 中 2 3 2 3出现了两次。当K=2时，这个长度为4。

输入输出格式

输入格式：

Line 1: Two space-separated integers: N and K

Lines 2..N+1: N integers, one per line, the quality of the milk on day i appears on the ith line.

输出格式：

Line 1: One integer, the length of the longest pattern which occurs at least K times

方法：

求出k个连续的最大前缀可以考虑使用height数组找出连续的k个然后里面的height最小值就是这一整段的最大公共前缀由于我们是用枚举直接扫的所以就可以方便的使用单调队列来维护了

代码:

#include<cstdio>

#include<algorithm>

#include<cstring>

#include<iostream>

#define M 40010

using namespace std;

 

int read() {

    int nm = 0, f = 1;

    char c = getchar();

    for(; !isdigit(c); c = getchar()) if(c == '-') f = -1;

    for(; isdigit(c); c = getchar()) nm = nm * 10 + c - '0';

    return nm * f;

}

int tex[M * 30], num[M], rank[M], height[M], sa[M], tmp[M], n, k, l, r, que[M], note[M], ans, q, m;

 

void qsort() {

    for(int i = 0; i <= m; i++) tex[i] = 0;

    for(int i = 1; i <= n; i++) tex[rank[tmp[i]]]++;

    for(int i = 1; i <= m; i++) tex[i] += tex[i - 1];

    for(int i = n; i >= 1; i--) sa[tex[rank[tmp[i]]]--] = tmp[i];

}

 

bool cmp(int l, int r, int wei) {

    return tmp[l] == tmp[r] && tmp[l + wei] == tmp[r + wei];

}

 

void suffix() {

    for(int i = 1; i <= n; i++) rank[i] = num[i], tmp[i] = i;

    m = 1000000;

    qsort();

    for(int w = 1; q < n; w <<= 1, m = q) {

        q = 0;

        for(int i = n - w + 1; i <= n; i++) tmp[++q] = i;

        for(int i = 1; i <= n; i++) if(sa[i] > w) tmp[++q] = sa[i] - w;

        qsort();

        swap(tmp, rank);

        rank[sa[1]] = q = 1;

        for(int i = 2; i <= n; i++) if(cmp(sa[i], sa[i - 1], w)) rank[sa[i]] = q;

            else rank[sa[i]] = ++q;

    }

    int kk = 0, j;

    for(int i = 1; i <= n; height[rank[i++]] = kk)

        for(kk = kk ? kk - 1 : kk, j = sa[rank[i] - 1]; num[i + kk] == num[j + kk]; kk++);

}

 

void push(int pla, int ver)

{

    while(note[l] + k - 1 <= pla && l <= r)l++;

    while(que[r] > ver && l <= r) r--;

    r++; que[r] = ver; note[r] = pla;

}

 

void solve() {

    l = 1, r = 0;

    for(int i = 1; i < k; i++) push(i, height[i]);

    for(int i = k; i <= n; i++) {

        push(i, height[i]);

        ans = max(ans, que[l]);

    }

}

 

int main() {

    n = read();

    k = read();

    for(int i = 1; i <= n; i++) num[i] = read();

    suffix();

    solve();

    cout << ans << "
";

}

View Code

***********************************************************************************

[AHOI2013]差异

题目描述

给定一个长度为 nn 的字符串 SS ，令 T_iTi 表示它从第 ii 个字符开始的后缀。求

displaystyle sum_{1leqslant i<jleqslant n} ext{len}(T_i)+ ext{len}(T_j)-2 imes ext{lcp}(T_i,T_j)1⩽i<j⩽n∑len(Ti)+len(Tj)−2×lcp(Ti,Tj)

其中， ext{len}(a)len(a) 表示字符串 aa 的长度， ext{lcp}(a,b)lcp(a,b) 表示字符串 aa 和字符串 bb 的最长公共前缀。

输入输出格式

输入格式：

一行，一个字符串 SS 。

输出格式：

一行，一个整数，表示所求值。

方法：

所有的leni和lenj我们可以o1计算出来实际上我们只需要求任意两个后缀之间的公共前缀总和即可这样我们就可以用后缀数组和单调栈维护了

单调栈枚举所有位置height值单调递增，将前面的信息划分成数个区域

代码:

#include<cstdio>

#include<iostream>

#include<cstring>

#include<algorithm>

#define M 500010

#define ll long long

using namespace std;

 

int tex[M], rank[M], tmp[M], sa[M], height[M], sta[M], num[M], n, m, q, top;

char s[M];

ll note[M], ans;

 

void qsort() {

    for(int i = 0; i <= m; i++) tex[i] = 0;

    for(int i = 1; i <= n; i++) tex[rank[tmp[i]]]++;

    for(int i = 2; i <= m; i++) tex[i] += tex[i - 1];

    for(int i = n; i >= 1; i--) sa[tex[rank[tmp[i]]]--] = tmp[i];

}

 

bool cmp(int l, int r, int wei) {

    return tmp[l] == tmp[r] && tmp[l + wei] == tmp[r + wei];

}

 

void suffix() {

    for(int i = 1; i <= n; i++) rank[i] = num[i], tmp[i] = i;

    m = 127;

    qsort();

    for(int w = 1; q < n; w <<= 1, m = q) {

        q = 0;

        for(int i = n - w + 1; i <= n; i++) tmp[++q] = i;

        for(int i = 1; i <= n; i++) if(sa[i] > w) tmp[++q] = sa[i] - w;

        qsort();

        swap(rank, tmp);

        rank[sa[1]] = q = 1;

        for(int i = 2; i <= n; i++) if(cmp(sa[i], sa[i - 1], w)) rank[sa[i]] = q;

            else rank[sa[i]] = ++q;

    }

    int j, k = 0;

    for(int i = 1; i <= n; height[rank[i++]] = k)

        for(k = k ? k - 1 : k, j = sa[rank[i] - 1]; num[i + k] == num[j + k]; k++);

}

 

 

void solve() {

    ans = 1ll * (n - 1 ) * (1 + n) * n / 2;

    top = 0;

    for(int i = 1; i <= n; i++) {

        while(top > 0 && height[sta[top]] > height[i]) top--;

        top++;

        sta[top] = i;

        note[top] = 1ll * height[i] * (i - sta[top - 1]) + note[top - 1];

        ans -= note[top] * 2;

    }

}

 

int main() {

    scanf("%s", s);

    n = strlen(s);

    for(int i = 1; i <= n; i++) num[i] = s[i - 1];

    suffix();

    solve();

    cout << ans << "
";

    return 0;

}

View Code

***********************************************************************************

[HAOI2016]找相同字符

题目描述

给定两个字符串，求出在两个字符串中各取出一个子串使得这两个子串相同的方案数。两个方案不同当且仅当这两个子串中有一个位置不同。

输入输出格式

输入格式：

两行，两个字符串s1，s2，长度分别为n1，n2。1 <=n1, n2<= 200000，字符串中只有小写字母

输出格式：

输出一个整数表示答案

方法：

首先对于这种多个字符串处理的题目后缀数组有一个普遍算法就是将字符串拼接起来进行一些操作就可以得到关于两个字符串之间的信息

对于这个题目来说首先我们要找到所有的极大的公共串之后每个极大公共串能够贡献他的长度这么多的答案

这样的话可以先预处理之后枚举两个串起点的height st表求出中间最小值更新答案

之后考虑更优算法在于考虑一个起点确定后height具有单调性用单调栈进行优化

单调栈考虑枚举height值这样维护一个单调递增的栈计算出每个最小值能够隔离开的个数（我也不知道该咋说了写代码吧）

代码:

#include<cstdio>

#include<algorithm>

#include<cstring>

#include<iostream>

#define ll long long

#define M 400040

using namespace std;

int tex[M], num[M], rank[M], sa[M], height[M], que[M], tmp[M], sum[M], n, m, q, l1, top;

char s[M];

long long ans = 0, note[M];

 

void qsort() {

    for(int i = 0; i <= m; i++) tex[i] = 0;

    for(int i = 1; i <= n; i++) tex[rank[tmp[i]]]++;

    for(int i = 1; i <= m; i++) tex[i] += tex[i - 1];

    for(int i = n; i >= 1; i--) sa[tex[rank[tmp[i]]]--] = tmp[i];

}

 

bool cmp(int l, int r, int wei) {

    return tmp[l] == tmp[r] && tmp[l + wei] == tmp[r + wei];

}

 

void suffix() {

    for(int i = 1; i <= n; i++) rank[i] = num[i], tmp[i] = i;

    m = 127;

    qsort();

    for(int w = 1; q < n; w <<= 1, m = q) {

        q = 0;

        for(int i = n - w + 1; i <= n; i++) tmp[++q] = i;

        for(int i = 1; i <= n; i++) if(sa[i] > w) tmp[++q] = sa[i] - w;

        qsort();

        swap(rank, tmp);

        rank[sa[1]] = q = 1;

        for(int i = 2; i <= n; i++) if(cmp(sa[i], sa[i - 1], w)) rank[sa[i]] = q;

            else rank[sa[i]] = ++q;

    }

    int j, k = 0;

    for(int i = 1; i <= n; height[rank[i++]] = k)

        for(k = k ? k - 1:k, j = sa[rank[i] - 1]; num[i + k] == num[j + k]; k++);

}

 

void work() {

    for(int i = 1; i <= n; i++) sum[i] = sum[i - 1] + (sa[i] <= l1);

    top = 0;

    que[0] = 1;

    for(int i = 1; i <= n; i++) {

        while(top > 0 && height[que[top]] > height[i]) top--;

        top++;

        que[top] = i;

        note[top] = 1ll * (sum[i - 1] - sum[que[top - 1] - 1]) * height[i] + note[top - 1];

        if(sa[i] > l1 + 1)  ans += note[top];

    }

    top = 0;

    for(int i = 1; i <= n; i++) sum[i] = sum[i - 1] + (sa[i] > l1 + 1);

    for(int i = 1; i <= n; i++) {

        while(top > 0 && height[que[top]] > height[i]) top--;

        top++;

        que[top] = i;

        note[top] = 1ll * (sum[i - 1] - sum[que[top - 1] - 1]) * height[i] + note[top - 1];

        if(sa[i] <= l1) ans += note[top];

    }

}

 

int main() {

    scanf("%s", s);

    int len = strlen(s);

    for(int i = 0; i < len; i++) num[++n] = s[i];

    l1 = n; // 保存何处是第一个串

    num[++n] = 'z' + 1;

    scanf("%s", s);

    len = strlen(s);

    for(int i = 0; i < len; i++) num[++n] = s[i];

    suffix();

//  for(int i = 1; i <= n; i++) printf("%d ", height[i]);

    work();

    cout << ans << "
";

    return 0 ;

}

View Code

参考文献:

Akakii大佬《后缀数组应用小结》

YxuanwKeith大佬《五分钟搞懂后缀数组！后缀数组解析以及应用(附详解代码)》（真的是五分钟么QAQ）

查看全文

相关阅读:
SVN服务器搭建和使用（二）
Unity5自动命名Assetbundle并打包
 Unity学习(六)5.x依赖打包
 Dota2技能系统设计分析
 android调用其他apk的activity
Android 跨应用调用Activity
Unity5的AssetBundle的一点使用心得
 Myeclipse快捷键集合
 我的CSS命名规则
 常用的 Windows 键

原文地址：https://www.cnblogs.com/luoyibujue/p/8972043.html