http://acm.hdu.edu.cn/showproblem.php?pid=1686
Oulipo
Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 6098 Accepted Submission(s): 2448
Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
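To make the "occurrences may overlap" rule concrete, here is a minimal brute-force sketch. The function name `countNaive` is hypothetical and is not part of the original solution; it only illustrates the counting rule. Under the stated limits its worst case is roughly |W| × |T| = 10,000 × 1,000,000 character comparisons (for example, when both strings consist only of 'T'), which is why a linear-time matcher such as KMP is the natural choice here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reference helper: tries every start position in `text`.
// Overlapping occurrences are counted because the start index advances
// by one character, not by word.size().
int countNaive(const std::string& word, const std::string& text) {
    assert(!word.empty());                      // the problem guarantees |W| >= 1
    if (word.size() > text.size()) return 0;    // the word cannot fit
    int count = 0;
    for (std::size_t start = 0; start + word.size() <= text.size(); ++start) {
        // compare(pos, len, str) returns 0 when text[pos, pos+len) equals str
        if (text.compare(start, word.size(), word) == 0) ++count;
    }
    return count;
}
```

For the second sample case, `countNaive("AZA", "AZAZAZA")` counts the matches starting at positions 0, 2 and 4, each sharing one 'A' with its neighbour.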
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0
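#include <cstdio>
#include <cstring>
using namespace std;
int n, m, nxt[10005], kk, t;
char b[10005], a[1000005];
/// This problem extends basic KMP to count every match, not just the first one.
/// After each complete match of the word we must jump to the best position and keep searching.
/// The KMP idea still applies: positions that are already matched need not be compared again.
///   xxxxxxxabbaab*xxxxxx
///          abbaaba
// Position of the word after the jump (the jump distance comes from the next array):
///              abbaaba
// In KMP the position in the text never moves back; only the index into the word is moved
// through the next array (my own earlier code moved the text index around carelessly, silly me...).
void buildnxt()
{
    int j = 0, k = -1;
    m = (int)strlen(b);
    nxt[0] = -1; // sentinel: no shorter border exists
    while (j < m)
    {
        if (k == -1 || b[j] == b[k])
        {
            j++;
            k++;
            nxt[j] = k; // longest border length of b[0..j)
        }
        else k = nxt[k];
    }
}
int kmp()
{
    int k = 0, l = 0, cou = 0;
    n = (int)strlen(a);
    /*int ans=m,kk=nxt[m];/// ans is an index into the word; its distance from the start is ans+1
    while(1)
    {
        if(kk!=0&&kk!=-1) {ans=kk;kk=nxt[ans];}
        else break;
    }/// we want the smallest jump point, so walk back from the end of next to the first non-negative value.
    Hmm, the idea is not wrong, but it is still not optimal in time.
    The smallest jump point is only an intermediate jump point; the string aligned there fails to match,
    so there is no need to try matching at that point.
    The simplest approach is still to jump a single step along next, which may lead to a match.
    */
    while (k < n)
    {
        if (l == -1 || a[k] == b[l])
        {
            k++;
            l++;
        }
        else l = nxt[l];
        if (l == m)
        {
            cou++;
            /*if(kk==0) continue;/// if next at the end of the word is 0, the matched substring has no repetition,
            /// i.e. there is no point to jump to inside the matched part of the text.
            if(k==n-1) break;/// if k is already at the end of the text, no further match is possible.
            k=k-l+ans;/// k-l (start of the match) + ans
            l=0;*/
            l = nxt[l]; // follow next to the next-longest border and resume, skipping the part already matched
        }
    }
    return cou;
}
int main()
{
    scanf("%d", &t);
    while (t--)
    {
        // gets() was removed from the C++ standard; bounded %s reads skip the newlines by themselves
        scanf("%10000s", b);
        scanf("%1000000s", a);
        memset(nxt, 0, sizeof(nxt));
        buildnxt();
        printf("%d\n", kmp()); // one answer per line, as the Output section requires
    }
    return 0;
}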
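The same technique can be sketched without global arrays, which makes the next table easy to inspect on small words. The names `buildNext` and `countOccurrences` are hypothetical and chosen only for this illustration; the logic mirrors the solution above (sentinel `nxt[0] = -1`, and `l = nxt[l]` after each complete match so that overlapping occurrences are found).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: nxt[i] is the length of the longest proper prefix of
// word[0..i) that is also its suffix; nxt[0] is the sentinel -1.
std::vector<int> buildNext(const std::string& word) {
    const int m = static_cast<int>(word.size());
    std::vector<int> nxt(m + 1, 0);
    nxt[0] = -1;
    int j = 0, k = -1;
    while (j < m) {
        if (k == -1 || word[j] == word[k]) {
            ++j;
            ++k;
            nxt[j] = k;
        } else {
            k = nxt[k];   // fall back to a shorter border
        }
    }
    return nxt;
}

// Counts occurrences of word in text, overlaps included.
int countOccurrences(const std::string& word, const std::string& text) {
    assert(!word.empty());
    const std::vector<int> nxt = buildNext(word);
    const int m = static_cast<int>(word.size());
    const int n = static_cast<int>(text.size());
    int k = 0, l = 0, count = 0;   // k indexes the text, l indexes the word
    while (k < n) {
        if (l == -1 || text[k] == word[l]) {
            ++k;
            ++l;
        } else {
            l = nxt[l];            // the text index k never moves back
        }
        if (l == m) {              // complete match ending just before text[k]
            ++count;
            l = nxt[l];            // keep the longest border, allowing overlaps
        }
    }
    return count;
}
```

For the word `abbaaba` from the comment diagram, `buildNext` yields `-1 0 0 0 1 1 2 1`. A mismatch after matching `abbaab` therefore resumes at word index 2, which corresponds to shifting the word four positions to the right. For `AZA` the table is `-1 0 0 1`, so after each match one character stays matched and `AZAZAZA` yields three occurrences.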