Problem link: http://codeforces.com/contest/828/problem/E
Everyone knows that DNA strands consist of nucleotides. There are four types of nucleotides: "A", "T", "G", "C". A DNA strand is a sequence of nucleotides. Scientists decided to track evolution of a rare species, which DNA strand was string s initially.
Evolution of the species is described as a sequence of changes in the DNA. Every change is a change of some nucleotide, for example, the following change can happen in DNA strand "AAGC": the second nucleotide can change to "T" so that the resulting DNA strand is "ATGC".
Scientists know that some segments of the DNA strand can be affected by some unknown infections. They can represent an infection as a sequence of nucleotides. Scientists are interested if there are any changes caused by some infections. Thus they sometimes want to know the value of impact of some infection to some segment of the DNA. This value is computed as follows:
- Let the infection be represented as a string e, and let scientists be interested in DNA strand segment starting from position l to position r, inclusive.
- Prefix of the string eee... (i.e. the string that consists of infinitely many repeats of string e) is written under the string s from position l to position r, inclusive.
- The value of impact is the number of positions where letter of string s coincided with the letter written under it.
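To make the definition concrete, here is a minimal brute-force sketch of the impact value. The function name `naiveImpact` is a hypothetical helper, not part of the statement; it walks the segment and compares each position with the matching character of the repeated string e. It costs O(r - l + 1) per query, so it only illustrates the definition and is not meant as an efficient solution.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: impact of infection e on s[l..r] (1-based, inclusive),
// computed directly from the definition.
int naiveImpact(const std::string& s, int l, int r, const std::string& e) {
    assert(1 <= l && l <= r && r <= static_cast<int>(s.size()) && !e.empty());
    const int len = static_cast<int>(e.size());
    int matches = 0;
    for (int p = l; p <= r; ++p) {
        // Position p lies (p - l) characters into the infinite string eee...
        if (s[p - 1] == e[(p - l) % len]) ++matches;
    }
    return matches;
}
```

For instance, `naiveImpact("ATGCATGC", 2, 6, "TTT")` compares "TGCAT" with "TTTTT".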
Being a developer, Innokenty is interested in bioinformatics also, so the scientists asked him for help. Innokenty is busy preparing VK Cup, so he decided to delegate the problem to the competitors. Help the scientists!
The first line contains the string s (1 ≤ |s| ≤ 10^5) that describes the initial DNA strand. It consists only of capital English letters "A", "T", "G" and "C".
The next line contains a single integer q (1 ≤ q ≤ 10^5), the number of events.
After that, q lines follow, each describes one event. Each of the lines has one of two formats:
- 1 x c, where x is an integer (1 ≤ x ≤ |s|), and c is a letter "A", "T", "G" or "C", which means that there is a change in the DNA: the nucleotide at position x is now c.
- 2 l r e, where l, r are integers (1 ≤ l ≤ r ≤ |s|), and e is a string of letters "A", "T", "G" and "C" (1 ≤ |e| ≤ 10), which means that scientists are interested in the value of impact of infection e to the segment of DNA strand from position l to position r, inclusive.
For each scientists' query (second type query) print a single integer in a new line — the value of impact of the infection on the DNA.
Example 1 input:
ATGCATGC
4
2 1 8 ATGC
2 2 6 TTT
1 4 T
2 2 6 TA
Example 1 output:
8
2
4
Example 2 input:
GAGTTGTTAA
6
2 3 4 TATGGTG
1 1 T
1 6 G
2 5 9 AGTAATA
1 10 G
2 2 6 TTGT
Example 2 output:
0
3
1
Consider the first example. In the first query of the second type all characters coincide, so the answer is 8. In the second query we compare string "TTTTT..." and the substring "TGCAT". There are two matches. In the third query, after the DNA change, we compare string "TATAT..." with substring "TGTAT". There are 4 matches.
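Problem summary:
Given a DNA sequence (n <= 1e5), there are two kinds of operations:
1. 1 x c: replace the base at position x with c.
2. 2 l r e: a query. The string e (|e| <= 10) is repeated cyclically across the interval [l, r]; how many positions hold the same character in the DNA sequence and in the repeated e?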
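Solution:
1. The first instinct is that some data structure is needed, such as a segment tree or a Fenwick tree (binary indexed tree). But putting the whole DNA sequence into a single Fenwick tree does not give us anything to query.
2. The next attempt is to split the DNA sequence across four Fenwick trees, one per base. That still leaves the matching itself as a position-by-position enumeration.
3. Looking at the statement again, |e| is at most 10, so that is most likely the way in. Within [l, r] the string e repeats cyclically, which means that if i % |e| = j % |e|, then positions i and j are compared against the same character of e.
4. Following point 3, we can allocate 10*10*4 Fenwick trees and distribute the DNA sequence among them. bit[len][r_pos][ch] means: for a pattern e of length len, if position i of the DNA sequence has relative position r_pos = i % len and holds base ch, then add 1 at index i of that tree. To answer a query, enumerate each character e[k] of e, compute the relative position (l + k) % len of the DNA positions it is aligned with, which identifies one tree, and count directly over [l, r] in that tree.
5. The point of this layout is to exploit the repetition of e inside the interval: positions i and j with i % |e| = j % |e| can be counted together.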
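The steps above can be sketched as a small self-contained structure. The names `Fenwick`, `DnaIndex`, `baseId`, `update` and `query` are hypothetical, chosen for this illustration only, and the trees are sized to the input string rather than to a fixed maximum. Under these assumptions an update touches 10 trees and a query touches |e| trees, each in O(log n).

```cpp
#include <string>
#include <vector>

constexpr int kMaxLen = 10;  // the statement guarantees |e| <= 10

// Point-update, range-sum Fenwick tree over positions 1..n.
struct Fenwick {
    std::vector<int> c;
    explicit Fenwick(int n = 0) : c(n + 1, 0) {}
    void add(int x, int d) {
        for (; x < static_cast<int>(c.size()); x += x & -x) c[x] += d;
    }
    int prefix(int x) const {
        int sum = 0;
        for (; x > 0; x -= x & -x) sum += c[x];
        return sum;
    }
    int range(int l, int r) const { return prefix(r) - prefix(l - 1); }
};

// Maps a nucleotide to 0..3; input is assumed to contain only A, G, C, T.
inline int baseId(char ch) {
    switch (ch) {
        case 'A': return 0;
        case 'G': return 1;
        case 'C': return 2;
        default:  return 3;  // 'T'
    }
}

struct DnaIndex {
    std::string s;              // 1-based copy of the strand; s[0] is padding
    std::vector<Fenwick> tree;  // flattened tree[len][i % len][base]

    static int idx(int len, int residue, int base) {
        return ((len - 1) * kMaxLen + residue) * 4 + base;
    }

    explicit DnaIndex(const std::string& dna)
        : s(" " + dna),
          tree(kMaxLen * kMaxLen * 4, Fenwick(static_cast<int>(dna.size()))) {
        const int n = static_cast<int>(dna.size());
        for (int len = 1; len <= kMaxLen; ++len)
            for (int i = 1; i <= n; ++i)
                tree[idx(len, i % len, baseId(s[i]))].add(i, 1);
    }

    // Event "1 x c": position x now holds nucleotide ch.
    void update(int x, char ch) {
        for (int len = 1; len <= kMaxLen; ++len) {
            tree[idx(len, x % len, baseId(s[x]))].add(x, -1);
            tree[idx(len, x % len, baseId(ch))].add(x, 1);
        }
        s[x] = ch;
    }

    // Event "2 l r e": e[k] is aligned with the positions p in [l, r]
    // that satisfy p % len == (l + k) % len.
    int query(int l, int r, const std::string& e) const {
        const int len = static_cast<int>(e.size());
        int sum = 0;
        for (int k = 0; k < len; ++k)
            sum += tree[idx(len, (l + k) % len, baseId(e[k]))].range(l, r);
        return sum;
    }
};
```

When the segment is shorter than e, the characters of e that fall past r contribute nothing, because no position in [l, r] has their residue.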
The code is as follows:
```cpp
#include <cstdio>
#include <cstring>
using namespace std;

const int MAXN = 1e5 + 10;

// Fenwick tree (binary indexed tree) over positions 1..MAXN-1.
struct BIT
{
    int c[MAXN];
    int lowbit(int x)
    {
        return x & (-x);
    }
    void add(int x, int d)
    {
        for (; x < MAXN; x += lowbit(x))
            c[x] += d;
    }
    int query(int x)
    {
        int ret = 0;
        for (; x; x -= lowbit(x))
            ret += c[x];
        return ret;
    }
    int getsum(int l, int r)
    {
        return query(r) - query(l - 1);
    }
} bit[11][11][4]; // bit[len][i % len][base]

int M[130];              // nucleotide letter -> 0..3
char s[MAXN], tmp[100];  // s is 1-based; tmp holds c or e

int main()
{
    M['A'] = 0; M['G'] = 1; M['C'] = 2; M['T'] = 3;
    while (scanf("%s", s + 1) != EOF)
    {
        memset(bit, 0, sizeof(bit));
        // Register every position in the tree chosen by (len, i % len, base).
        for (int len = 1; len <= 10; len++)
            for (int i = 1; s[i]; i++)
                bit[len][i % len][M[(int)s[i]]].add(i, 1);

        int m, op, l, r;
        scanf("%d", &m);
        while (m--)
        {
            scanf("%d", &op);
            if (op == 1)
            {
                // Change: remove the old base at l, then add the new one.
                scanf("%d%s", &l, tmp);
                for (int len = 1; len <= 10; len++)
                {
                    bit[len][l % len][M[(int)s[l]]].add(l, -1);
                    bit[len][l % len][M[(int)tmp[0]]].add(l, 1);
                }
                s[l] = tmp[0];
            }
            else
            {
                // Query: tmp[i] is aligned with positions p where p % len == (l + i) % len.
                int sum = 0;
                scanf("%d%d%s", &l, &r, tmp);
                int len = strlen(tmp);
                for (int i = 0; i < len; i++)
                    sum += bit[len][(l + i) % len][M[(int)tmp[i]]].getsum(l, r);
                printf("%d\n", sum);
            }
        }
    }
}
```