  • POJ 3691 DNA repair (DP + AC automaton)

    DNA repair
    Time Limit: 2000MS   Memory Limit: 65536K
    Total Submissions: 4815   Accepted: 2237

    Description

    Biologists finally invent techniques of repairing DNA that contains segments causing kinds of inherited diseases. For the sake of simplicity, a DNA is represented as a string containing characters 'A', 'G' , 'C' and 'T'. The repairing techniques are simply to change some characters to eliminate all segments causing diseases. For example, we can repair a DNA "AAGCAG" to "AGGCAC" to eliminate the initial causing disease segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA can still contain only characters 'A', 'G', 'C' and 'T'.

    You are to help the biologists repair a DNA by changing the least number of characters.

    Input

    The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), which is the number of DNA segments causing inherited diseases.
    The following N lines give N non-empty strings of length not greater than 20 containing only characters in "AGCT", which are the DNA segments causing inherited disease.
    The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in "AGCT", which is the DNA to be repaired.

    The last test case is followed by a line containing a single zero.

    Output

    For each test case, print a line containing the test case number (beginning with 1) followed by the
    number of characters which need to be changed. If it's impossible to repair the given DNA, print -1.

    Sample Input

    2
    AAA
    AAG
    AAAG    
    2
    A
    TG
    TGAATG
    4
    A
    G
    C
    T
    AGT
    0

    Sample Output

    Case 1: 1
    Case 2: 4
    Case 3: -1

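    "DP on an AC automaton" means using the automaton's nodes as DP states and moving along its transitions. If the target string contains none of the virus (disease) strings, then walking it through the Trie graph never reaches a matching node. That is why a DP whose state is the current node of the Trie graph works here.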

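    First, building the AC automaton. Two points need attention:

    ① Why create virtual nodes? They exist for the state transitions below. If a node has no child for some character, the transition falls back: at the root it returns to the root, and at any other node it takes the fail node's edge for that character (this is what the code does). The current partial match cannot be extended, so we go on matching against every other virus string that could still match. Without these virtual nodes the DP would need special-case checks, which gets messy.

    ② A small point to note: if a node's fail pointer points to a dangerous node, then that node is dangerous too. The string at the fail target is a suffix of the string at this node, so reaching this node also completes that virus string.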
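To make the two construction points concrete, here is a minimal array-based sketch. It is not the pointer-based code from this post: the `Automaton` struct and its members `next`, `fail`, `danger`, `newNode`, `insert` and `build` are names assumed for illustration. Missing edges are filled in during the BFS (the virtual nodes), and each node inherits the danger flag of its fail target.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical array-based automaton (illustration only); node 0 is the root.
struct Automaton {
    std::vector<std::vector<int>> next;  // next[node][c]: Trie-graph edge, -1 while missing
    std::vector<int> fail;               // failure links
    std::vector<bool> danger;            // a virus string ends here or along the fail chain

    Automaton() { newNode(); }

    int newNode() {
        next.push_back(std::vector<int>(4, -1));
        fail.push_back(0);
        danger.push_back(false);
        return static_cast<int>(next.size()) - 1;
    }

    // Same character order as the post's find(): A, C, T, G.
    static int code(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'T': return 2;
            default:  assert(ch == 'G'); return 3;
        }
    }

    void insert(const std::string& pattern) {
        int cur = 0;
        for (char ch : pattern) {
            int c = code(ch);
            if (next[cur][c] == -1) {
                int id = newNode();      // create first: newNode() may reallocate next
                next[cur][c] = id;
            }
            cur = next[cur][c];
        }
        danger[cur] = true;
    }

    // BFS over the trie: fill missing edges and propagate danger flags.
    void build() {
        std::queue<int> bfs;
        for (int c = 0; c < 4; ++c) {
            if (next[0][c] == -1) next[0][c] = 0;   // point 1: missing root edge loops to root
            else bfs.push(next[0][c]);              // depth-1 nodes keep fail = root
        }
        while (!bfs.empty()) {
            int u = bfs.front();
            bfs.pop();
            for (int c = 0; c < 4; ++c) {
                int v = next[u][c];
                if (v == -1) {
                    next[u][c] = next[fail[u]][c];  // point 1: borrow the fail node's edge
                } else {
                    fail[v] = next[fail[u]][c];
                    if (danger[fail[v]]) danger[v] = true;  // point 2: inherit danger
                    bfs.push(v);
                }
            }
        }
    }
};
```

After `build()`, every `next[node][c]` is a valid node index, so a DP can follow any of the four characters from any state without special cases.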

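    Next, the DP recurrence. f(i,j) is the minimum number of changes when the first i characters of the main string have been processed and the automaton is in state j. Here is the recurrence:

    f(i, j.son) = min(f(i, j.son), f(i-1, j) + (j.son != hash(str[i-1]))), where j.son ranges over the Trie's next array. The first character of the string is stored at index 0. If j.son is a virus (dangerous) node, the transition is not allowed.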
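The same recurrence can be written out with its boundary conditions. The notation is assumed here and is not from the original post: \delta(j, c) is the node reached from state j on character c, s_i is the i-th character of the main string (that is, str[i-1]), n is its length, and [P] is 1 if P holds and 0 otherwise.

```latex
f(0, \mathrm{root}) = 0, \qquad f(0, j) = \infty \quad (j \ne \mathrm{root})

f\bigl(i, \delta(j, c)\bigr) = \min\Bigl( f\bigl(i, \delta(j, c)\bigr),\ f(i-1, j) + [\, c \ne s_i \,] \Bigr)
\quad \text{for } 1 \le i \le n,\ c \in \{A, C, G, T\},\ \delta(j, c) \text{ not dangerous}

\text{answer} = \min_j f(n, j), \quad \text{or } -1 \text{ if every } f(n, j) = \infty
```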

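    The main question is how to read the state j.son. Suppose the main string has reached the node abb in the Trie graph and we want to move on to abbb, that is, j.son == b. If the fourth character of the main string turns out to be c, or anything else other than b, how do we still make that move? Simple: pay +1 and change the character to b, so abbc becomes abbb.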

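    In my view this is the key idea of DP on an AC automaton. It also makes the virtual nodes easy to understand: such a transition falls back toward the root and continues matching against the other virus strings.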
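Putting the pieces together, the whole idea fits in one function. This is a sketch under a few assumptions: `minRepairs` and `code` are hypothetical names, the automaton is array-based rather than pointer-based, and two rolling rows replace the full dp[i][j] table (row i only depends on row i-1). It is meant to show the transition cost, not to replace the full program.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Map a nucleotide to 0..3 (same order as the post's find()).
static int code(char ch) {
    switch (ch) {
        case 'A': return 0;
        case 'C': return 1;
        case 'T': return 2;
        default:  return 3;  // 'G'
    }
}

// Hypothetical helper: minimum number of changed characters, or -1 if impossible.
int minRepairs(const std::vector<std::string>& patterns, const std::string& dna) {
    const int INF = 0x3f3f3f3f;
    std::vector<std::vector<int>> next(1, std::vector<int>(4, -1));
    std::vector<int> fail(1, 0);
    std::vector<char> danger(1, 0);

    // Insert every virus string into the trie (node 0 is the root).
    for (const std::string& p : patterns) {
        int cur = 0;
        for (char ch : p) {
            int c = code(ch);
            if (next[cur][c] == -1) {
                int id = static_cast<int>(next.size());
                next.push_back(std::vector<int>(4, -1));
                fail.push_back(0);
                danger.push_back(0);
                next[cur][c] = id;
            }
            cur = next[cur][c];
        }
        danger[cur] = 1;
    }

    // BFS: fail links, virtual edges, danger propagation.
    std::queue<int> bfs;
    for (int c = 0; c < 4; ++c) {
        if (next[0][c] == -1) next[0][c] = 0;
        else bfs.push(next[0][c]);
    }
    while (!bfs.empty()) {
        int u = bfs.front();
        bfs.pop();
        for (int c = 0; c < 4; ++c) {
            int v = next[u][c];
            if (v == -1) { next[u][c] = next[fail[u]][c]; continue; }
            fail[v] = next[fail[u]][c];
            if (danger[fail[v]]) danger[v] = 1;
            bfs.push(v);
        }
    }

    // DP over (position, state); dpPrev holds f(i-1, *), dpCur holds f(i, *).
    const int states = static_cast<int>(next.size());
    std::vector<int> dpPrev(states, INF), dpCur(states, INF);
    dpPrev[0] = 0;
    for (char ch : dna) {
        std::fill(dpCur.begin(), dpCur.end(), INF);
        for (int j = 0; j < states; ++j) {
            if (dpPrev[j] == INF) continue;        // state j is unreachable
            for (int c = 0; c < 4; ++c) {
                int to = next[j][c];
                if (danger[to]) continue;          // never step onto a virus node
                int cost = (c != code(ch)) ? 1 : 0;  // +1 when the character is changed
                dpCur[to] = std::min(dpCur[to], dpPrev[j] + cost);
            }
        }
        dpPrev.swap(dpCur);
    }
    int best = *std::min_element(dpPrev.begin(), dpPrev.end());
    return best == INF ? -1 : best;
}
```

For instance, `minRepairs({"AAA", "AAG"}, "AAAG")` corresponds to the first sample case above.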

    #include<iostream>
    #include<cstdio>
    #include<cstring>
    #include<algorithm>
    
    using namespace std;
    
    const int N=1010;
    const int INF=0x3f3f3f3f;
    
    struct Trie{
        int ok;
        Trie *fail;
        Trie *next[4];
        void init(){
            ok=0;
            fail=NULL;
            memset(next,0,sizeof(next));
        }
    }*root,*q[N],a[N];
    
    char wrd[30];
    char str[N];
    int n,cnt,dp[N][N];
    
    int find(char ch){
        switch(ch){
            case 'A':return 0;
            case 'C':return 1;
            case 'T':return 2;
            case 'G':return 3;
        }
        return 0;
    }
    
    void InsertTrie(char *str){
        Trie *loc=root;
        for(int i=0;str[i]!='\0';i++){
            int id=find(str[i]);
            if(loc->next[id]==NULL){
                a[cnt].init();
                loc->next[id]=&a[cnt++];
            }
            loc=loc->next[id];
        }
        loc->ok=1;
    }
    
    void AC_automation(){
        int head=0,tail=0;
        root->fail=NULL;
        q[tail++]=root;
        Trie *cur,*tmp;
        while(head<tail){
            cur=q[head++];
            tmp=NULL;
            for(int i=0;i<4;i++){
                if(cur->next[i]==NULL){
                    if(cur==root)       //virtual edge, convenient for the DP
                        cur->next[i]=root;
                    else
                        cur->next[i]=cur->fail->next[i];
                }else{
                    tmp=cur->fail;
                    while(tmp!=NULL){
                        if(tmp->next[i]!=NULL){
                            cur->next[i]->fail=tmp->next[i];
                            cur->next[i]->ok |= tmp->next[i]->ok;
                            break;      //note: exit the loop here
                        }
                        tmp=tmp->fail;
                    }
                    if(tmp==NULL)
                        cur->next[i]->fail=root;
                    q[tail++]=cur->next[i];
                }
            }
        }
    }
    
    int main(){
    
        //freopen("input.txt","r",stdin);
    
        int cases=0;
        while(~scanf("%d",&n) && n){
            cnt=0;
            root=&a[cnt++];
            root->init();
            for(int i=0;i<n;i++){
                scanf("%s",wrd);
                InsertTrie(wrd);
            }
            AC_automation();
            scanf("%s",str);
            int len=strlen(str);
            for(int i=0;i<N;i++)
                for(int j=0;j<N;j++)
                    dp[i][j]=INF;
            dp[0][0]=0;
            for(int i=1;i<=len;i++)
                for(int j=0;j<cnt;j++)
                    for(int k=0;k<4;k++){
                        Trie *tmp=a[j].next[k];
                        if(tmp->ok)
                            continue;
                        int dis=tmp-root;
                        dp[i][dis]=min(dp[i][dis],dp[i-1][j]+(k!=find(str[i-1])));
                    }
            int ans=INF;
            for(int i=0;i<cnt;i++)
                if(ans>dp[len][i])
                    ans=dp[len][i];
            printf("Case %d: ",++cases);
            if(ans==INF)
                printf("-1\n");
            else
                printf("%d\n",ans);
        }
        return 0;
    }
  • Original article: https://www.cnblogs.com/jackge/p/3147556.html