  • DNA sequence (mapping + BFS)

    Problem Description

    The twenty-first century is a century of developing biotechnology. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A (adenine), C (cytosine), G (guanine), and T (thymine). Finding the longest common subsequence between DNA/protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to make a shortest sequence from them so that each of the given sequences is a subsequence of it.

    For example, given "ACGT","ATGC","CGTT" and "CAGT", you can make a sequence in the following way. It is the shortest but may be not the only one.

    Input

    The first line is the number of test cases t. Then t test cases follow. In each case, the first line is an integer n ( 1<=n<=8 ), the number of DNA sequences. The following n lines contain the n sequences, one per line. The length of each sequence is between 1 and 5.

    Output

    For each test case, print a line containing the length of the shortest sequence that can be made from these sequences.

    Sample Input

    1
    4
    ACGT
    ATGC
    CGTT
    CAGT

    Sample Output

    8

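    The task: given several DNA sequences, find the shortest sequence such that every given sequence is a subsequence of it (not necessarily contiguous), and print its length.
    A naive search gets MLE, TLE, or RE, so searching directly is out. The usual way to handle this kind of sequence problem is to map the sequences onto integers or something else that is easy to work with.
    The problem also says that no DNA sequence is longer than 5, so we can process the sequences position by position and map them onto a single integer. And since we only have to output the length of the shortest sequence, there is no need to map characters; mapping the matched lengths is enough.
    There are at most 8 sequences, each of length 1 to 5, so the state number is bounded by 6^8. Why 6^8 and not 5^8? Each digit records how many characters of one sequence have been matched, and that count runs from 0 to 5, which is six values. I repeat this in the code comments.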
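To make the mapping concrete, here is a minimal sketch of packing the per-sequence matched lengths into one base-6 integer and unpacking it again. The helper names `encode` and `decode` are hypothetical and do not appear in the original solution, which does the same arithmetic inline.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers (not in the original post).
// matched[i] is how many characters of sequence i are already covered (0..5).
// Sequence i contributes matched[i] * 6^i, so every digit is one base-6 place.
int encode(const std::vector<int>& matched) {
    int state = 0;
    int weight = 1;  // 6^i for the current sequence
    for (int m : matched) {
        state += m * weight;
        weight *= 6;
    }
    return state;
}

// Inverse of encode: read back n base-6 digits, lowest place first.
std::vector<int> decode(int state, int n) {
    std::vector<int> matched(n);
    for (int i = 0; i < n; ++i) {
        matched[i] = state % 6;
        state /= 6;
    }
    return matched;
}
```

For the sample input all four sequences have length 4, so the goal state is `encode({4, 4, 4, 4})`, which is 4 * (1 + 6 + 36 + 216) = 1036.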
    Code:
    #include <cstring>
    #include <iostream>
    #include <map>
    #include <queue>

    using namespace std;

    int t, n;
    map<int, int> vis;    // states that have already been visited
    char s[10][10];       // the sequences, stored 1-indexed
    int len[10];          // length of each sequence
    int p[10] = {1,6,36,216,1296,7776,46656,279936,1679616,10077696};    // table of powers of 6
    char temp[4] = {'A','C','G','T'};

    struct node{
        int step;    // length of the sequence built so far
        int st;      // the mapped state number
        node(){}
        node(int _step, int _st):step(_step),st(_st){}
    };

    int bfs(int res){
        vis.clear();
        queue<node> q;
        q.push(node(0,0));
        vis[0] = 1;
        while(!q.empty()){
            node nxt, k = q.front();
            q.pop();
            if(k.st == res){    // the state equals the target: return the length
                return k.step;
            }
            for(int i = 0; i < 4; i++){    // try appending each of A, C, G, T
                nxt.st = 0;
                nxt.step = k.step+1;
                int tp = k.st;
                for(int j = 1; j <= n; j++){
                    int x = tp%6;    // number of characters of sequence j matched so far
                    tp /= 6;
                    if(x == len[j] || s[j][x+1] != temp[i]){    // finished, or next character differs
                        nxt.st += x*p[j-1];
                    }
                    else{    // next character matches: advance by one
                        nxt.st += (x+1)*p[j-1];
                    }
                }
                if(vis[nxt.st] == 0){    // skip states that were already searched
                    q.push(nxt);
                    vis[nxt.st] = 1;
                }
            }
        }
        return -1;    // never reached for valid input; avoids falling off the end
    }

    int main(){
        ios_base::sync_with_stdio(false);
        cout.tie(0);
        cin.tie(0);
        cin>>t;
        while(t--){
            cin>>n;
            int res = 0;
            for(int i = 1; i <= n; i++){    // arrays start at 0, but the mapping and everything after it is position-based, so start from 1
                cin>>s[i]+1;    // for the same reason, store the characters from index 1
                len[i] = strlen(s[i]+1);
                res += len[i]*p[i-1];    // this is why the bound is 6^8: a digit takes the values 0 (nothing matched) through 5
            }
            cout << bfs(res) <<endl;
        }
        return 0;
    }

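    So if you insist on working from position 0 and squeezing the state into 5^8, that can be made to work as well, but the bookkeeping gets much messier. It is more convenient to simply allow one extra value per digit.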


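    The hard part of this problem is knowing where to start. Even if you know some transformation is needed, it is not obvious how to transform the input or how to search. Here we avoid searching over characters and search directly on the matched lengths.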
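A single step of that length-based search can be sketched as follows. The function name `advance` is hypothetical; it mirrors the inner loop of `bfs`, but uses 0-indexed `std::string`s rather than the 1-indexed char arrays of the solution.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not in the original post): the state reached after
// appending character c to the sequence being built.
// Every sequence whose next unmatched character equals c moves forward by one;
// all other sequences keep their matched length.
int advance(int state, char c, const std::vector<std::string>& seqs) {
    int next = 0;
    int weight = 1;  // 6^i for sequence i
    for (const std::string& seq : seqs) {
        int matched = state % 6;  // characters of this sequence already covered
        state /= 6;
        if (matched < static_cast<int>(seq.size()) && seq[matched] == c) {
            ++matched;
        }
        next += matched * weight;
        weight *= 6;
    }
    return next;
}
```

With the sample sequences, appending 'A' to the empty state advances "ACGT" and "ATGC" and gives state 1 + 6 = 7. Appending 'C' after that gives 2 + 6 + 36 + 216 = 260. Because a state only stores lengths, two different strings that cover the same prefixes collapse into one BFS node.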

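    It is worth mentioning that when I asked my teammates, they said this problem can be solved in many ways: IDA* or another heuristic search works, and it can even be done without searching at all, using an AC automaton plus matrices. Those approaches all search over characters, though. Neither way is better or worse; the line of thinking is just different. Many problems have more than one solution, and thinking them through is well worth it. As for the other approaches, I couldn't be bothered to write them (to be honest, I don't know how, lol).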
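For readers curious about the IDA* route mentioned above, here is a minimal sketch. It is not the author's or the teammates' code. It assumes a simple admissible heuristic, the largest number of unmatched characters over all sequences, and the names `idaDfs` and `idaStarLength` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Depth-limited DFS. pos[i] is how many characters of seqs[i] are matched.
// Returns true if all sequences can be completed within `limit` characters.
bool idaDfs(const std::vector<std::string>& seqs, const std::vector<int>& pos,
            int depth, int limit) {
    // Heuristic (an assumption of this sketch): at least as many characters
    // are still needed as the longest unmatched remainder.
    int remaining = 0;
    for (std::size_t i = 0; i < seqs.size(); ++i) {
        remaining = std::max(remaining, static_cast<int>(seqs[i].size()) - pos[i]);
    }
    if (remaining == 0) return true;              // everything is covered
    if (depth + remaining > limit) return false;  // prune: cannot finish in time
    for (char c : std::string("ACGT")) {
        std::vector<int> next = pos;
        bool advanced = false;
        for (std::size_t i = 0; i < seqs.size(); ++i) {
            if (pos[i] < static_cast<int>(seqs[i].size()) && seqs[i][pos[i]] == c) {
                ++next[i];
                advanced = true;
            }
        }
        // Appending a character that advances nothing is never useful.
        if (advanced && idaDfs(seqs, next, depth + 1, limit)) return true;
    }
    return false;
}

// Iterative deepening: raise the length limit until the DFS succeeds.
int idaStarLength(const std::vector<std::string>& seqs) {
    int limit = 0;
    for (const std::string& s : seqs) {
        limit = std::max(limit, static_cast<int>(s.size()));
    }
    for (;; ++limit) {
        std::vector<int> start(seqs.size(), 0);
        if (idaDfs(seqs, start, 0, limit)) return limit;
    }
}
```

Unlike the BFS, this keeps no visited table, so memory use stays small. The price is that shallow levels are re-explored each time the limit grows.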

  • Original post: https://www.cnblogs.com/xenny/p/9388400.html