zoukankan html css js c++ java

Repeated DNA Sequences 【9ms未解决】

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

思路：1.用map来存储字符序列。2.检查序列是否已经存在在map中。如果存在且count=1，就将序列添加到结果中

注意：map<string，int> 会造成 memory limits exceed
详细见：C++中基本类型的字节数。

解决1，将A,C,G,T替换成0,1,2,3. 注意：最长为3333333333> 2³¹-1 (2147483647)  所以不能用int,会溢出。
      选择map<unsigned int, bool>  (因为bool为1个字节，int为4个字节，节省空间)

解法2，用 bit manipulation. 观察A,C,G,T 的ascii码
      ASCII    后三位，   倒数后两位
      A：65，   001        00
      C：67，   011        01
      G：71，   111        11
      T：84，   100        10
（1）标示符取后三位，则长度为10的字符串，可以用30bit来表示。    t<<3 & 0x3FFFFFFF | s[i]&7
（2）标示符取倒数后两位，则长度为10的字符串，可以用20bit来表示。 t<<3 & 0xFFFFF | s[i]>>1&0x3
 因此，采用map<int,bool>

解法3，不采用map来查找，而是用另一种方法。leetcode上一位大牛写了9ms的方法！！！

总结：1.仔细观察特征 2.时时刻刻想着去节省空间。

这是我的解法2

class Solution {
public:
      vector<string> findRepeatedDnaSequences(string s) {
       //check validation
       vector<string> res;
       if(s.empty()) return res;
       //check special case
       int n=s.length();
       if(n<10) return res;
       
       //general case: to reduce the memory
       unordered_map<int,bool> map;
       int t=0;
       for(int i=0;i<n;i++){
           t = (t<<2) & 0xFFFFF|(s[i]>>1 & 0x3);
           if(i>=9){
               if(map.count(t)){
                   if(map[t]==true){
                       res.push_back(s.substr(i-9,10));
                       map[t]=false;
                   }
               }else map[t]=true;
           }
       }
       return res;
    }
};

查看全文

相关阅读:
svn command line tag
MDbg.exe（.NET Framework 命令行调试程序）
Microsoft Web Deployment Tool
sql server CI
VS 2010 One Click Deployment Issue “Application Validation did not succeed. Unable to continue”
mshtml
大厂程序员站错队被架空，只拿着五折工资！苟活和离职，如何选择？
揭秘！Windows 为什么会蓝屏？微软程序员竟说是这个原因...
喂！千万别忘了这个C语言知识！（~0 == -1 问题）
Linux 比 Windows 更好，谁反对？我有13个赞成理由

原文地址：https://www.cnblogs.com/renrenbinbin/p/4418102.html