  • 187. Repeated DNA Sequences (String; Bit)

     All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].

     Approach I: traverse the string, take a 10-character substring at each position, and count how many times it occurs.

    Result: Time Limit Exceeded
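
    The author's code for approach I is not shown. A minimal sketch of the idea, assuming the counting is done with a std::unordered_map keyed by the substring (the function name findRepeatedNaive is hypothetical):

    ```cpp
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Approach I sketch: every 10-character window is copied out as a string
    // and counted. The hash map and this function name are assumptions, not
    // the original author's code.
    std::vector<std::string> findRepeatedNaive(const std::string& s) {
        std::vector<std::string> ret;
        if (s.size() < 10) return ret;

        std::unordered_map<std::string, int> seen;
        for (std::size_t i = 0; i + 10 <= s.size(); ++i) {
            std::string window = s.substr(i, 10);
            // Report a window exactly once, on its second occurrence.
            if (++seen[window] == 2) ret.push_back(window);
        }
        return ret;
    }
    ```

    Each window costs a 10-byte copy plus string hashing, and every distinct window is stored as a full string key. That is the overhead approach II removes.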

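    Approach II: the alphabet is small => represent each character as a number => represent the whole string as a bitmap. Benefit: it saves space.

    In this problem only 4 distinct characters can appear => they can be written as 0, 1, 2, 3, i.e. 2 bits each => a character that originally took 1 byte = 8 bits now needs only 2 bits, so a 10-letter sequence fits in 20 bits.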
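
    To make the 2-bit encoding concrete, here is a small sketch that packs a sequence into an integer and unpacks it again. The helper names nucleotideCode, packSequence and unpackSequence are hypothetical, and the input is assumed to contain only A, C, G, T:

    ```cpp
    #include <cstdint>
    #include <string>

    // 2-bit code per nucleotide. Assumption: any other character is treated
    // as 'T'; valid DNA input never reaches that case with a different letter.
    std::uint32_t nucleotideCode(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;
        }
    }

    // Pack a sequence of at most 16 nucleotides, first letter in the highest bits.
    std::uint32_t packSequence(const std::string& seq) {
        std::uint32_t val = 0;
        for (char ch : seq) {
            val = (val << 2) | nucleotideCode(ch);
        }
        return val;
    }

    // Inverse of packSequence for a sequence of known length.
    std::string unpackSequence(std::uint32_t val, int len) {
        static const char letters[] = {'A', 'C', 'G', 'T'};
        std::string seq(len, 'A');
        for (int i = len - 1; i >= 0; --i) {
            seq[i] = letters[val & 3u];  // lowest 2 bits hold the last letter
            val >>= 2;
        }
        return seq;
    }
    ```

    With this layout "AAAAACCCCC" packs to 0x155 and "TTTTTTTTTT" packs to 0xFFFFF, the largest 20-bit value, which is why a table of 2^20 entries is enough to index every possible 10-letter sequence.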

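    #include <string>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        // Map each nucleotide to a 2-bit code.
        int getVal(char ch) {
            if (ch == 'A') return 0;
            if (ch == 'C') return 1;
            if (ch == 'G') return 2;
            return 3; // 'T'; the input is assumed to contain only A, C, G, T
        }

        vector<string> findRepeatedDnaSequences(string s) {
            int sLen = s.length();
            unsigned int val = 0;
            // One counter per 20-bit key (4^10 = 2^20 keys), kept on the heap
            // rather than as a 1 MB stack array.
            vector<unsigned char> mp(1 << 20, 0);
            vector<string> ret;

            if (sLen < 10) return ret;

            // Pre-load the first 9 characters of the window.
            for (int i = 0; i < 9; i++) {
                val <<= 2;
                val |= getVal(s[i]);
            }

            // Slide the window: append 2 bits, then keep only the low 20 bits.
            for (int i = 9; i < sLen; i++) {
                val <<= 2;
                val |= getVal(s[i]);
                val &= 0xFFFFF;
                // Cap the counter at 2 so it cannot wrap around and report
                // the same sequence a second time.
                if (mp[val] < 2 && ++mp[val] == 2) {
                    ret.push_back(s.substr(i - 9, 10));
                }
            }

            return ret;
        }
    };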
  • Original article: https://www.cnblogs.com/qionglouyuyu/p/5047362.html