zoukankan      html  css  js  c++  java
  • LeetCode() Repeated DNA Sequences 看的非常的过瘾!

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
    
    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].

    非常好的思路: 转换成位操作。

    算法分析

    首先考虑将ACGT进行二进制编码

    A -> 00

    C -> 01

    G -> 10

    T -> 11

    在编码的情况下,每10位字符串的组合即为一个数字,且10位的字符串有20位;一般来说int有4个字节,32位,即可以用于对应一个10位的字符串。例如

    ACGTACGTAC -> 00011011000110110001

    AAAAAAAAAA -> 00000000000000000000

    20位的二进制数,至多有2^20种组合,因此hash table的大小为2^20,即1024 * 1024,将hash table设计为bool hashTable[1024 * 1024];

    vector<string> findRepeatedDnaSequences(string s) {
        int  hashMap[1048576] = {0};
        vector<string> ans;
        int len = s.size(),hashNum = 0;
        if (len < 11) return ans;
        for (int i = 0;i < 9;++i)
            hashNum = hashNum << 2 | (s[i] - 'A' + 1) % 5;
        for (int i = 9;i < len;++i)
            if (hashMap[hashNum = (hashNum << 2 | (s[i] - 'A' + 1) % 5) & 0xfffff]++ == 1)
                ans.push_back(s.substr(i-9,10));
        return ans;
    }
    

      

  • 相关阅读:
    组播IP地址
    改变未来的10大科技
    知行合一之健康
    2017第47周五
    2017第47周四感恩节
    spring boot测试
    2017第47周二
    音频格式opus
    周日反思
    四种人工智能技术对五个行业的影响
  • 原文地址:https://www.cnblogs.com/yanqi110/p/4996001.html
Copyright © 2011-2022 走看看