  • 187. Repeated DNA Sequences

    Problem:

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
    
    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].

    Link: http://leetcode.com/problems/repeated-dna-sequences/

    6/25/2017

    I haven't practiced problems in a long time, and for this one I also drew on other people's answers.

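    48 ms, beats 80% of submissions. Time complexity is O(N*k) with k = 10: the N comes from scanning the string, and the k from substring, which copies the 10 characters of each window (hashing that string for the map lookup is also O(k)).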

    Note the loop condition i <= s.length() - 10: the bound is inclusive so that the last window (the one ending at the final character) is also checked.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            List<String> res = new ArrayList<>();
            if (s == null || s.length() < 10) {
                return res;
            }
            // How many times each 10-letter window has been seen so far.
            Map<String, Integer> substringCount = new HashMap<>();
            // Inclusive bound: the last window starts at s.length() - 10.
            for (int i = 0; i <= s.length() - 10; i++) {
                String substring = s.substring(i, i + 10);
                if (substringCount.containsKey(substring)) {
                    int count = substringCount.get(substring);
                    // Report only on the second occurrence so each sequence appears once.
                    if (count == 1) {
                        res.add(substring);
                    }
                    substringCount.put(substring, count + 1);
                } else {
                    substringCount.put(substring, 1);
                }
            }
            return res;
        }
    }

    Other people's solutions:

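    Similar to Rabin-Karp: since there are only 4 characters, each one can be represented with 2 bits, so a 10-letter sequence needs 20 bits and fits in an int (4^10 = 2^20 < 2^32). The map then only compares integers instead of strings, which makes it more efficient. The link has an explanation: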

    https://discuss.leetcode.com/topic/8894/clean-java-solution-hashmap-bits-manipulation
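    The 2-bit idea can be sketched in Java roughly as follows. This is a minimal sketch under my own assumptions, not the code from the linked post: the class name TwoBitDna and the helper code() are hypothetical, the input is assumed to contain only A, C, G and T, and two HashSets stand in for the count map. The key of the current window lives in the low 20 bits (10 letters x 2 bits) and is updated in O(1) per character, so a substring is only built when a repeat is reported.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TwoBitDna {
    // Hypothetical helper: 2-bit code per nucleotide, A=0, C=1, G=2, T=3.
    // Assumes the input contains only A, C, G, T.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // 'T'
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();     // keys of windows seen at least once
        Set<Integer> reported = new HashSet<>(); // keys already added to res
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in 2 new bits and keep only the low 20 bits (the last 10 letters).
            key = ((key << 2) | code(s.charAt(i))) & 0xFFFFF;
            if (i < 9) {
                continue; // the first full 10-letter window ends at index 9
            }
            // seen.add returns false on a repeat; reported.add is true only the first time.
            if (!seen.add(key) && reported.add(key)) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

    Skipping the first 9 iterations matters here because A is encoded as 0: a partial window such as "AAA" would otherwise get the same key as "AAAAAAAAAA".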

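    A similar approach, except that it works in base 8 (3 bits per character). The link has an explanation, but I'll go into a bit more detail.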

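    t holds the integer hash of the current 10-character window, computed by the rolling update inside the loop. Note the mask 0x3FFFFFFF: it keeps only the low 30 bits. Why 30? After & 7 each character keeps only its low 3 binary bits (for 'A', 'C', 'G' and 'T' these are 1, 3, 7 and 4, all distinct), so 10 characters take exactly 30 bits, and the mask removes the influence of every character that came before the current 10.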

    https://discuss.leetcode.com/topic/8487/i-did-it-in-10-lines-of-c

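    #include <string>
    #include <unordered_map>
    #include <vector>
    using namespace std;

    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<unsigned, int> m;  // window hash -> times seen
        vector<string> r;
        // unsigned t: shifting a 30-bit value left by 3 would overflow a signed int (UB).
        // & 7 keeps 3 bits per char; & 0x3FFFFFFF keeps only the last 10 chars.
        for (unsigned t = 0, i = 0; i < s.size(); i++)
            if (m[t = (t << 3 & 0x3FFFFFFF) | (s[i] & 7)]++ == 1)  // true on the 2nd occurrence
                r.push_back(s.substr(i - 9, 10));
        return r;
    }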
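    To check the two claims above (the & 7 codes of the four nucleotides are distinct, and after the 0x3FFFFFFF mask t encodes exactly the last 10 characters), here is a small Java check. It is my own illustration, not part of the linked post; the class name ThreeBitHashDemo and the helper directHash() are hypothetical. It compares the rolling value of t against a hash of each window computed from scratch. In Java, int shifts simply wrap, so the int version has no overflow problem.

```java
class ThreeBitHashDemo {
    // Hypothetical helper: hash a window from scratch, 3 bits per character,
    // first character in the highest bits.
    static int directHash(String window) {
        int h = 0;
        for (int i = 0; i < window.length(); i++) {
            h = (h << 3) | (window.charAt(i) & 7);
        }
        return h;
    }

    public static void main(String[] args) {
        // Low 3 bits of each nucleotide: A=1, C=3, G=7, T=4 (distinct and non-zero).
        for (char c : "ACGT".toCharArray()) {
            System.out.println(c + " & 7 = " + (c & 7));
        }
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        int t = 0;
        for (int i = 0; i < s.length(); i++) {
            // Same rolling update as the C++ code: shift, keep 30 bits, OR in the new char.
            t = ((t << 3) & 0x3FFFFFFF) | (s.charAt(i) & 7);
            if (i >= 9 && t != directHash(s.substring(i - 9, i + 1))) {
                throw new AssertionError("rolling hash differs at i = " + i);
            }
        }
        System.out.println("rolling hash matches direct hash for every window");
    }
}
```

    Because every code is non-zero, a full window always has a non-zero value in bits 27-29, so it can never collide with the shorter prefixes hashed during the first 9 iterations; this is why the C++ one-liner can start counting from i = 0 without a special case.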

    More discussion:

    https://discuss.leetcode.com/category/195/repeated-dna-sequences

  • Original post: https://www.cnblogs.com/panini/p/7077834.html