  • Repeated DNA Sequences: Solution

    Question

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return ["AAAAACCCCC", "CCCCCAAAAA"].

    Solution -- Bit Manipulation

    The straightforward idea is to store every 10-letter substring in a set. Because each substring has a constant length, this takes O(n) time and O(n) space. Looking at the space cost in more detail: a Java char is 2 bytes, so each substring needs 20 bytes of character data, for roughly 20n bytes in total.
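
A minimal sketch of that set-based baseline, for comparison with the bit-manipulation version. The class name `DnaBaseline` and the method name `findRepeated` are illustrative and do not come from the original post.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DnaBaseline {
    // Returns each 10-letter substring that occurs more than once,
    // in the order in which its second occurrence is found.
    static List<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();      // every window seen so far
        Set<String> reported = new HashSet<>();  // windows already added to the result
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the element was already present.
            if (!seen.add(window) && reported.add(window)) {
                result.add(window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Every window is stored as a full String here, which is where the 20 bytes of character data per entry come from.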

    If we instead represent each 10-letter substring as an integer (2 bits per nucleotide, 20 bits in total, which fits in a 4-byte int), the space drops to roughly 4n bytes.
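
To make the integer representation concrete, here is a small sketch that packs a sequence into an int using the mapping A=0, C=1, G=2, T=3. The class name `DnaEncoding` and the helpers `code` and `encode` are hypothetical names chosen for this illustration.

```java
class DnaEncoding {
    // Maps a nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a sequence of up to 10 letters into the low 20 bits of an int,
    // with the first letter ending up in the most significant position.
    static int encode(String seq) {
        int value = 0;
        for (int i = 0; i < seq.length(); i++) {
            value = (value << 2) | code(seq.charAt(i));
        }
        return value;
    }

    public static void main(String[] args) {
        // "AAAAACCCCC" is five 00 pairs followed by five 01 pairs, i.e. 341.
        System.out.println(encode("AAAAACCCCC"));
    }
}
```

Two different 10-letter sequences always get different codes, so comparing the integers is equivalent to comparing the substrings.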

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            List<String> result = new ArrayList<String>();

            int len = s.length();
            if (len < 10) {
                return result;
            }

            // Each of A, C, G, T fits in 2 bits.
            Map<Character, Integer> map = new HashMap<Character, Integer>();
            map.put('A', 0);
            map.put('C', 1);
            map.put('G', 2);
            map.put('T', 3);

            Set<Integer> temp = new HashSet<Integer>();   // codes seen so far
            Set<Integer> added = new HashSet<Integer>();  // codes already in result

            int hash = 0;
            for (int i = 0; i < len; i++) {
                // Shift left by 2 bits and append the next nucleotide.
                hash = (hash << 2) + map.get(s.charAt(i));
                if (i >= 9) {
                    // Keep only the low 20 bits, i.e. the last 10 nucleotides.
                    hash = hash & ((1 << 20) - 1);

                    if (temp.contains(hash) && !added.contains(hash)) {
                        result.add(s.substring(i - 9, i + 1));
                        added.add(hash); // track what has been reported
                    } else {
                        temp.add(hash);
                    }
                }
            }

            return result;
        }
    }
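
The key step in the solution is the shift-and-mask update of the window code. The sketch below isolates that step and checks it against a from-scratch encoding of each window; `RollingWindow`, `roll`, `encode`, and `code` are hypothetical names for this illustration, and the input is assumed to contain only A, C, G, and T.

```java
class RollingWindow {
    // Keeps the low 20 bits, i.e. the codes of the last 10 nucleotides.
    static final int MASK = (1 << 20) - 1;

    // A=0, C=1, G=2, T=3; anything else is rejected.
    static int code(char c) {
        int v = "ACGT".indexOf(c);
        if (v < 0) {
            throw new IllegalArgumentException("not a nucleotide: " + c);
        }
        return v;
    }

    // Appends the next nucleotide and drops the one that left the window.
    static int roll(int hash, char next) {
        return ((hash << 2) | code(next)) & MASK;
    }

    // Encodes a sequence of up to 10 letters from scratch, without masking.
    static int encode(String seq) {
        int value = 0;
        for (char c : seq.toCharArray()) {
            value = (value << 2) | code(c);
        }
        return value;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = roll(hash, s.charAt(i));
            // From index 9 onward the rolling value covers a full window.
            if (i >= 9 && hash != encode(s.substring(i - 9, i + 1))) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("rolling codes match direct encoding");
    }
}
```

Masking on every step, rather than only once the window is full, gives the same values because the code stays below 2^20 for the first 10 characters anyway.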
  • Original article: https://www.cnblogs.com/ireneyanglan/p/4809078.html