zoukankan      html  css  js  c++  java
  • Repeated DNA Sequences

    题目:

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
    
    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].

    思路:

    这个题上来打算用hashmap把所有的子序列都放进去,找相同的。是不是有点天真了。。

    查了一下,可以用bit mask来做,恍然大悟。bit manipulation总是可以让代码变得很简洁。

    直接看代码吧。也没什么巧妙地地方,只是用两个bit代表一个字母。

    A:00 

    C:01

    G:  10

    T:  11

     1     public List<String> findRepeatedDnaSequences(String s) {
     2         List<String> res = new ArrayList<String>();
     3         if (s.length() <= 10) {
     4             return res;
     5         }
     6         HashMap<Character, Integer> map = new HashMap<Character, Integer>();
     7         map.put('A', 0);
     8         map.put('C', 1);
     9         map.put('G', 2);
    10         map.put('T', 3);
    11         
    12         int mask = 0;
    13         
    14         Set<Integer> subSeq = new HashSet<Integer>();
    15         Set<Integer> addedSubSeq = new HashSet<Integer>();
    16         
    17         for (int i = 0; i < s.length(); i++) {
    18             if (i < 9) {
    19                 mask = mask << 2;
    20                 mask += map.get(s.charAt(i));
    21             }else{
    22                 mask = mask << 2;
    23                 mask += map.get(s.charAt(i));
    24                 
    25                 //we only need 20 bits for 10 letter
    26                 mask = mask << 12;
    27                 mask = mask >>> 12;
    28                 if (subSeq.contains(mask) && !addedSubSeq.contains(mask)) {
    29                     res.add(s.substring(i - 9, i + 1));
    30                     addedSubSeq.add(mask);
    31                 } else {
    32                     subSeq.add(mask);
    33                 }
    34             }
    35         }
    36         return res;
    37     }
  • 相关阅读:
    BZOJ2219数论之神——BSGS+中国剩余定理+原根与指标+欧拉定理+exgcd
    Luogu 3690 Link Cut Tree
    CF1009F Dominant Indices
    CF600E Lomsat gelral
    bzoj 4303 数列
    CF1114F Please, another Queries on Array?
    CF1114B Yet Another Array Partitioning Task
    bzoj 1858 序列操作
    bzoj 4852 炸弹攻击
    bzoj 3564 信号增幅仪
  • 原文地址:https://www.cnblogs.com/gonuts/p/4475203.html
Copyright © 2011-2022 走看看