  • Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
    
    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].

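    The basic idea is simple: use a HashMap to store each 10-letter DNA substring together with the number of times it occurs, then add every substring that occurs more than once to the result list. The main catch is that using the substring itself as the map key runs into a Time Limit Exceeded error, so each DNA substring must be hashed into an int instead. With the mapping A->00, C->01, G->10, T->11, a 10-letter DNA sequence becomes a 20-bit binary number, and that number can serve as the key. The code is as follows: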
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class Solution {

        // Map a nucleotide to its 2-bit code: A->00, C->01, G->10, T->11.
        public int toInt(char c) {
            if (c == 'A') return 0;
            if (c == 'C') return 1;
            if (c == 'G') return 2;
            return 3;
        }

        // Decode a 20-bit key back into its 10-letter DNA sequence.
        public String tostring(int n) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10; i++) {
                char c = 'T';
                int temp = n % 4; // lowest 2 bits hold the last letter not yet decoded
                n = n >> 2;
                if (temp == 0) c = 'A';
                if (temp == 1) c = 'C';
                if (temp == 2) c = 'G';
                sb.insert(0, c);
            }
            return sb.toString();
        }

        public List<String> findRepeatedDnaSequences(String s) {
            List<String> re = new ArrayList<String>();
            Map<Integer, Integer> map = new HashMap<Integer, Integer>();
            int size = s.length();
            if (size <= 10) return re; // at most one window, so nothing can repeat
            // Encode the first 10-letter window.
            int tmp = 0;
            for (int i = 0; i < 10; i++) {
                tmp = tmp << 2;
                tmp = tmp | toInt(s.charAt(i));
            }
            map.put(tmp, 1);
            for (int j = 10; j < size; j++) {
                // Clear the top 2 bits (keep the low 18), shift left by 2, then append the new letter.
                tmp = ((tmp & 0x3ffff) << 2) | toInt(s.charAt(j));
                if (map.containsKey(tmp)) {
                    map.put(tmp, map.get(tmp) + 1);
                } else {
                    map.put(tmp, 1);
                }
            }

            // Collect every sequence that was seen more than once.
            for (Integer key : map.keySet()) {
                if (map.get(key) > 1) re.add(tostring(key));
            }
            return re;
        }
    }
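
    As a complement to the solution above, here is a minimal sketch of the same 2-bit rolling encoding that uses two sets instead of a count map, so each repeated sequence is emitted once, straight from the input, without decoding the key back into a string. The class name DnaSketch, the helpers code, encode and findRepeated, and the constants WINDOW and MASK are illustrative assumptions and do not appear in the original post.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; not part of the original solution.
class DnaSketch {
    private static final int WINDOW = 10;
    private static final int MASK = (1 << (2 * WINDOW)) - 1; // low 20 bits, 0xFFFFF

    // Same 2-bit mapping as in the post: A->00, C->01, G->10, T->11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;
        }
    }

    // Pack a window of up to 10 letters into one int, first letter in the highest bits.
    static int encode(String window) {
        int key = 0;
        for (int i = 0; i < window.length(); i++) {
            key = (key << 2) | code(window.charAt(i));
        }
        return key;
    }

    // Return each 10-letter substring that occurs more than once, in order of its second occurrence.
    static List<String> findRepeated(String s) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> reported = new HashSet<>();
        List<String> result = new ArrayList<>();
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter, then mask off the letter that left the window.
            key = ((key << 2) | code(s.charAt(i))) & MASK;
            if (i < WINDOW - 1) continue; // window not full yet
            // seen.add returns false on a repeat; reported.add keeps duplicates out of the result.
            if (!seen.add(key) && reported.add(key)) {
                result.add(s.substring(i - WINDOW + 1, i + 1));
            }
        }
        return result;
    }
}
```

    For the example input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", DnaSketch.findRepeated returns the two sequences from the problem statement, and DnaSketch.encode("AAAAACCCCC") is 341 (binary 0101010101), because the five leading A's contribute only zero bits.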
  • Original article: https://www.cnblogs.com/mrpod2g/p/4299559.html