All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
位操作+hashmap。1.把ACGT转化成00,01,10,11然后每次<<2来实现字符到数字的转换,从而节省时间复杂度。2.hashmap要记录出现次数,避免同一串出现两次最后加两次到结果里。
细节:for循环开头的数字遍历是for(int i = 0; i < s.length() - 9; i++) 最后是-9,不是-10,因为你想本来是遍历到最后长度为1的字符串,你要遍历到最后长度为10的字符串,那差了9个位置啊。
(普通一点的也可以直接用substring,或者stringBuilder开始暖场。)
class Solution { public List<String> findRepeatedDnaSequences(String s) { List<String> result = new ArrayList<>(); if (s == null || s.length() <= 10) { return result; } int[] map = new int[26]; map['A' - 'A'] = 0; map['C' - 'A'] = 1; map['G' - 'A'] = 2; map['T' - 'A'] = 3; Map<Integer, Integer> Cntmap = new HashMap<Integer, Integer>(); for (int i = 0; i < s.length() - 9; i++) { int trans = 0; for (int j = i; j < i + 10; j++) { trans |= (map[s.charAt(j) - 'A']) << ((j - i) * 2); } if (Cntmap.containsKey(trans)) { if (Cntmap.get(trans) == 1) { result.add(s.substring(i, i + 10)); } Cntmap.put(trans, Cntmap.get(trans) + 1); } else { Cntmap.put(trans, 1); } } return result; } }