All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
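A Java int occupies 4 bytes, or 32 bits in total. This solution encodes each of the four letters A, T, C, and G in two bits, so a 10-letter sequence fits in 20 bits, and each 20-bit value is stored in a HashSet. The clever part of the answer is that it creates two HashSets: one records the values that have been seen once, and the other records the values that have been seen a second time. One detail matters here: the two HashSet calls in the if condition are joined by &&, so if the first operand is false (the value has not been seen before), the second call is never executed. The code is as follows: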
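    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            List<String> res = new ArrayList<String>();
            // words: codes seen at least once; doublewords: codes already reported
            Set<Integer> words = new HashSet<Integer>();
            Set<Integer> doublewords = new HashSet<Integer>();
            // Two-bit code for each nucleotide, indexed by letter - 'A'
            int[] map = new int[26];
            map[0] = 0;          // 'A'
            map['C' - 'A'] = 1;
            map['T' - 'A'] = 2;
            map['G' - 'A'] = 3;
            for (int i = 0; i < s.length() - 9; i++) {
                // Pack the 10 letters starting at i into the low 20 bits of v
                int v = 0;
                for (int j = i; j < i + 10; j++) {
                    v <<= 2;
                    v |= map[s.charAt(j) - 'A'];
                }
                // add() returns false if v is already present, so this branch
                // runs only the second time v is seen
                if (!words.add(v) && doublewords.add(v)) {
                    res.add(s.substring(i, i + 10));
                }
            }
            return res;
        }
    }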
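
The solution above rebuilds the 20-bit value from scratch for every window. Because the encoding is fixed-width, the same value can also be maintained incrementally: shift in the next letter and mask off everything above the low 20 bits. The following is only a sketch of that variant, not part of the original answer; the class name `RollingDna`, the helper `code`, and the constant `MASK` are names made up for illustration, and the input is assumed to contain only A, C, G, and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RollingDna {
    // Hypothetical helper: the same two-bit mapping as the solution above.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'T': return 2;
            case 'G': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();      // codes seen at least once
        Set<Integer> reported = new HashSet<>();  // codes already added to res
        final int MASK = (1 << 20) - 1;           // low 20 bits = 10 letters
        int v = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new letter and drop the letter that left the window.
            v = ((v << 2) | code(s.charAt(i))) & MASK;
            if (i < 9) {
                continue; // the window does not hold 10 letters yet
            }
            // Short-circuit: reported.add(v) runs only when seen.add(v)
            // returned false, i.e. v was already present.
            if (!seen.add(v) && reported.add(v)) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }
}
```

With this change each character is processed once rather than ten times, while the two-set check with && stays exactly the same as in the original code.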