All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Approach 1:
Use a HashMap directly. The Java implementation is as follows:
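import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

static public List<String> findRepeatedDnaSequences(String s) {
    List<String> list = new ArrayList<String>();
    // Maps each 10-letter window to the number of times it has been seen.
    HashMap<String, Integer> hm = new HashMap<String, Integer>();
    for (int i = 0; i <= s.length() - 10; i++) {
        String sub = s.substring(i, i + 10);
        int seen = hm.containsKey(sub) ? hm.get(sub) : 0;
        // Report a window only on its second occurrence, so a sequence
        // that appears three or more times is still listed once.
        if (seen == 1)
            list.add(sub);
        hm.put(sub, seen + 1);
    }
    return list;
}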
Result: Memory Limit Exceeded.
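Approach 2:
Simulate the hash by hand: map A, C, G, and T to 0, 1, 2, and 3, then compute a hash code for each 10-letter window. Each letter takes 2 bits, so a window fits in a 20-bit integer that can index a count array directly. If the count stored at a window's hash code is already 1, output the window. The Java implementation is as follows: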
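import java.util.ArrayList;
import java.util.List;

// Maps a nucleotide to its 2-bit code: A=0, C=1, G=2, T=3.
static int getValue(char ch) {
    if (ch == 'A') return 0;
    else if (ch == 'C') return 1;
    else if (ch == 'G') return 2;
    else return 3;
}

static public List<String> findRepeatedDnaSequences(String s) {
    List<String> list = new ArrayList<String>();
    if (s.length() <= 10)
        return list;
    // 10 letters x 2 bits = 20 bits, so hashes range over 0 .. (1 << 20) - 1.
    // The array needs 1 << 20 slots; one fewer overflows on "TTTTTTTTTT".
    int[] count = new int[1 << 20];
    int hash = 0;
    // Preload the first 9 letters; the loop below appends the 10th.
    for (int i = 0; i < 9; i++)
        hash = (hash << 2) | getValue(s.charAt(i));
    for (int i = 9; i < s.length(); i++) {
        // Append the new letter, then mask to 20 bits to drop the oldest one.
        hash = ((1 << 20) - 1) & ((hash << 2) | getValue(s.charAt(i)));
        // Report a window only on its second occurrence.
        if (count[hash] == 1)
            list.add(s.substring(i - 9, i + 1));
        count[hash]++;
    }
    return list;
}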
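To make the 2-bit encoding in Approach 2 more concrete, here is a minimal sketch that isolates the hashing step. The class name DnaEncodingSketch and the helpers encode, roll, and decode are hypothetical names introduced only for illustration; getValue follows the same mapping as above (A=0, C=1, G=2, anything else=3).

```java
class DnaEncodingSketch {
    // Same mapping as in Approach 2: A=0, C=1, G=2, anything else=3.
    static int getValue(char ch) {
        if (ch == 'A') return 0;
        else if (ch == 'C') return 1;
        else if (ch == 'G') return 2;
        else return 3;
    }

    // Hypothetical helper: packs a 10-letter window into 20 bits, 2 bits per letter.
    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++)
            hash = (hash << 2) | getValue(window.charAt(i));
        return hash;
    }

    // Hypothetical helper: slides the window by one letter. The shift appends
    // the new letter and the 20-bit mask discards the oldest one.
    static int roll(int hash, char next) {
        return ((1 << 20) - 1) & ((hash << 2) | getValue(next));
    }

    // Hypothetical helper: recovers the 10-letter window from its 20-bit code.
    static String decode(int hash) {
        char[] letters = {'A', 'C', 'G', 'T'};
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            out[i] = letters[hash & 3];
            hash >>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        int h = encode("AAAAACCCCC");
        System.out.println(h);                    // 341
        System.out.println(decode(roll(h, 'A'))); // AAAACCCCCA
    }
}
```

Because the encoding is reversible, two different 10-letter windows never share a hash code, which is why the count array can stand in for the HashMap without any collision handling.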