  • LeetCode [187] Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
    
    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].
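A minimal sketch of the unpacked form of step 2 in the solution comment, written as a standalone function. The name `findRepeatedDnaSequencesNaive` is hypothetical and not part of the original solution. Every 10-letter window is counted as a `std::string`, and a window is reported on its second occurrence, which also avoids duplicate answers without a separate `check` set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: count each 10-letter window as a full std::string key.
// Memory grows with the number of distinct windows, since every key stores
// all 10 characters; the 2-bit packing in the solution avoids this cost.
std::vector<std::string> findRepeatedDnaSequencesNaive(const std::string& s) {
    const std::size_t kLen = 10;
    std::vector<std::string> res;
    if (s.size() < kLen) return res;
    std::unordered_map<std::string, int> count;
    for (std::size_t i = 0; i + kLen <= s.size(); ++i) {
        std::string window = s.substr(i, kLen);
        // Report a window exactly once: when it is seen for the second time.
        if (++count[window] == 2) res.push_back(window);
    }
    return res;
}
```

On the example string above, this returns `["AAAAACCCCC", "CCCCCAAAAA"]`, in that order.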
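    #include <string>
    #include <unordered_map>
    #include <unordered_set>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        /**
         * All DNA is a sequence of bases, A, C, G, and T. The task is to find every substring of
         * length 10 that occurs more than once in the input string.
         * Approach:
         *     1. Brute-force enumeration (comparing every window against every other window) is
         *        quadratic in the input length and will time out.
         *     2. Hashing:
         *        1) Keep an unordered_set<string> repeated of length-10 substrings. Scan the string and
         *           look up s[i]..s[i+9] in repeated: if it is absent, insert it; if it is present, it
         *           is a repeat, so append it to vector<string> res.
         *        2) For very long inputs, unordered_set<string> uses a lot of memory.
         *           Improvement: compress each substring. Ten chars take 8 bits * 10 = 80 bits, but
         *           A, C, G, T need only 2 bits each (00, 01, 10, 11), so ten letters take
         *           2 bits * 10 = 20 bits, which fits in one 32-bit int.
         *        3) res can also collect duplicates, because a substring would be appended every time
         *           it is found in repeated again.
         *           Improvement: keep a second set, check, of the encoded values (strInt) already
         *           added to res.
         */
        vector<string> findRepeatedDnaSequences(string s) {
            vector<string> res;
            if (s.size() < 10) return res;
            unordered_map<char, unsigned int> smap = {{'A', 0}, {'C', 1}, {'G', 2}, {'T', 3}};
            unordered_set<unsigned int> repeated, check;
            // Encode the first window s[0..9] into the low 20 bits.
            unsigned int strInt = 0;
            for (int i = 0; i < 10; i++) {
                strInt = (strInt << 2) + smap[s[i]];
            }
            repeated.insert(strInt);
            for (size_t i = 10; i < s.size(); i++) {
                // Drop the oldest letter (keep the low 18 bits), then append s[i].
                strInt = ((strInt & 0x3ffff) << 2) + smap[s[i]];
                if (repeated.find(strInt) == repeated.end()) {
                    repeated.insert(strInt);
                } else if (check.find(strInt) == check.end()) {
                    // First repeat of this window: record s[i-9..i] exactly once.
                    res.push_back(s.substr(i - 9, 10));
                    check.insert(strInt);
                }
            }
            return res;
        }
    };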
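The 2-bit packing and the `0x3ffff` rolling update can be checked in isolation. The sketch below uses hypothetical helpers `code`, `encodeWindow`, and `rollWindow` (not from the original) and a `switch` in place of the `smap` lookup; it assumes the input contains only A, C, G, and T. Because 4^10 = 2^20, each 10-letter window maps to a distinct 20-bit value, so comparing packed integers is equivalent to comparing the strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit code per nucleotide, same mapping as smap
// (A=00, C=01, G=10, T=11). Assumes only 'A', 'C', 'G', 'T' appear;
// any other character falls through to 3.
inline std::uint32_t code(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Hypothetical helper: pack a 10-letter window into the low 20 bits from scratch.
std::uint32_t encodeWindow(const std::string& window) {
    assert(window.size() == 10);
    std::uint32_t v = 0;
    for (char c : window) v = (v << 2) | code(c);
    return v;
}

// Hypothetical helper: slide the window right by one letter. Masking with
// 0x3ffff keeps the low 18 bits (the last 9 letters); shifting left by 2
// makes room for the new letter's code.
std::uint32_t rollWindow(std::uint32_t prev, char next) {
    return ((prev & 0x3ffffu) << 2) | code(next);
}
```

For example, `encodeWindow("AAAAACCCCC")` is 341 (binary `00 00 00 00 00 01 01 01 01 01`), and rolling the window across the example string agrees with re-encoding each window from scratch at every step.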
  • Original post: https://www.cnblogs.com/Vae1990Silence/p/4771423.html
Copyright © 2011-2022 走看看