  • LeetCode - Repeated DNA Sequences (reducing memory with a bitmap)

    Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

    Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

    For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
    
    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].
    
    A bitmap algorithm can be used to reduce memory usage. The code is as follows:
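    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* 10 letters x 2 bits = 20-bit key, so there are 2^20 = 1024 * 1024
       possible keys, stored one bit each in 32-bit words. */
    uint32_t map_exist[1024 * 1024 / 32];   /* key already reported */
    uint32_t map_pattern[1024 * 1024 / 32]; /* key seen at least once */

    /* x >> 5 selects the word, x & 0x1F selects the bit inside it */
    #define set(map, x) \
        ((map)[(x) >> 5] |= (1u << ((x) & 0x1F)))

    #define test(map, x) \
        ((map)[(x) >> 5] & (1u << ((x) & 0x1F)))

    int dnamap[26]; /* 2-bit code per letter, indexed by c - 'A' */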
    
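    char** findRepeatedDnaSequences(char* s, int* returnSize) {
        *returnSize = 0;
        if (s == NULL) return NULL;
        int len = strlen(s);
        if (len <= 10) return NULL; /* a repeat needs at least 11 letters */

        memset(map_exist, 0, sizeof(map_exist));
        memset(map_pattern, 0, sizeof(map_pattern));

        /* A=0, C=1, G=2, T=3 */
        dnamap['A' - 'A'] = 0;  dnamap['C' - 'A'] = 1;
        dnamap['G' - 'A'] = 2;  dnamap['T' - 'A'] = 3;

        char** ret = malloc(sizeof(char*));
        int curr = 0;   /* number of results stored */
        int size = 1;   /* capacity of ret */
        int key = 0;    /* must start at 0 before bits are shifted in */
        int i = 0;

        /* prime the key with the first 9 letters */
        while (i < 9)
            key = (key << 2) | dnamap[s[i++] - 'A'];
        while (i < len) {
            /* slide: drop the oldest letter (keep 20 bits), append the new one */
            key = ((key << 2) & 0xFFFFF) | dnamap[s[i++] - 'A'];
            if (test(map_pattern, key)) {
                /* seen before: report it only the first time it repeats */
                if (!test(map_exist, key)) {
                    set(map_exist, key);
                    if (curr == size) {
                        size *= 2;
                        ret = realloc(ret, sizeof(char*) * size);
                    }
                    ret[curr] = malloc(sizeof(char) * 11);
                    memcpy(ret[curr], &s[i - 10], 10); /* window ends at s[i-1] */
                    ret[curr][10] = '\0';
                    ++curr;
                }
            }
            else {
                set(map_pattern, key);
            }
        }

        if (curr == 0) {    /* realloc(ret, 0) is implementation-defined */
            free(ret);
            return NULL;
        }
        ret = realloc(ret, sizeof(char*) * curr); /* shrink to fit */
        *returnSize = curr;
        return ret;
    }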

    This algorithm runs in about 6 ms, which is very fast.

  • Original article: https://www.cnblogs.com/jimmysue/p/4483357.html