  • 09 Finding a Motif in DNA

    Problem

    Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).

    The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].
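The 1-based position convention can be checked with a short sketch. The helper name `positions` is hypothetical and not part of the original post; it only reproduces the 'U' example from the text.

```python
def positions(s, symbol):
    """Return the 1-based positions of every occurrence of symbol in s."""
    # enumerate(..., start=1) counts from 1, matching the problem's convention
    return [i for i, ch in enumerate(s, start=1) if ch == symbol]


rna = "AUGCUUCAGAAAGGUCUUACG"
print(positions(rna, "U"))  # [2, 5, 6, 15, 17, 18]
```

Python itself indexes from 0, so every solution to this problem has to add 1 somewhere before printing.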

    A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = "AUGCUUCAGAAAGGUCUUACG", then s[2:5] = "UGCU".
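Note that the problem's s[j:k] is 1-based and includes both ends, whereas a Python slice is 0-based and excludes its end. A minimal sketch of the conversion, using a hypothetical helper named `substring`:

```python
def substring(s, j, k):
    """Return the problem's s[j:k]: 1-based, with both j and k included."""
    # Shift the start down by one; the end k stays as-is because
    # Python's slice end is exclusive.
    return s[j - 1:k]


s = "AUGCUUCAGAAAGGUCUUACG"
print(substring(s, 2, 5))  # UGCU
```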

    The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).

    Given: Two DNA strings s and t (each of length at most 1 kbp).

    Return: All locations of tt as a substring of s.

    Sample Dataset

    GATATATGCATATACTT
    ATAT
    

    Sample Output

    2 4 10
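The occurrences at locations 2 and 4 in the sample overlap, which is why a plain regular-expression scan is not enough. The sketch below is an illustration only; the variable names are assumptions and it is not the author's code. It compares a non-overlapping scan with a check of every start position.

```python
import re

s = "GATATATGCATATACTT"
t = "ATAT"

# A plain pattern consumes the characters it matches, so the occurrence
# starting at position 4 (which overlaps the one at position 2) is skipped.
non_overlapping = [m.start() + 1 for m in re.finditer(re.escape(t), s)]

# Testing every possible start index finds overlapping occurrences too.
all_locations = [i + 1 for i in range(len(s) - len(t) + 1) if s.startswith(t, i)]

print(non_overlapping)  # [2, 10]
print(all_locations)    # [2, 4, 10]
```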


    #-*-coding:UTF-8-*-
    ### 9. Finding a Motif in DNA ###
    
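    # Method 1: use regex.finditer
    # regex is a third-party module (pip install regex), more powerful than the built-in re
    import regex

    # overlapped=True makes finditer return every match, including overlapping ones
    matches = regex.finditer('ATAT', 'GATATATGCATATACTT', overlapped=True)
    for match in matches:
        print(match.start() + 1)  # convert the 0-based index to a 1-based position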
    
    
    
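    # Method 2: brute-force search
    seq = 'GATATATGCATATACTT'
    pattern = 'ATAT'


    def find_motif(seq, pattern):
        position = []
        # + 1 so the last possible start index, len(seq) - len(pattern), is also checked
        for i in range(len(seq) - len(pattern) + 1):
            if seq[i:i + len(pattern)] == pattern:
                position.append(str(i + 1))  # report 1-based positions

        # join with spaces to match the Sample Output format
        print(' '.join(position))


    find_motif(seq, pattern)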
    
    
    
    
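    # Method 3: re.finditer with a lookahead
    import re
    seq = 'GATATATGCATATACTT'
    print([i.start() + 1 for i in re.finditer('(?=ATAT)', seq)])
    # (?=...) is a lookahead: it matches only where the following text matches the expression,
    # and it consumes no characters, so overlapping occurrences are also found.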
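A further variant, not in the original post, relies only on the built-in `str.find`. It is a minimal sketch and the function name `find_locations` is hypothetical. Each search resumes one character after the previous hit rather than after its end, so overlapping occurrences are kept.

```python
def find_locations(s, t):
    """Return all 1-based locations of t in s, overlapping ones included."""
    locations = []
    start = s.find(t)  # 0-based index of the first occurrence, or -1
    while start != -1:
        locations.append(start + 1)  # convert to a 1-based location
        # Resume one character further on so overlapping matches are not skipped.
        start = s.find(t, start + 1)
    return locations


print(*find_locations("GATATATGCATATACTT", "ATAT"))  # 2 4 10
```

Because the function returns a list instead of printing, the same result can be reused for other output formats.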
    

      




  • Original article: https://www.cnblogs.com/think-and-do/p/7272915.html