  • 16 Finding a Protein Motif

    Problem

    To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
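
    The shorthand maps almost directly onto regular-expression character classes. As a minimal sketch (the helper name `motif_to_regex` is hypothetical and not part of the problem statement), `{X}` becomes the negated class `[^X]`, while `[XY]` and plain letters pass through unchanged:

```python
import re

def motif_to_regex(motif):
    """Translate motif shorthand into a regular-expression pattern string.

    Hypothetical helper: {X} -> [^X]; [XY] and single letters are kept as-is.
    """
    return re.sub(r'\{([A-Z]+)\}', r'[^\1]', motif)

pattern = motif_to_regex('N{P}[ST]{P}')
print(pattern)  # N[^P][ST][^P]
```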

    You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into

    http://www.uniprot.org/uniprot/uniprot_id
    

    Alternatively, you can obtain a protein sequence in FASTA format by following

    http://www.uniprot.org/uniprot/uniprot_id.fasta
    

    For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
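
    A FASTA record is a header line starting with `>` followed by the sequence wrapped over several lines. The sketch below builds the URL from the pattern above and extracts the sequence from FASTA text. The function names and the sample record are made up for illustration, and the network call is left out so the snippet runs offline:

```python
def fasta_url(uniprot_id):
    """Build the FASTA URL for an access ID, using the pattern from the problem."""
    return 'http://www.uniprot.org/uniprot/' + uniprot_id + '.fasta'

def parse_fasta(text):
    """Return the sequence of a single-record FASTA string, header dropped."""
    lines = text.strip().splitlines()
    return ''.join(line.strip() for line in lines if not line.startswith('>'))

# Made-up record, only to show the format.
sample = ">sp|X00000|EXAMPLE made-up header\nMKNAT\nANNTS\n"
print(fasta_url('B5ZC00'))   # http://www.uniprot.org/uniprot/B5ZC00.fasta
print(parse_fasta(sample))   # MKNATANNTS
```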

    Given: At most 15 UniProt Protein Database access IDs.

    Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.

    Sample Dataset

    A2Z669
    B5ZC00
    P07204_TRBM_HUMAN
    P20840_SAG1_YEAST
    

    Sample Output

    B5ZC00
    85 118 142 306 395
    P07204_TRBM_HUMAN
    47 115 116 382 409
    P20840_SAG1_YEAST
    79 109 135 248 306 348 364 402 485 501 614
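
    Positions 115 and 116 in P07204_TRBM_HUMAN are adjacent, so occurrences of the four-residue motif can overlap and locations are counted from 1. A plain regex scan skips overlapping hits because each match consumes its characters. One way to handle this, sketched here on a made-up toy sequence with a hypothetical `motif_positions` helper, is to match only the `N` and check the rest in a lookahead:

```python
import re

def motif_positions(seq):
    """Return 1-based start positions of N{P}[ST]{P}, overlaps included."""
    # The lookahead checks the three following residues without consuming them.
    return [m.start() + 1 for m in re.finditer(r'N(?=[^P][ST][^P])', seq)]

toy = 'ANNTSA'  # made-up sequence with two overlapping occurrences
print(motif_positions(toy))                                         # [2, 3]
print([m.start() + 1 for m in re.finditer(r'N[^P][ST][^P]', toy)])  # [2]
```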

    # coding=utf-8
    import re
    import urllib.request

    ids = ['A2Z669', 'B5ZC00', 'P07204_TRBM_HUMAN', 'P20840_SAG1_YEAST']

    # N{P}[ST]{P}: the lookahead consumes only the N, so overlapping
    # occurrences (e.g. 115 and 116 in P07204_TRBM_HUMAN) are all found.
    regex = re.compile(r'N(?=[^P][ST][^P])')

    for one in ids:
        name = one.strip()
        url = 'http://www.uniprot.org/uniprot/' + name + '.fasta'
        response = urllib.request.urlopen(url)
        the_page = response.read().decode('utf-8')

        # Drop the FASTA header line and join the wrapped sequence lines.
        seq = ''.join(the_page.splitlines()[1:])

        # finditer returns 0-based offsets; the expected output is 1-based.
        out = [m.start() + 1 for m in regex.finditer(seq)]

        # Only proteins that contain the motif are printed.
        if out:
            print(name)
            print(' '.join(str(i) for i in out))
    

      

  • Original article: https://www.cnblogs.com/think-and-do/p/7283840.html