  • 08 Translating RNA into Protein

    Problem

    The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.

    The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.

    Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp).

    Return: The protein string encoded by s.

    Sample Dataset

    AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA
    

    Sample Output

    MAMAPRTEINSTRING
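As a quick check of the sample above, the following is a minimal sketch of codon-by-codon translation. It assumes the standard genetic code and builds the table from the conventional U, C, A, G ordering; the names `CODON_TABLE` and `translate` are illustrative and not part of the problem statement.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for the first, second and
# third base of each codon; '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string codon by codon, ending at the first stop codon."""
    protein = []
    # range() stops before any incomplete trailing codon.
    for i in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


sample = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"
print(translate(sample))  # MAMAPRTEINSTRING
```

This sketch ends translation at the first stop codon and ignores a trailing fragment shorter than three bases; both are design choices for the sketch rather than requirements stated in the problem.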

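    Method 1:
    # -*- coding: utf-8 -*-
    ### 8. Translating RNA into Protein ###
    import re
    from collections import OrderedDict
    
    # Build the codon table from a text file. The loop below expects each
    # line to hold four "codon amino-acid" pairs separated by whitespace.
    codonTable = OrderedDict()
    with open('rna_codon_table.txt') as f:
        for line in f:
            line = line.strip()
            lst = re.split(r'\s+', line)      # \s+ matches one or more whitespace characters
            for i in [0, 2, 4, 6]:
                codonTable[lst[i]] = lst[i + 1]
    
    # Read the RNA sequence, joining lines and normalizing to upper case.
    rnaSeq = ''
    with open('rosalind_prot.txt', 'rt') as f:
        for line in f:
            line = line.rstrip()
            rnaSeq += line.upper()
    
    # Walk the sequence three bases at a time; 'Stop' codons add nothing.
    aminoAcids = []
    i = 0
    while i < len(rnaSeq):
        codon = rnaSeq[i:i + 3]
        if codonTable[codon] != 'Stop':
            aminoAcids.append(codonTable[codon])
        i += 3
    
    peptide = ''.join(aminoAcids)
    
    print(peptide)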
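The script above depends on a file `rna_codon_table.txt` that is not shown. The following sketch parses the same kind of layout from an in-memory string, assuming each row holds four whitespace-separated "codon amino-acid" pairs with stop codons written as `Stop`. The helper name `parse_codon_table` and the two sample rows are illustrative only.

```python
def parse_codon_table(text):
    """Parse rows of whitespace-separated 'codon amino-acid' pairs into a dict."""
    table = {}
    for line in text.splitlines():
        fields = line.split()  # splits on any run of whitespace
        # Even positions hold codons, odd positions the matching amino acids.
        for codon, amino_acid in zip(fields[0::2], fields[1::2]):
            table[codon] = amino_acid
    return table


# Two example rows in the assumed layout (not the full 64-codon table).
sample_rows = """\
UUU F      CUU L      AUU I      GUU V
UAA Stop   CAA Q      AAA K      GAA E
"""

table = parse_codon_table(sample_rows)
print(table["AUU"], table["UAA"])  # I Stop
```

Using `str.split()` with no arguments avoids the regular expression entirely and skips blank lines without extra checks.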
    Method 2:

    def translate_rna(sequence):
        codonTable = {
        'AUA':'I', 'AUC':'I', 'AUU':'I', 'AUG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACU':'T',
        'AAC':'N', 'AAU':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGU':'S', 'AGA':'R', 'AGG':'R',
        'CUA':'L', 'CUC':'L', 'CUG':'L', 'CUU':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCU':'P',
        'CAC':'H', 'CAU':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGU':'R',
        'GUA':'V', 'GUC':'V', 'GUG':'V', 'GUU':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCU':'A',
        'GAC':'D', 'GAU':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGU':'G',
        'UCA':'S', 'UCC':'S', 'UCG':'S', 'UCU':'S',
        'UUC':'F', 'UUU':'F', 'UUA':'L', 'UUG':'L',
        'UAC':'Y', 'UAU':'Y', 'UAA':'', 'UAG':'',
        'UGC':'C', 'UGU':'C', 'UGA':'', 'UGG':'W',
        }
        proteinsequence = ''
        for n in range(0,len(sequence),3):
            if sequence[n:n+3] in codonTable.keys():
                proteinsequence += codonTable[sequence[n:n+3]]
        return proteinsequence
     
    se = open('rosalind_prot.txt').read().strip()  # RNA sequence, surrounding whitespace removed
    print(translate_rna(se))
    

    Method 3:

    from Bio.Seq import Seq
    
    # Biopython 1.78 and later have no Bio.Alphabet module;
    # Seq takes only the sequence string.
    
    # translation
    messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
    # Stop codons appear as '*'; pass to_stop=True to end at the first one.
    print(messenger_rna.translate())
    
    # reverse complement
    my_dna = Seq("AGTACACTGGT")
    print(my_dna.reverse_complement())
    

      

  • Original article: https://www.cnblogs.com/think-and-do/p/7272590.html