  • 08 Translating RNA into Protein

    Problem

    The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.

    The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.
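
    As a small illustration of what the codon table contains, the sketch below builds the standard RNA codon table in memory rather than reading it from a file. The names `BASES`, `AMINO_ACIDS`, and `CODON_TABLE` are chosen here for illustration and are not part of the original post; `*` is used as the marker for a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Amino acids for the 64 codons in UCAG order (UUU, UUC, UUA, UUG, UCU, ...).
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each three-base codon with its amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

print(CODON_TABLE["AUG"])  # M (methionine, the usual start codon)
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(stop_codons)         # ['UAA', 'UAG', 'UGA']
```

    The 64 codons map onto 20 amino acids plus the stop signal, so several codons usually encode the same amino acid.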

    Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp).

    Return: The protein string encoded by s.

    Sample Dataset

    AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA
    

    Sample Output

    MAMAPRTEINSTRING
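
    Before reading any input file, the sample can be checked with a short self-contained sketch. It is only one possible approach, assuming the standard codon table and assuming that translation should end at the first stop codon. The function name `translate` and the index arithmetic are illustrative choices, not taken from the solutions in this post.

```python
BASES = "UCAG"
# Amino acids for the 64 codons in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(rna):
    """Translate an RNA string, ending at the first stop codon."""
    protein = []
    # Step through complete codons only; a trailing fragment is ignored.
    for i in range(0, len(rna) - 2, 3):
        # BASES.index raises ValueError for any character other than U, C, A, G.
        first, second, third = (BASES.index(base) for base in rna[i:i + 3])
        amino_acid = AMINO_ACIDS[16 * first + 4 * second + third]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


sample = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"
print(translate(sample))  # MAMAPRTEINSTRING
```

    Each base is treated as a digit in base 4, so a codon becomes an index from 0 to 63 into the amino acid string, which avoids building a dictionary.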

    Method 1:
    # -*- coding: utf-8 -*-
    ### 8. Translating RNA into Protein ###
    import re
    from collections import OrderedDict
    
    codonTable = OrderedDict()
    with open('rna_codon_table.txt') as f:
        for line in f:
            line = line.rstrip()
            lst = re.split(r'\s+', line)      # \s+ matches one or more whitespace characters
            for i in [0, 2, 4, 6]:
                codonTable[lst[i]] = lst[i + 1]
    
    rnaSeq = ''
    with open('rosalind_prot.txt', 'rt') as f:
        for line in f:
            line = line.rstrip()
            rnaSeq += line.upper()
    
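    aminoAcids = []
    # Read the sequence codon by codon; a trailing fragment shorter than 3 bases is ignored.
    for i in range(0, len(rnaSeq) - 2, 3):
        aminoAcid = codonTable[rnaSeq[i:i + 3]]
        if aminoAcid == 'Stop':     # translation ends at the first stop codon
            break
        aminoAcids.append(aminoAcid)
    
    peptide = ''.join(aminoAcids)
    
    print(peptide)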
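
    Method 1 depends on `rna_codon_table.txt`, whose layout is not shown in this post. The sketch below assumes the file follows the layout of the codon table on the Rosalind problem page, with four "codon amino-acid" pairs per line and stop codons labelled `Stop`. The two-line excerpt and the helper name `parse_codon_table` are illustrative only.

```python
# Two-line excerpt in the assumed layout: four "codon amino-acid" pairs per line.
TABLE_EXCERPT = """\
UUU F      CUU L      AUU I      GUU V
UAA Stop   CAA Q      AAA K      GAA E
"""


def parse_codon_table(text):
    """Build a codon -> amino acid dictionary from whitespace-separated pairs."""
    table = {}
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        # Fields alternate: codon, amino acid, codon, amino acid, ...
        for codon, amino_acid in zip(fields[0::2], fields[1::2]):
            table[codon] = amino_acid
    return table


codon_table = parse_codon_table(TABLE_EXCERPT)
print(codon_table["AUU"], codon_table["UAA"])  # I Stop
```

    Using `str.split()` with no arguments gives the same result as splitting on `\s+` after stripping the line, and it does not rely on every line holding exactly four pairs.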
    Method 2:

    def translate_rna(sequence):
        codonTable = {
        'AUA':'I', 'AUC':'I', 'AUU':'I', 'AUG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACU':'T',
        'AAC':'N', 'AAU':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGU':'S', 'AGA':'R', 'AGG':'R',
        'CUA':'L', 'CUC':'L', 'CUG':'L', 'CUU':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCU':'P',
        'CAC':'H', 'CAU':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGU':'R',
        'GUA':'V', 'GUC':'V', 'GUG':'V', 'GUU':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCU':'A',
        'GAC':'D', 'GAU':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGU':'G',
        'UCA':'S', 'UCC':'S', 'UCG':'S', 'UCU':'S',
        'UUC':'F', 'UUU':'F', 'UUA':'L', 'UUG':'L',
        'UAC':'Y', 'UAU':'Y', 'UAA':'', 'UAG':'',
        'UGC':'C', 'UGU':'C', 'UGA':'', 'UGG':'W',
        }
        proteinsequence = ''
        for n in range(0, len(sequence), 3):
            codon = sequence[n:n+3]
            if codon in codonTable:
                if codonTable[codon] == '':   # stop codons map to '' in the table above
                    break
                proteinsequence += codonTable[codon]
        return proteinsequence
     
    se = open('rosalind_prot.txt').read().strip('\n')  # sequence
    print(translate_rna(se))
    

    Method 3:

    from Bio.Seq import Seq
    
    # Biopython 1.78 and later no longer provide Bio.Alphabet; Seq takes only the sequence string.
    # translation
    messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
    print(messenger_rna.translate())    # stop codons appear as '*'; pass to_stop=True to end at the first one
    
    # reverse complement
    my_dna = Seq("AGTACACTGGT")
    print(my_dna.reverse_complement())
    

      

  • Original article: https://www.cnblogs.com/think-and-do/p/7272590.html