zoukankan      html  css  js  c++  java
  • 笔记 Bioinformatics Algorithms Chapter2

    Chapter2 WHICH DNA PATTERNS PLAY THE ROLE OF MOLECULAR CLOCKS 

    寻找模序 

    一、

    转录因子会结合基因上游的特定序列,调控基因的转录表达,但是在不同个体中,这个序列会有一些差别。本章讲述用贪婪、随机算法来寻找这个序列:寻找模序。

    (NF-κB binding sites from the Drosophila melanogaster genome)

    二、一些概念:

    1. Score、Profile 的含义如图

    根据profile matrix 可以计算出某个kmer在某一profile下的概率

    三、

    提出问题:Motif Finding Problem:

    Given a collection of strings, find a set of k-mers, one from each string, that minimizes the score of the resulting motif.

    Input: A collection of strings Dna and an integer k.

    Output: A collection Motifs of k-mers, one from each string in Dna, minimizing Score(Motifs) among all possible choices of k-mers.

    一组序列中,寻找一组k-mer,它们的Score是最低的(或者与consensus sequence的海明距离之和最小)

    1 遍历

    MedianString(Dna, k)
            distance ← ∞
            for each k-mer Pattern from AA…AA to TT…TT
                if distance > d(Pattern, Dna)
                     distance ← d(Pattern, Dna)
                     Median ← Pattern
            return Median

    2 贪婪法 GreedyMotifSearch

    GREEDYMOTIFSEARCH(Dna, k, t)
            BestMotifs ← motif matrix formed by first k-mers in each string
                          from Dna
            for each k-mer Motif in the first string from Dna
                Motif1 ← Motif
                for i = 2 to t
                    form Profile from motifs Motif1, …, Motifi - 1
                    Motifi ← Profile-most probable k-mer in the i-th string
                              in Dna
                Motifs ← (Motif1, …, Motift)
                if Score(Motifs) < Score(BestMotifs)
                    BestMotifs ← Motifs
            output BestMotifs

    详解 http://www.mrgraeme.co.uk/greedy-motif-search/

    *贪婪法 GreedyMotifSearch with pseudocounts

    pseudocounts:在形成profile matrix时,给0项设为一个较小的值

    GreedyMotifSearch(Dna, k, t)
            form a set of k-mers BestMotifs by selecting 1st k-mers in each string from Dna
            for each k-mer Motif in the first string from Dna
                Motif1 ← Motif
                for i = 2 to t
                    apply Laplace's Rule of Succession to form Profile from motifs   Motif1, …, Motifi-1
                    Motifi ← Profile-most probable k-mer in the i-th string in Dna
                Motifs ← (Motif1, …, Motift)
                if Score(Motifs) < Score(BestMotifs)
                    BestMotifs ← Motifs
            output BestMotifs

    3. 随机法Randomized Motif Search

    RandomizedMotifSearch(Dna, k, t)
         #随机从每个DNA取k-mer,生成一组motifs randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna BestMotifs ← Motifs while forever Profile ← Profile(Motifs)#根据motifs形成Profile矩阵 Motifs ← Motifs(Profile, Dna) #根据profile矩阵从一组DNA生成一组几率最大的motifs if Score(Motifs) < Score(BestMotifs) BestMotifs ← Motifs else return BestMotifs

    随机算法起到作用的原因是,随机选取的一组Motifs,有可能选到潜在正确的一个k-mer,那么就在这中形成倾斜,直至寻找到较优解

    改进: 上一个算法,每次迭代都重新随机生成一组新的Motifs,这可能把潜在的正确模序抛弃了,改进的方法是每次随机只更改一行k-mer

    GibbsSampler(Dna, k, t, N)
            randomly select k-mers Motifs = (Motif1, …, Motift) in each string from Dna
            BestMotifs ← Motifs
            for j ← 1 to N
                i ← Random(t)
                Profile ← profile matrix constructed from all strings in Motifs except for Motif[i]
                Motif[i] ← Profile-randomly generated k-mer in the i-th sequence
                if Score(Motifs) < Score(BestMotifs)
                    BestMotifs ← Motifs
            return BestMotifs
  • 相关阅读:
    231. Power of Two
    204. Count Primes
    205. Isomorphic Strings
    203. Remove Linked List Elements
    179. Largest Number
    922. Sort Array By Parity II
    350. Intersection of Two Arrays II
    242. Valid Anagram
    164. Maximum Gap
    147. Insertion Sort List
  • 原文地址:https://www.cnblogs.com/lokwongho/p/9682667.html
Copyright © 2011-2022 走看看