  • E-value, identity, bitscore

    E-value:

    The E-value (expect value) is the number of matches with a score at least as good as the observed one that you would expect to find purely by chance when searching a database of this size. The lower the E-value, the less likely it is that the database match arose by random chance, and therefore the more significant the match is.

    An empirical interpretation of the E-value is as follows:

    If the E-value < 1e-50 (i.e. 1 × 10^-50), there should be extremely high confidence that the database match reflects a homologous relationship.

    If the E-value is between 1e-50 and 0.01, the match can be considered a result of homology.

    If the E-value is between 0.01 and 10, the match is not considered significant, but it may hint at a tentative remote homology relationship. Additional evidence is needed to confirm it.

    If the E-value > 10, the sequences under consideration are either unrelated or related so distantly that the relationship falls below the detection limit of the current method.
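    The rules of thumb above can be written as a small lookup function. This is a minimal sketch: the function name `interpret_evalue` and the category strings are hypothetical, and since the text does not say which side the boundary values (0.01 and 10) belong to, the sketch assigns them to the more significant category. These thresholds are heuristics, not hard cutoffs.

```python
def interpret_evalue(evalue: float) -> str:
    """Map a BLAST E-value to the empirical categories described above.

    Hypothetical helper; thresholds 1e-50, 0.01 and 10 are rules of thumb.
    Boundary values are assigned to the more significant category (an assumption).
    """
    if evalue < 0:
        raise ValueError("E-value cannot be negative")
    if evalue < 1e-50:
        return "extremely high confidence of homology"
    if evalue <= 0.01:
        return "homologous"
    if evalue <= 10:
        return "not significant; possible remote homology, needs more evidence"
    return "unrelated or beyond detection limit"


# Classify a few example E-values.
for e in (0.0, 1e-80, 1e-20, 0.5, 50.0):
    print(f"{e:g}\t{interpret_evalue(e)}")
```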

    Because the E-value is directly proportional to the database size, an obvious problem is that as the database grows, the E-value for a given sequence match also increases.

    The genuine evolutionary relationship between the two sequences does not change, yet the match looks less credible as the database grows, so one may "lose" previously detected homologs in later searches. Thus, an alternative to the E-value is needed.

    In short, the E-value is very important: the lower, the better.
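    The dependence on database size comes from the Karlin-Altschul formula E = K·m·n·e^(-λS), where m is the query length, n is the total number of residues in the database and S is the raw score. The sketch below is illustrative only: it uses raw lengths, whereas BLAST itself uses corrected "effective" lengths, and the λ and K values are assumed (roughly what NCBI BLAST reports for BLOSUM62 with gap costs 11/1). The raw score and query length are hypothetical. It shows that the same alignment gets a tenfold larger E-value each time the database grows tenfold.

```python
import math


def karlin_altschul_evalue(raw_score: float, query_len: int, db_len: int,
                           lam: float, k: float) -> float:
    """Expected number of chance hits scoring >= raw_score: E = K * m * n * exp(-lambda * S).

    Uses raw query/database lengths; real BLAST applies effective-length corrections.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative parameters (assumed, roughly BLOSUM62 with gap costs 11/1).
LAMBDA, K = 0.267, 0.041
score, m = 150, 300  # hypothetical raw score and query length

# Same alignment, database growing tenfold at each step: E grows tenfold too.
for n in (10**8, 10**9, 10**10):
    print(f"db residues={n:.0e}  E={karlin_altschul_evalue(score, m, n, LAMBDA, K):.3g}")
```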

    bitscore:

    A bitscore is another prominent statistical indicator reported alongside the E-value in BLAST output. It is obtained by normalizing the raw pairwise alignment score with the statistical parameters of the scoring system, so it measures sequence similarity independently of query sequence length and database size. The bitscore $S'$ is computed as $S' = (\lambda S - \ln K) / \ln 2$, where $S$ is the raw alignment score, $\lambda$ is the scale parameter of the Gumbel (extreme value) distribution, and $K$ is a constant; both $\lambda$ and $K$ depend on the scoring matrix (and gap penalties) used. The bitscore $S'$ is therefore linearly related to the raw alignment score $S$: the higher the bitscore, the more significant the match. Because it does not change with database size, the bitscore provides a stable statistical indicator when searching databases of different sizes, or when searching the same database at different times as it grows.
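    The bitscore formula can be checked numerically. Substituting it into the Karlin-Altschul formula gives E = m·n·2^(-S'), which shows why the bitscore stays fixed while the E-value still scales with the search space m·n. This is a minimal sketch: the function names are hypothetical, and the λ and K values are the same illustrative assumptions (roughly BLOSUM62 with gap costs 11/1), not authoritative parameters.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """E-value from a bit score: E = m * n * 2**(-S'), equivalent to K*m*n*exp(-lambda*S)."""
    return query_len * db_len * 2.0 ** (-bits)


LAMBDA, K = 0.267, 0.041  # illustrative, assumed parameters
bits = bit_score(150, LAMBDA, K)  # hypothetical raw score of 150

# The bitscore is the same for both database sizes; only the E-value changes.
for n in (10**8, 10**10):
    print(f"bits={bits:.1f}  E={evalue_from_bits(bits, 300, n):.3g}")
```

    Because λ and K are folded into S', two bitscores can be compared directly even if they were computed with different scoring systems, which is not true of raw scores.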

    identity:

    An identity of 35% means that 35% of the aligned amino acid (AA) positions between your sequence and a database sequence are identical. There is no such thing as a universally "acceptable percentage"; it always depends on what you are looking for:

    If you have an unknown protein sequence and want to find homologous sequences, identity information (even 35%) is valuable.

    If you have a known protein and need to confirm its sequence, an identity of 35% is low and may suggest that something went wrong during your analysis.
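    Percent identity is computed over the aligned region, not over the whole sequence. The sketch below assumes the two rows of a pairwise alignment are given as equal-length strings with '-' for gaps, and divides by the full alignment length including gap columns. Other tools may normalise differently (for example by the length of the shorter sequence), so treat this as one convention rather than the definitive one. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns where both residues are identical.

    Inputs are the two rows of a pairwise alignment, with '-' marking gaps.
    Gap columns count toward the alignment length but never as identities.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty alignment")
    identical = sum(
        1 for a, b in zip(aligned_query.upper(), aligned_subject.upper())
        if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_query)


# 7 identical columns out of 10 (one gap, two substitutions).
print(f"{percent_identity('MKT-AYIAKQ', 'MKSLAYVAKQ'):.1f}%")
```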

  • Original article: https://www.cnblogs.com/0820LL/p/11352294.html