  • genBlastA

    BLAST is an extensively used local similarity search tool for identifying homologous sequences. When a gene sequence (either protein sequence or nucleotide sequence) is used as a query to search for homologous sequences in a genome, the search results, represented as a list of high-scoring pairs (HSPs), are fragments of candidate genes rather than full-length candidate genes. Relevant HSPs ("signals"), which represent candidate genes in the target genome sequences, are buried within a report that also contains hundreds to thousands of random HSPs ("noise"). Consequently, BLAST results are often overwhelming and confusing even to experienced users. For effective use of BLAST, a program is needed for extracting relevant HSPs that represent candidate homologous genes from the entire HSP report. To achieve this goal, we have designed a graph-based algorithm, genBlastA, which automatically filters HSPs into well-defined groups, each representing a candidate gene in the target genome. The novelty of genBlastA is an edge length metric that reflects a set of biologically motivated requirements so that each shortest path corresponds to an HSP group representing a homologous gene. We have demonstrated that this novel algorithm is both efficient and accurate for identifying homologous sequences, and that it outperforms existing approaches with similar functionalities.

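    After running tblastn, the output is a large number of sequence fragments; the genes that these fragments correspond to are the candidate genes. All of the fragments are collected in a single report (here, all-opsin.pep.gba.report). The report contains both relevant HSPs (high-scoring pairs) and random HSPs, which arise by chance but are still reported by the tblastn program as HSPs; these spurious hits are the noise. genBlastA is the tool that filters this noise out.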
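To make the idea of "grouping HSPs into candidate genes" concrete, here is a minimal Python sketch. It is not genBlastA's algorithm: the edge length metric and shortest-path search are not described in this post, so the sketch swaps in a plain greedy grouping by distance on the target sequence. The `Hsp` class, the `group_hsps` function, and the sample coordinates are all hypothetical; the `max_gap` default only mirrors the README's `-d` default of 100000, and `min_group_score` loosely mirrors `-s`. Strand and query order are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hsp:
    """A simplified HSP: target coordinates and an alignment score (hypothetical type)."""
    target_start: int
    target_end: int
    score: float


def group_hsps(hsps: List[Hsp], max_gap: int = 100_000,
               min_group_score: float = 0.0) -> List[List[Hsp]]:
    """Greedy stand-in for HSP grouping, NOT genBlastA's shortest-path method.

    HSPs are sorted by target position; an HSP joins the current group when it
    starts within max_gap of the furthest end seen in that group. Groups whose
    summed score is below min_group_score are dropped, and the remaining
    groups are returned ranked by summed score, highest first.
    """
    groups: List[List[Hsp]] = []
    for hsp in sorted(hsps, key=lambda h: h.target_start):
        if groups and hsp.target_start - max(h.target_end for h in groups[-1]) <= max_gap:
            groups[-1].append(hsp)
        else:
            groups.append([hsp])
    kept = [g for g in groups if sum(h.score for h in g) >= min_group_score]
    return sorted(kept, key=lambda g: sum(h.score for h in g), reverse=True)


# Made-up coordinates: two nearby HSPs and one distant, low-scoring HSP.
hsps = [Hsp(1_000, 1_200, 80.0), Hsp(5_000, 5_300, 120.0), Hsp(900_000, 900_100, 30.0)]
for rank, group in enumerate(group_hsps(hsps), start=1):
    print(rank, [(h.target_start, h.target_end) for h in group])
```

With these made-up values, the two nearby HSPs form the top-ranked group and the distant HSP forms a second group, which a `min_group_score` of 50 would discard as noise.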


    genBlastA release v1.0.1

    SYNOPSIS:
    Given a list of query protein or DNA sequences and a target database that
    consists of DNA sequences, this program runs wu-blast tblastn on the list
    of sequences provided, then for each query, it groups the resulting HSPs
    into sensible groups so that each group of HSPs corresponds to a potential
    target gene that is homologous to the query. The output groups are ranked
    by their homology to the query.

    Command line options:
    -P   Search program used to produce blast-format sequence alignments,
         can be either "blast" or "wublast", default is "blast",
         optional
    -q   List of query sequences to blast, must be in fasta format,
         required
    -t   The target database of genomic sequences in fasta format,
         required
    -p   Whether query sequences are protein sequences (T/F)
         [default: T], optional
    -pg  Specify which blast/wublast program to run. If not specified,
         the default behaviour is to run tblastn (for blast/wublast protein
         sequence) / blastn (for blast nucleotide sequence) or tblastx
         (for wublast nucleotide sequence).
    -e   parameter for blast: The e-value, [default: 1e-2],
         optional
    -g   parameter for blast: Perform gapped alignment (T/F)
         [default: T], optional
    -f   parameter for blast: Perform filtering (T/F) [default: F],
         optional
    -a   parameter for genBlast: weight of penalty for skipping HSPs,
         between 0 and 1 [default: 0.5], optional
    -d   parameter for genBlast: maximum allowed distance between HSPs
         within the same gene, a non-negative integer [default: 100000],
         optional
    -r   parameter for genBlast: number of ranks in the output,
         a positive integer, optional
    -c   parameter for genBlast: minimum percentage of query gene
         coverage in the output, between 0 and 1 (e.g. for 50%
         gene coverage, use "0.5"), optional
    -s   parameter for genBlast: minimum score of the HSP group in
         the output, a real number, optional
    -o   output filename, optional. If not specified, the output
         will be the same as the query filename with ".gblast"
         extension.

    Example:
    genblasta -P blast -pg tblastn -q myquery -t mytarget -p T -e 1e-2 -g T -f F -a 0.5 -d 100000 -r 10 -c 0.5 -s 0 -o myoutput

    (Rong She (rshe@cs.sfu.ca) May 2010)
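When genblasta is driven from a script, it can help to assemble the command line programmatically and check the documented value ranges before launching a long search. The following is a minimal sketch under that assumption. The function `build_genblasta_command` and its keyword names are hypothetical; only the flags, defaults, and ranges come from the option list above. It builds the argument list without running anything.

```python
import shlex
from typing import List, Optional


def build_genblasta_command(query: str, target: str, *, program: str = "blast",
                            blast_program: Optional[str] = None, protein: bool = True,
                            evalue: str = "1e-2", gapped: bool = True,
                            filtering: bool = False, skip_penalty: float = 0.5,
                            max_distance: int = 100000, ranks: Optional[int] = None,
                            min_coverage: Optional[float] = None,
                            min_score: Optional[float] = None,
                            output: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build a genblasta argument list from the README options."""
    # Range checks taken from the option descriptions in the README.
    if program not in ("blast", "wublast"):
        raise ValueError('-P must be "blast" or "wublast"')
    if not 0 <= skip_penalty <= 1:
        raise ValueError("-a must be between 0 and 1")
    if not isinstance(max_distance, int) or max_distance < 0:
        raise ValueError("-d must be a non-negative integer")
    if ranks is not None and (not isinstance(ranks, int) or ranks <= 0):
        raise ValueError("-r must be a positive integer")
    if min_coverage is not None and not 0 <= min_coverage <= 1:
        raise ValueError("-c must be between 0 and 1")

    def flag(value: bool) -> str:
        return "T" if value else "F"

    cmd = ["genblasta", "-P", program]
    if blast_program is not None:
        cmd += ["-pg", blast_program]
    cmd += ["-q", query, "-t", target, "-p", flag(protein), "-e", str(evalue),
            "-g", flag(gapped), "-f", flag(filtering),
            "-a", str(skip_penalty), "-d", str(max_distance)]
    # Optional genBlast output filters and output filename.
    if ranks is not None:
        cmd += ["-r", str(ranks)]
    if min_coverage is not None:
        cmd += ["-c", str(min_coverage)]
    if min_score is not None:
        cmd += ["-s", str(min_score)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


cmd = build_genblasta_command("myquery", "mytarget", blast_program="tblastn",
                              ranks=10, min_coverage=0.5, min_score=0, output="myoutput")
print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where genblasta and the chosen BLAST program are installed. Passing a list instead of a shell string avoids quoting problems with file paths.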

  • Original article: https://www.cnblogs.com/yuanjingnan/p/12426133.html