  • genBlastA

    BLAST is an extensively used local similarity search tool for identifying homologous sequences. When a gene sequence (either a protein or a nucleotide sequence) is used as a query to search for homologous sequences in a genome, the search results, represented as a list of high-scoring segment pairs (HSPs), are fragments of candidate genes rather than full-length candidate genes. Relevant HSPs ("signals"), which represent candidate genes in the target genome sequences, are buried within a report that also contains hundreds to thousands of random HSPs ("noise"). Consequently, BLAST results are often overwhelming and confusing even to experienced users. For effective use of BLAST, a program is needed that extracts, from the entire HSP report, the relevant HSPs that represent candidate homologous genes. To achieve this goal, we have designed a graph-based algorithm, genBlastA, which automatically filters HSPs into well-defined groups, each representing a candidate gene in the target genome. The novelty of genBlastA is an edge length metric that reflects a set of biologically motivated requirements, so that each shortest path corresponds to an HSP group representing a homologous gene. We have demonstrated that this novel algorithm is both efficient and accurate for identifying homologous sequences, and that it outperforms existing approaches with similar functionality.
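    To make the "shortest path over HSPs" idea concrete, here is a minimal sketch. It is NOT genBlastA's published edge length metric: the `HSP` record, the function `best_hsp_chain`, and the cost used here (minus the HSP score, plus `alpha` times the number of query positions skipped between two HSPs) are hypothetical stand-ins, loosely mirroring the -a and -d options described in the README. Only HSPs that stay in order on both query and target (plus strand assumed) and lie within `max_dist` of each other are connected, so the graph is a DAG and a single pass in topological order finds the shortest path even with negative edge lengths.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HSP:
    q_start: int   # first query position covered by the HSP
    q_end: int     # last query position covered by the HSP
    t_start: int   # first target position (plus strand assumed)
    t_end: int     # last target position
    score: float   # HSP score, e.g. a bit score


def best_hsp_chain(hsps: List[HSP], alpha: float = 0.5,
                   max_dist: int = 100_000) -> List[HSP]:
    """Return one collinear HSP group found as a shortest path in a DAG.

    Hypothetical edge length (NOT genBlastA's published metric): going from
    HSP i to HSP j costs -score(j) plus alpha times the number of query
    positions left unaligned between them. An edge exists only if j follows
    i on both query and target and the target gap is at most max_dist.
    """
    if not hsps:
        return []
    # Edges always point to a larger t_start, so this is a topological order.
    order = sorted(hsps, key=lambda h: h.t_start)
    dist = [0.0] * len(order)
    prev: List[Optional[int]] = [None] * len(order)
    for j, hj in enumerate(order):
        dist[j] = -hj.score  # a path may start at any HSP
        for i in range(j):
            hi = order[i]
            if hj.q_start <= hi.q_end or hj.t_start <= hi.t_end:
                continue  # overlapping or out of order: no edge
            if hj.t_start - hi.t_end > max_dist:
                continue  # too far apart to belong to one gene
            skipped = hj.q_start - hi.q_end - 1  # unaligned query positions
            cand = dist[i] + alpha * skipped - hj.score
            if cand < dist[j]:
                dist[j], prev[j] = cand, i
    # The shortest path may end at any HSP; walk back from the best end point.
    k: Optional[int] = min(range(len(order)), key=lambda n: dist[n])
    chain: List[HSP] = []
    while k is not None:
        chain.append(order[k])
        k = prev[k]
    return chain[::-1]


if __name__ == "__main__":
    hits = [
        HSP(1, 50, 1000, 1150, 80.0),
        HSP(60, 120, 1300, 1480, 100.0),
        HSP(30, 90, 500000, 500180, 40.0),  # isolated hit, likely noise
        HSP(130, 200, 1600, 1810, 90.0),
    ]
    print([h.score for h in best_hsp_chain(hits)])  # [80.0, 100.0, 90.0]
```

    This sketch returns only the single best group; reporting several candidate genes per query would need further rounds, and the real tool's handling of strands and overlaps is not reproduced here.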

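    After running tblastn, the output is a collection of fragments (HSPs) rather than full-length sequences, and the genes these fragments correspond to are the candidate genes. All of these fragments are collected in one report (here, the report all-opsin.pep.gba.report). That report contains both relevant HSPs (high-scoring segment pairs) and random HSPs, i.e. sequences that arise by chance but are still reported as HSPs by the tblastn program; these spurious hits are the noise. genBlastA is the tool that filters out this noise.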


    genBlastA release v1.0.1

    SYNOPSIS:
    Given a list of query protein or DNA sequences and a target database that
    consists of DNA sequences, this program runs wu-blast tblastn on the list
    of sequences provided, then for each query, it groups the resulting HSPs
    into sensible groups so that each group of HSPs corresponds to a potential
    target gene that is homologous to the query. The groups in the output are
    ranked by their homology to the query.
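    As a rough illustration of the first step in this workflow (collecting the HSPs per query before grouping them), the sketch below parses BLAST hits and buckets them by query. The input format is an assumption made for brevity: it uses the 12-column NCBI BLAST+ tabular output (-outfmt 6), not necessarily what genBlastA itself reads. The `rank_targets` helper is a hypothetical, much cruder stand-in for genBlastA's grouping: it simply sums bit scores per target sequence and strand.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Column order of NCBI BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hsps(text: str) -> Dict[str, List[dict]]:
    """Parse -outfmt 6 lines and bucket the HSPs by query id."""
    by_query: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        for key in ("qstart", "qend", "sstart", "send"):
            rec[key] = int(rec[key])
        rec["evalue"] = float(rec["evalue"])
        rec["bitscore"] = float(rec["bitscore"])
        by_query[rec["qseqid"]].append(rec)
    return by_query


def rank_targets(hsps: List[dict]) -> List[Tuple[str, str, float]]:
    """Crude ranking: total bit score per (target sequence, strand)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for h in hsps:
        # In tblastn tabular output, sstart > send marks a minus-strand hit.
        strand = "+" if h["sstart"] <= h["send"] else "-"
        totals[(h["sseqid"], strand)] += h["bitscore"]
    ranked = [(sid, strand, score) for (sid, strand), score in totals.items()]
    return sorted(ranked, key=lambda r: r[2], reverse=True)
```

    Summing over a whole chromosome would merge unrelated genes, which is exactly the problem genBlastA's grouping is meant to solve; the point here is only the per-query bookkeeping and the final ranking step.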

    Command line options:
    -P Search program used to produce blast-format sequence alignments,
    can be either "blast" or "wublast", default is "blast",
    optional
    -q List of query sequences to blast, must be in fasta format,
    required
    -t The target database of genomic sequences in fasta format,
    required
    -p Whether query sequences are protein sequences (T/F)
    [default: T], optional
    -pg Specify which blast/wublast program to run. If not specified,
    the default behaviour is to run tblastn (for blast/wublast protein
    sequence) / blastn (for blast nucleotide sequence) or tblastx
    (for wublast nucleotide sequence).
    -e parameter for blast: The e-value, [default: 1e-2],
    optional
    -g parameter for blast: Perform gapped alignment (T/F)
    [default: T], optional
    -f parameter for blast: Perform filtering (T/F) [default: F],
    optional
    -a parameter for genBlast: weight of penalty for skipping HSPs,
    between 0 and 1 [default: 0.5], optional
    -d parameter for genBlast: maximum allowed distance between HSPs
    within the same gene, a non-negative integer [default: 100000],
    optional
    -r parameter for genBlast: number of ranks in the output,
    a positive integer, optional
    -c parameter for genBlast: minimum percentage of query gene
    coverage in the output, between 0 and 1 (e.g. for 50%
    gene coverage, use "0.5"), optional
    -s parameter for genBlast: minimum score of the HSP group in
    the output, a real number, optional
    -o output filename, optional. If not specified, the output
    file is named after the query file, with a ".gblast"
    extension.
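    The -r, -c and -s options act as cutoffs on the HSP groups in the output. The sketch below is one possible reading of them, under assumptions the README does not spell out: each group is given as a score plus the query intervals of its HSPs, "coverage" is the fraction of query positions covered by the union of those intervals, and -r keeps the top-ranked groups. The helper names are hypothetical, and since the order in which genBlastA applies these cutoffs is not documented here, this sketch filters first and then truncates to the requested number of ranks.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[int, int]            # inclusive query coordinates
Group = Tuple[float, List[Interval]]  # (group score, HSP query intervals)


def query_coverage(intervals: Sequence[Interval], query_len: int) -> float:
    """Fraction of the query covered by the union of HSP query intervals."""
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_start is not None and cur_end is not None:
                covered += cur_end - cur_start + 1  # close the previous run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend
    if cur_start is not None and cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_len


def filter_groups(groups: List[Group], query_len: int,
                  min_cov: Optional[float] = None,    # like -c
                  min_score: Optional[float] = None,  # like -s
                  ranks: Optional[int] = None) -> List[Group]:  # like -r
    """Drop groups below the cutoffs, then keep the best `ranks` groups."""
    kept = []
    for score, intervals in groups:
        if min_score is not None and score < min_score:
            continue
        if min_cov is not None and query_coverage(intervals, query_len) < min_cov:
            continue
        kept.append((score, intervals))
    kept.sort(key=lambda g: g[0], reverse=True)
    return kept if ranks is None else kept[:ranks]
```

    For a 200-residue query, a group whose HSPs cover positions 1-50, 60-120 and 130-200 has coverage 182/200 = 0.91 and would pass "-c 0.5", while a lone HSP covering 30-90 (coverage 0.305) would not.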

    Example:
    genblasta -P blast -pg tblastn -q myquery -t mytarget -p T -e 1e-2 -g T -f F -a 0.5 -d 100000 -r 10 -c 0.5 -s 0 -o myoutput
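    When genBlastA is run from a script over many query files, it can help to assemble the command line programmatically. This is a minimal sketch assuming, as in the example above, that the executable is called `genblasta` and is on the PATH; `genblasta_argv` and `run_genblasta` are hypothetical helper names, and only the flags listed in the README should be passed.

```python
import shlex
import subprocess
from typing import Dict, List, Optional


def genblasta_argv(query: str, target: str,
                   options: Optional[Dict[str, object]] = None,
                   exe: str = "genblasta") -> List[str]:
    """Assemble an argument list; option keys are flag names without '-'."""
    argv = [exe, "-q", query, "-t", target]
    for flag, value in (options or {}).items():
        argv += [f"-{flag}", str(value)]
    return argv


def run_genblasta(argv: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    argv = genblasta_argv("myquery", "mytarget",
                          {"P": "blast", "pg": "tblastn", "e": "1e-2",
                           "r": 10, "c": 0.5, "o": "myoutput"})
    print(shlex.join(argv))  # show the command without running it
```

    Passing an argument list to `subprocess.run` (rather than one shell string) avoids quoting problems with file names that contain spaces.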

    (Rong She (rshe@cs.sfu.ca) May 2010)

  • Original post: https://www.cnblogs.com/yuanjingnan/p/12426133.html