  • How to use GenomicConsensus (quiver, arrow) | sequence consensus

     https://github.com/PacificBiosciences/GenomicConsensus

    GenomicConsensus is developed by PacBio. Personally, I really dislike PacBio's tools; they are hard to use.

    Just installing GenomicConsensus nearly cost me half my life.

    The purpose of this tool: Compute genomic consensus and call variants relative to the reference.

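    In other words, it uses a set of reads to correct errors in the final reference. The model is fairly general and can be applied in many settings. In particular, when developing our own tools, we can embed it directly and cut down on development work.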
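
    One way to embed the tool in your own pipeline is to call it as a subprocess. The following is only a minimal sketch, not part of GenomicConsensus: the helper names `build_arrow_command` and `run_arrow`, the file names, and the assumption that an `arrow` executable is on the PATH are all hypothetical. Only the flags (`-r`, `-o`, `-j`, `--algorithm`) and the algorithm choices come from the tool's `-h` output.

```python
import subprocess

# Choices listed for --algorithm in the tool's help output.
ALGORITHMS = {"quiver", "arrow", "plurality", "poa", "best"}


def build_arrow_command(alignments, reference, outputs, workers=1,
                        algorithm=None, executable="arrow"):
    """Build an argument list for the arrow/variantCaller command line.

    Hypothetical helper: only the flag names come from the help output;
    the function itself and its defaults are assumptions.
    """
    if not outputs:
        raise ValueError("at least one output filename is required")
    cmd = [
        executable,
        alignments,               # inputFilename: cmp.h5 or BAM alignment file
        "-r", reference,          # --referenceFilename: reference FASTA
        "-o", ",".join(outputs),  # --outputFilename: comma-separated list
        "-j", str(workers),       # --numWorkers: number of worker processes
    ]
    if algorithm is not None:
        if algorithm not in ALGORITHMS:
            raise ValueError("unknown algorithm: %r" % (algorithm,))
        cmd += ["--algorithm", algorithm]
    return cmd


def run_arrow(*args, **kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_arrow_command(*args, **kwargs), check=True)


# Example with made-up file names; nothing is executed here.
example = build_arrow_command(
    "aligned.bam", "draft.fasta",
    ["consensus.fasta", "variants.gff"], workers=8, algorithm="arrow")
print(" ".join(example))
```

    Building the argument list separately from running it keeps the sketch easy to check without the tool installed; `run_arrow` takes the same arguments when you actually want to launch it.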

    ./bin/arrow -h
    usage: variantCaller [-h] [--version] [--emit-tool-contract]
                         [--resolved-tool-contract RESOLVED_TOOL_CONTRACT]
                         [--log-file LOG_FILE]
                         [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                         --referenceFilename REFERENCEFILENAME -o OUTPUTFILENAMES
                         [-j NUMWORKERS] [--minConfidence MINCONFIDENCE]
                         [--minCoverage MINCOVERAGE]
                         [--noEvidenceConsensusCall {nocall,reference,lowercasereference}]
                         [--coverage COVERAGE] [--minMapQV MINMAPQV]
                         [--referenceWindow REFERENCEWINDOWSASSTRING]
                         [--alignmentSetRefWindows]
                         [--referenceWindowsFile REFERENCEWINDOWSASSTRING]
                         [--barcode _BARCODE] [--readStratum READSTRATUM]
                         [--minReadScore MINREADSCORE] [--minSnr MINHQREGIONSNR]
                         [--minZScore MINZSCORE] [--minAccuracy MINACCURACY]
                         [--algorithm {quiver,arrow,plurality,poa,best}]
                         [--parametersFile PARAMETERSFILE]
                         [--parametersSpec PARAMETERSSPEC]
                         [--maskRadius MASKRADIUS] [--maskErrorRate MASKERRORRATE]
                         [--pdb] [--notrace] [--pdbAtStartup] [--profile]
                         [--dumpEvidence [{variants,all,outliers}]]
                         [--evidenceDirectory EVIDENCEDIRECTORY] [--annotateGFF]
                         [--reportEffectiveCoverage] [--diploid]
                         [--queueSize QUEUESIZE] [--threaded]
                         [--referenceChunkSize REFERENCECHUNKSIZE]
                         [--fancyChunking] [--simpleChunking]
                         [--referenceChunkOverlap REFERENCECHUNKOVERLAP]
                         [--autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE]
                         [--aligner {affine,simple}] [--refineDinucleotideRepeats]
                         [--noRefineDinucleotideRepeats] [--fast]
                         [--skipUnrecognizedContigs]
                         inputFilename
    
    Compute genomic consensus and call variants relative to the reference.
    
    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      --emit-tool-contract  Emit Tool Contract to stdout (default: False)
      --resolved-tool-contract RESOLVED_TOOL_CONTRACT
                            Run Tool directly from a PacBio Resolved tool contract
                            (default: None)
      --log-file LOG_FILE   Write the log to file. Default(None) will write to
                            stdout. (default: None)
      --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                            Set log level (default: WARN)
      --debug               Alias for setting log level to DEBUG (default: False)
      --quiet               Alias for setting log level to CRITICAL to suppress
                            output. (default: False)
      -v, --verbose         Set the verbosity level. (default: None)
    
    Basic required options:
      inputFilename         The input cmp.h5 or BAM alignment file
      --referenceFilename REFERENCEFILENAME, --reference REFERENCEFILENAME, -r REFERENCEFILENAME
                            The filename of the reference FASTA file (default:
                            None)
      -o OUTPUTFILENAMES, --outputFilename OUTPUTFILENAMES
                            The output filename(s), as a comma-separated
                            list.Valid output formats are .fa/.fasta, .fq/.fastq,
                            .gff, .vcf (default: [])
    
    Parallelism:
      -j NUMWORKERS, --numWorkers NUMWORKERS
                            The number of worker processes to be used (default: 1)
    
    Output filtering:
      --minConfidence MINCONFIDENCE, -q MINCONFIDENCE
                            The minimum confidence for a variant call to be output
                            to variants.{gff,vcf} (default: 40)
      --minCoverage MINCOVERAGE, -x MINCOVERAGE
                            The minimum site coverage that must be achieved for
                            variant calls and consensus to be calculated for a
                            site. (default: 5)
      --noEvidenceConsensusCall {nocall,reference,lowercasereference}
                            The consensus base that will be output for sites with
                            no effective coverage. (default: lowercasereference)
    
    Read selection/filtering:
      --coverage COVERAGE, -X COVERAGE
                            A designation of the maximum coverage level to be used
                            for analysis. Exact interpretation is algorithm-
                            specific. (default: 100)
      --minMapQV MINMAPQV, -m MINMAPQV
                            The minimum MapQV for reads that will be used for
                            analysis. (default: 10)
      --referenceWindow REFERENCEWINDOWSASSTRING, --referenceWindows REFERENCEWINDOWSASSTRING, -w REFERENCEWINDOWSASSTRING
                            The window (or multiple comma-delimited windows) of
                            the reference to be processed, in the format refGroup
                            :refStart-refEnd (default: entire reference).
                            (default: None)
      --alignmentSetRefWindows
                            The window (or multiple comma-delimited windows) of
                            the reference to be processed, in the format refGroup
                            :refStart-refEnd will be pulled from the alignment
                            file. (default: False)
      --referenceWindowsFile REFERENCEWINDOWSASSTRING, -W REFERENCEWINDOWSASSTRING
                            A file containing reference window designations, one
                            per line (default: None)
      --barcode _BARCODE    Only process reads with the given barcode name.
                            (default: None)
      --readStratum READSTRATUM
                            A string of the form 'n/N', where n, and N are
                            integers, 0 <= n < N, designating that the reads are
                            to be deterministically split into N strata of roughly
                            even size, and stratum n is to be used for variant and
                            consensus calling. This is mostly useful for Quiver
                            development. (default: None)
      --minReadScore MINREADSCORE
                            The minimum ReadScore for reads that will be used for
                            analysis (arrow-only). (default: 0.65)
      --minSnr MINHQREGIONSNR
                            The minimum acceptable signal-to-noise over all
                            channels for reads that will be used for analysis
                            (arrow-only). (default: 3.75)
      --minZScore MINZSCORE
                            The minimum acceptable z-score for reads that will be
                            used for analysis (arrow-only). (default: -3.5)
      --minAccuracy MINACCURACY
                            The minimum acceptable window-global alignment
                            accuracy for reads that will be used for the analysis
                            (arrow-only). (default: 0.82)
    
    Algorithm and parameter settings:
      --algorithm {quiver,arrow,plurality,poa,best}
      --parametersFile PARAMETERSFILE, -P PARAMETERSFILE
                            Parameter set filename (such as ArrowParameters.json
                            or QuiverParameters.ini), or directory D such that
                            either D/*/GenomicConsensus/QuiverParameters.ini, or
                            D/GenomicConsensus/QuiverParameters.ini, is found. In
                            the former case, the lexically largest path is chosen.
                            (default: None)
      --parametersSpec PARAMETERSSPEC, -p PARAMETERSSPEC
                            Name of parameter set (chemistry.model) to select from
                            the parameters file, or just the name of the
                            chemistry, in which case the best available model is
                            chosen. Default is 'auto', which selects the best
                            parameter set from the alignment data (default: auto)
      --maskRadius MASKRADIUS
                            Radius of window to use when excluding local regions
                            for exceeding maskMinErrorRate, where 0 disables any
                            filtering (arrow-only). (default: 3)
      --maskErrorRate MASKERRORRATE
                            Maximum local error rate before the local region
                            defined by maskRadius is excluded from polishing
                            (arrow-only). (default: 0.7)
    
    Verbosity and debugging/profiling:
      --pdb                 Enable Python debugger (default: False)
      --notrace             Suppress stacktrace for exceptions (to simplify
                            testing) (default: False)
      --pdbAtStartup        Drop into Python debugger at startup (requires ipdb)
                            (default: False)
      --profile             Enable Python-level profiling (using cProfile).
                            (default: False)
      --dumpEvidence [{variants,all,outliers}], -d [{variants,all,outliers}]
      --evidenceDirectory EVIDENCEDIRECTORY
      --annotateGFF         Augment GFF variant records with additional
                            information (default: False)
      --reportEffectiveCoverage
                            Additionally record the *post-filtering* coverage at
                            variant sites (default: False)
    
    Advanced configuration options:
      --diploid             Enable detection of heterozygous variants
                            (experimental) (default: False)
      --queueSize QUEUESIZE, -Q QUEUESIZE
      --threaded, -T        Run threads instead of processes (for debugging
                            purposes only) (default: False)
      --referenceChunkSize REFERENCECHUNKSIZE, -C REFERENCECHUNKSIZE
      --fancyChunking       Adaptive reference chunking designed to handle
                            coverage cutouts better (default: True)
      --simpleChunking      Disable adaptive reference chunking (default: True)
      --referenceChunkOverlap REFERENCECHUNKOVERLAP
      --autoDisableHdf5ChunkCache AUTODISABLEHDF5CHUNKCACHE
                            Disable the HDF5 chunk cache when the number of
                            datasets in the cmp.h5 exceeds the given threshold
                            (default: 500)
      --aligner {affine,simple}, -a {affine,simple}
                            The pairwise alignment algorithm that will be used to
                            produce variant calls from the consensus (Quiver
                            only). (default: affine)
      --refineDinucleotideRepeats
                            Require quiver maximum likelihood search to try one
                            less/more repeat copy in dinucleotide repeats, which
                            seem to be the most frequent cause of suboptimal
                            convergence (getting trapped in local optimum) (Quiver
                            only) (default: True)
      --noRefineDinucleotideRepeats
                            Disable dinucleotide refinement (default: True)
      --fast                Cut some corners to run faster. Unsupported! (default:
                            False)
      --skipUnrecognizedContigs
                            Do not abort when told to process a reference window
                            (via -w/--referenceWindow[s]) that has no aligned
                            coverage. Outputs emptyish files if there are no
                            remaining non-degenerate windows. Only intended for
                            use by smrtpipe scatter/gather. (default: False)
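
    Two of the options above take small structured strings: `-w/--referenceWindow` expects comma-delimited `refGroup:refStart-refEnd` windows, and `--readStratum` expects `n/N` with 0 <= n < N. As a rough sketch, these strings could be assembled as follows. The helper names and contig names are hypothetical, and the checks that coordinates are non-negative and that start does not exceed end are my own assumptions; the help text does not say how the tool itself validates them.

```python
def format_reference_windows(windows):
    """Join (refGroup, refStart, refEnd) tuples into the comma-delimited
    'refGroup:refStart-refEnd' form described for -w/--referenceWindow.

    The range checks below are assumptions, not documented tool behavior.
    """
    parts = []
    for ref_group, start, end in windows:
        if start < 0 or end < start:
            raise ValueError("bad window: %s:%s-%s" % (ref_group, start, end))
        parts.append("%s:%d-%d" % (ref_group, start, end))
    return ",".join(parts)


def format_read_stratum(n, total):
    """Return 'n/N' for --readStratum, enforcing 0 <= n < N as the help states."""
    if not 0 <= n < total:
        raise ValueError("need 0 <= n < N, got n=%d, N=%d" % (n, total))
    return "%d/%d" % (n, total)


# Hypothetical contig names, for illustration only.
windows_arg = format_reference_windows([("contig1", 0, 5000),
                                        ("contig2", 100, 200)])
stratum_arg = format_read_stratum(0, 4)
print(windows_arg)
print(stratum_arg)
```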
    

      

    To be continued~~

  • Original article: https://www.cnblogs.com/leezx/p/8606734.html