  • samtools toolkit

    Software homepage:

    http://www.htslib.org/
    

    Three main components:

    Samtools
    Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
    BCFtools
    Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
    HTSlib
    A C library for reading/writing high-throughput sequencing data

    Build and install from source:

    cd samtools-1.x # and similarly for bcftools and htslib
    ./configure --prefix=/where/to/install
    make
    make install
    export PATH=/where/to/install/bin:$PATH # for sh or bash users
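
After installing, it can help to confirm that the tools are actually reachable on PATH before starting a long run. The following is a minimal sketch; the function name check_tools is made up for this example and is not part of samtools.

```shell
# Report any of the given commands that cannot be found on PATH.
# Returns 0 when all are present, 1 otherwise.
# (check_tools is a hypothetical helper, not part of samtools.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: verify the tools used in this walkthrough.
check_tools samtools bcftools bwa || echo "install the missing tools first" >&2
```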

    Mapping

    To prepare the reference for mapping you must first index it with bwa. This builds a BWT (Burrows-Wheeler Transform) index, which takes a long time, so plan accordingly.

    bwa index <ref.fa>
    bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1' <ref.fa> <read1.fa> <read2.fa> > <lane.sam>
    

    Ensure that the @RG information here is correct as this information is used by later tools. The SM field must be set to the name of the sample being processed, and LB field to the library.
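
Because a wrong @RG line is easy to produce by hand, one option is to build it from variables. This is only a sketch: make_rg is a hypothetical helper, and the ID, sample and library values are placeholders. It emits a literal backslash followed by t between fields, which bwa mem converts to tabs itself.

```shell
# Build a read group string for bwa mem -R.
# $1 = read group ID, $2 = sample name (SM), $3 = library (LB)
# The "\t" separators are printed literally (backslash + t).
make_rg() {
    printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\n' "$1" "$2" "$3"
}

rg=$(make_rg lane1 sampleA library1)
printf '%s\n' "$rg"
```

The result could then be passed as bwa mem -R "$rg" in the mapping command.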

    Clean up read pairing information and flags:

    samtools fixmate -O bam <lane.sam> <lane_fixmate.bam>

    Sort the reads from name order into coordinate order:

    samtools sort -O bam -o <lane_sorted.bam> -T </tmp/lane_temp> <lane_fixmate.bam>
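
When there are several lanes, the fixmate and sort steps can be wrapped in a small function. The sketch below assumes a hypothetical naming scheme (<prefix>.sam in, <prefix>_sorted.bam out, with the prefix being a bare name rather than a path) and uses a made-up DRY_RUN switch so the commands can be previewed without running anything.

```shell
# Run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fix mate information, then coordinate-sort one lane.
# $1 = lane prefix (hypothetical naming scheme, see the text above)
process_lane() {
    prefix=$1
    run samtools fixmate -O bam "${prefix}.sam" "${prefix}_fixmate.bam" &&
    run samtools sort -O bam -o "${prefix}_sorted.bam" \
        -T "/tmp/${prefix}_temp" "${prefix}_fixmate.bam"
}

# Preview the commands for two lanes without executing them.
DRY_RUN=1
for lane in lane_1 lane_2; do
    process_lane "$lane"
done
```

Leaving DRY_RUN unset makes the same function execute the commands for real.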
    

     To reduce the number of INDEL miscalls in your data, it is helpful to realign your raw gapped alignments with the Broad's GATK Realigner. The commands below use GATK 3 syntax; these realigner tools are not included in GATK 4.

    java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R <ref.fa> -I <lane.bam> -o <lane.intervals> --known <bundle/b38/Mills1000G.b38.vcf>
    java -Xmx4g -jar GenomeAnalysisTK.jar -T IndelRealigner -R <ref.fa> -I <lane.bam> -targetIntervals <lane.intervals> --known <bundle/b38/Mills1000G.b38.vcf> -o <lane_realigned.bam>
    

     Base quality score recalibration (BQSR) from the Broad's GATK reduces the effects of analysis artefacts produced by your sequencing machines. It works in two steps: the first analyses your data to detect covariates, and the second compensates for those covariates by adjusting quality scores.

    java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R <ref.fa> -knownSites <bundle/b38/dbsnp_142.b38.vcf> -I <lane.bam> -o <lane_recal.table>
    java -Xmx2g -jar GenomeAnalysisTK.jar -T PrintReads -R <ref.fa> -I <lane.bam> -BQSR <lane_recal.table> -o <lane_recal.bam>
    

     It is helpful at this point to compile all of the reads from each library together into one BAM, which can be done at the same time as marking PCR and optical duplicates. To identify duplicates we currently recommend the use of either the Picard or biobambam’s mark duplicates tool.

    java -Xmx2g -jar MarkDuplicates.jar VALIDATION_STRINGENCY=LENIENT INPUT=<lane_1.bam> INPUT=<lane_2.bam> INPUT=<lane_3.bam> OUTPUT=<library.bam> METRICS_FILE=<library.metrics.txt>
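
With many lanes, the per-library grouping can be generated from a manifest instead of typed by hand. The sketch below assumes a made-up manifest format (library name, a tab, then a lane BAM path) and only prints the commands; the output and metrics file names are placeholders derived from the library name. METRICS_FILE is included because Picard's MarkDuplicates requires it.

```shell
# Print one MarkDuplicates command per library from a manifest on stdin.
# Manifest format (an assumption of this sketch): library<TAB>lane_bam
markdup_commands() {
    awk -F '\t' '
        NF >= 2 {
            if (!($1 in inputs)) order[++n] = $1   # keep first-seen order
            inputs[$1] = inputs[$1] " INPUT=" $2
        }
        END {
            for (i = 1; i <= n; i++) {
                lib = order[i]
                printf "java -Xmx2g -jar MarkDuplicates.jar VALIDATION_STRINGENCY=LENIENT%s OUTPUT=%s.bam METRICS_FILE=%s.metrics.txt\n", inputs[lib], lib, lib
            }
        }'
}

# Three lanes from two libraries (placeholder names).
printf 'library1\tlane_1.bam\nlibrary1\tlane_2.bam\nlibrary2\tlane_3.bam\n' |
    markdup_commands
```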
    

     Once this is done you can perform another merge step to produce your sample BAM files.

    samtools merge <sample.bam> <library1.bam> <library2.bam> <library3.bam>
    samtools index <sample.bam>
    

    Variant Calling

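    To convert your BAM files into genomic positions we first use mpileup to produce a BCF stream covering every position in the genome. We then pipe it into bcftools call, which calls genotypes and reduces the list to the sites found to be variant.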

    bcftools mpileup -Ou -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>
    

     To prepare our VCF for querying we next index it using tabix:

    tabix -p vcf <study.vcf.gz>
    

     Prepare graphs and statistics to assist you in filtering your variants:

    bcftools stats -F <ref.fa> -s - <study.vcf.gz> > <study.vcf.gz.stats>
    mkdir plots
    plot-vcfstats -p plots/ <study.vcf.gz.stats>
    

     Filter the data:

    bcftools filter -O z -o <study_filtered.vcf.gz> -s LOWQUAL -i'%QUAL>10' <study.vcf.gz>
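
To make the effect of this soft filter concrete, here is a rough awk imitation for uncompressed VCF text. It is an illustration only, not a replacement for bcftools. It assumes that records with QUAL above the threshold keep an existing FILTER value or get PASS when FILTER is ".", that all other records are marked LOWQUAL, and that a missing QUAL counts as failing. The function name and the two sample records are made up.

```shell
# Imitate a soft filter on QUAL for VCF text read from stdin.
# $1 = QUAL threshold (defaults to 10); header lines pass through.
soft_filter_qual() {
    awk -v min="${1:-10}" 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }
        {
            if ($6 != "." && $6 + 0 > min + 0) {
                if ($7 == ".") $7 = "PASS"   # passing site, no prior FILTER
            } else {
                $7 = "LOWQUAL"               # failing site (or missing QUAL)
            }
            print
        }'
}

# Two made-up records: one above and one below the threshold.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t50\t.\t.\n1\t200\t.\tC\tT\t5\t.\t.\n' |
    soft_filter_qual 10
```

Note that a soft filter only annotates the FILTER column; no records are removed from the file.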
    

     Variant filtration is a subject worthy of an article in itself and the exact filters you will need to use will depend on the purpose of your study and quality and depth of the data used to call the variants.

    Note: variant filtering should follow the specific goals of the study and the quality and depth of the data.

  • Original article: https://www.cnblogs.com/jinhh/p/7920292.html