  • [Workflows] WGS/WES Mapping to Variant Calls


    WGS/WES Mapping to Variant Calls - Version 1.0

    This is a WGS/WES workflow published on the htslib website. The relationship between htslib, samtools and bcftools is explained on the Sanger website:

    HTSlib is a software library for manipulating various sequencing and variant file formats: SAM, BAM, CRAM, VCF, and BCF. SAMtools and BCFtools are applications built around HTSlib, performing format conversion, file merging and splitting, sorting, variant calling, and much more.

    The workflow has three main steps:

    • Mapping
    • Improvement
    • Variant Calling

    Mapping

    bwa index <ref.fa>
    bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1' <ref.fa> <read1.fq> <read2.fq> > <lane.sam>  # the official page writes <read1.fa> <read1.fa>; I believe it means read1.fq and read2.fq
    samtools fixmate -O bam <lane.sam> <lane_fixmate.bam>
    samtools sort -O bam -o <lane_sorted.bam> -T </tmp/lane_temp> <lane_fixmate.bam>
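The four Mapping commands can be wrapped in a small bash function so the read-group fields and file names are set in one place. This is only a sketch: `map_lane` and its argument order are hypothetical, it assumes `bwa index <ref.fa>` has already been run and that `bwa` and `samtools` are on `PATH`, and it keeps the intermediate SAM file exactly as the commands above do.

```shell
#!/usr/bin/env bash
set -euo pipefail

# map_lane REF R1 R2 RG_ID SAMPLE LIBRARY PREFIX  (hypothetical helper)
# Runs the Mapping step for one lane of paired-end FASTQ files.
# Assumes `bwa index REF` has already been run once for this reference.
# Writes PREFIX.sam, PREFIX_fixmate.bam and PREFIX_sorted.bam.
map_lane() {
  local ref=$1 r1=$2 r2=$3 id=$4 sm=$5 lb=$6 prefix=$7

  # Inside double quotes bash leaves "\t" as backslash + t;
  # bwa mem -R converts each "\t" into a real TAB in the @RG header line.
  bwa mem -R "@RG\tID:${id}\tSM:${sm}\tLB:${lb}" "$ref" "$r1" "$r2" > "${prefix}.sam"

  # bwa mem output keeps mates adjacent (grouped by read name),
  # which is the input order samtools fixmate expects.
  samtools fixmate -O bam "${prefix}.sam" "${prefix}_fixmate.bam"

  # Coordinate sort; -T is the prefix for sort's temporary files.
  samtools sort -O bam -o "${prefix}_sorted.bam" \
    -T "${TMPDIR:-/tmp}/${prefix##*/}_temp" "${prefix}_fixmate.bam"
}
```

For example, `map_lane ref.fa lane1_1.fq lane1_2.fq foo bar library1 lane1` uses the same read group as the listing (`ID:foo`, `SM:bar`, `LB:library1`).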
    

    Improvement

    # realign gapped alignment
    java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R <ref.fa> -I <lane.bam> -o <lane.intervals> --known <bundle/b38/Mills1000G.b38.vcf>
    java -Xmx4g -jar GenomeAnalysisTK.jar -T IndelRealigner -R <ref.fa> -I <lane.bam> -targetIntervals <lane.intervals> -known <bundle/b38/Mills1000G.b38.vcf> -o <lane_realigned.bam>
    
    # BQSR
    java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R <ref.fa> -knownSites <bundle/b38/dbsnp_142.b38.vcf> -I <lane.bam> -o <lane_recal.table>
    java -Xmx2g -jar GenomeAnalysisTK.jar -T PrintReads -R <ref.fa> -I <lane.bam> -BQSR <lane_recal.table> -o <lane_recal.bam>
    
    # MarkDuplicates
    java -Xmx2g -jar MarkDuplicates.jar VALIDATION_STRINGENCY=LENIENT INPUT=<lane_1.bam> INPUT=<lane_2.bam> INPUT=<lane_3.bam> OUTPUT=<library.bam> METRICS_FILE=<library.metrics>
    
    samtools merge <sample.bam> <library1.bam> <library2.bam> <library3.bam>
    samtools index <sample.bam>
    
    # realign your INDELs (optional)
    java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R <ref.fa> -I <sample.bam> -o <sample.intervals> --known <bundle/b38/Mills1000G.b38.vcf>
    java -Xmx4g -jar GenomeAnalysisTK.jar -T IndelRealigner -R <ref.fa> -I <sample.bam> -targetIntervals <sample.intervals> -known <bundle/b38/Mills1000G.b38.vcf> -o <sample_realigned.bam>
    
    samtools index <sample_realigned.bam>
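The Improvement commands above can be chained per lane and per library roughly as follows. This is a hedged sketch, not the htslib page's own script: the function names and the `GATK_JAR`/`MARKDUP_JAR` variables are hypothetical; it assumes GATK 3.x (the `-T` tool syntax; GATK 4 no longer ships RealignerTargetCreator or IndelRealigner) and the standalone Picard `MarkDuplicates.jar` named above; and, unlike the listing, which feeds `<lane.bam>` to every step, it assumes each step reads the previous step's output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical jar locations; override via the environment.
GATK_JAR=${GATK_JAR:-GenomeAnalysisTK.jar}       # GATK 3.x
MARKDUP_JAR=${MARKDUP_JAR:-MarkDuplicates.jar}   # standalone Picard tool

# improve_lane REF IN_BAM MILLS_VCF DBSNP_VCF PREFIX  (hypothetical helper)
# Indel realignment followed by BQSR; each step consumes the previous output.
improve_lane() {
  local ref=$1 in=$2 mills=$3 dbsnp=$4 prefix=$5

  # Find intervals worth realigning, then realign reads around known indels.
  java -Xmx2g -jar "$GATK_JAR" -T RealignerTargetCreator -R "$ref" -I "$in" \
    -o "${prefix}.intervals" -known "$mills"
  java -Xmx4g -jar "$GATK_JAR" -T IndelRealigner -R "$ref" -I "$in" \
    -targetIntervals "${prefix}.intervals" -known "$mills" -o "${prefix}_realigned.bam"

  # Build the recalibration table from known sites, then apply it.
  java -Xmx4g -jar "$GATK_JAR" -T BaseRecalibrator -R "$ref" -I "${prefix}_realigned.bam" \
    -knownSites "$dbsnp" -o "${prefix}_recal.table"
  java -Xmx2g -jar "$GATK_JAR" -T PrintReads -R "$ref" -I "${prefix}_realigned.bam" \
    -BQSR "${prefix}_recal.table" -o "${prefix}_recal.bam"
}

# dedup_library OUT_BAM METRICS_FILE LANE_BAM...  (hypothetical helper)
# One INPUT= per lane of the same library; MarkDuplicates requires METRICS_FILE.
dedup_library() {
  local out=$1 metrics=$2; shift 2
  (( $# > 0 )) || { echo "dedup_library: no lane BAMs given" >&2; return 1; }
  local args=() bam
  for bam in "$@"; do
    args+=("INPUT=$bam")
  done
  java -Xmx2g -jar "$MARKDUP_JAR" VALIDATION_STRINGENCY=LENIENT \
    "${args[@]}" "OUTPUT=$out" "METRICS_FILE=$metrics"
}

# merge_sample OUT_BAM LIBRARY_BAM...  (hypothetical helper)
# Merges the deduplicated libraries of one sample and indexes the result.
merge_sample() {
  local out=$1; shift
  samtools merge "$out" "$@"
  samtools index "$out"
}
```

Building the `INPUT=` arguments in an array keeps file names with spaces intact and lets the same function handle any number of lanes per library.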
    

    Variant Calling

    bcftools mpileup -Ou -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>
    
    # optional: write the pileup to a BCF file first so it can be examined, then call from it
    bcftools mpileup -Ob -o <study.bcf> -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam>
    bcftools call -vmO z -o <study.vcf.gz> <study.bcf>
    
    tabix -p vcf <study.vcf.gz>
    
    bcftools stats -F <ref.fa> -s - <study.vcf.gz> > <study.vcf.gz.stats>
    mkdir plots
    plot-vcfstats -p plots/ <study.vcf.gz.stats>
    
    bcftools filter -O z -o <study_filtered.vcf.gz> -s LOWQUAL -i '%QUAL>10' <study.vcf.gz>
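For several samples, the calling, indexing, stats and filtering commands can be run from one function, sketched below as a hypothetical `call_study` helper (it assumes `bcftools` and `tabix` are on `PATH`). The second function, `soft_filter_qual`, only illustrates what `-s LOWQUAL -i '%QUAL>10'` does to each record, written in plain awk on uncompressed VCF text. As we understand bcftools' default `--mode`, records that fail the include expression get `LOWQUAL` in the FILTER column, and passing records whose FILTER is empty (`.`) become `PASS`. Treating a missing QUAL as failing is an assumption, and the sketch does not add the `##FILTER=<ID=LOWQUAL,...>` header line that bcftools writes, so use `bcftools filter` on real data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# call_study REF OUT_PREFIX SAMPLE_BAM...  (hypothetical helper)
call_study() {
  local ref=$1 prefix=$2; shift 2
  (( $# > 0 )) || { echo "call_study: no sample BAMs given" >&2; return 1; }

  # -Ou streams uncompressed BCF to call; -v keeps variant sites, -m is the multiallelic caller.
  bcftools mpileup -Ou -f "$ref" "$@" | bcftools call -vmO z -o "${prefix}.vcf.gz"
  tabix -p vcf "${prefix}.vcf.gz"
  # -s - collects per-sample stats for all samples.
  bcftools stats -F "$ref" -s - "${prefix}.vcf.gz" > "${prefix}.vcf.gz.stats"
  # Soft filter: failing sites are tagged LOWQUAL, not removed.
  bcftools filter -O z -o "${prefix}_filtered.vcf.gz" -s LOWQUAL -i '%QUAL>10' "${prefix}.vcf.gz"
}

# soft_filter_qual MIN_QUAL < in.vcf > out.vcf  (illustration only)
# Approximates `bcftools filter -s LOWQUAL -i "%QUAL>MIN_QUAL"` on plain-text VCF.
soft_filter_qual() {
  awk -v min="$1" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                     # header lines pass through unchanged
    {
      if ($6 != "." && $6 + 0 > min) {       # column 6 = QUAL
        if ($7 == ".") $7 = "PASS"           # column 7 = FILTER
      } else {
        $7 = "LOWQUAL"                       # missing QUAL counts as failing (assumption)
      }
      print
    }'
}
```

For instance, `soft_filter_qual 10 < study.vcf > study_soft.vcf` on a decompressed copy shows which sites the real command would tag.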
    
  • Original article: https://www.cnblogs.com/jessepeng/p/12579674.html