zoukankan      html  css  js  c++  java
  • 统计测序深度和覆盖度的工具

    1.GATK DepthOfCoverage

    cat test.sh 

    #!/usr/bin/bash HG38=/path/hg38/hg38.fa GATK=/path/biosoft/gatk3.7/GenomeAnalysisTK.jar java -jar -Xmx10g $GATK -T DepthOfCoverage -R $HG38 -o ./test -I /path2bam/03_bam_processing/03_base_recal/test.sorted_MarkDuplicates_BQSR.bam -L /path2bed/target.bed --omitDepthOutputAtEachBase --omitIntervalStatistics -ct 1 -ct 5 -ct 10 -ct 20 -ct 30 -ct 50 -ct 100

    结果生成四个文件

     4096 Jan 13 21:39 ./
     4096 Jan 13 21:05 ../
     6417 Jan 13 21:39 test.sample_cumulative_coverage_counts
     GenekVIP 6412 Jan 13 21:39 test.sample_cumulative_coverage_proportions
     9362 Jan 13 21:39 test.sample_statistics
     291 Jan 13 21:39 test.sample_summary
    

     less test.sample_sumary 结果不是很好

    sample_id       total   mean    granular_third_quartile granular_median granular_first_quartile %_bases_above_1 %_bases_above_5 %_bases_above_10        %_bases_above_20        %_bases_above_3
    test    2025198 17.50   1       1       1       5.1     2.7     2.6     2.5     2.5     2.4     2.3
    Total   2025198 17.50   N/A     N/A     N/A

    #

    2.bamdst: bam文件深度统计

    #

    软件安装:

    git clone https://github.com/shiquan/bamdst.git
    cd bamdst/
    make

    测试:

    cat  test.sh

    #!/usr/bin/bash
    /path2bamdst/biosoft/bamdst/bamdst -p /path2bed/target.bed -o ./ ../03_base_recal/test.sorted.markdup.realign.BQSR.bam echo "run status $?"

    结果文件:

       1630 Jan 13 16:58 chromosomes.report
       4331 Jan 13 16:58 coverage.report
      64188 Jan 13 16:58 depth_distribution.plot
     838584 Jan 13 16:58 depth.tsv.gz
      18119 Jan 13 16:58 insertsize.plot
       9519 Jan 13 16:58 region.tsv.gz
         0  Jan 13 16:58 uncover.bed

    看一下内容:

    $ less -S chromosomes.report#该文件中存储的是从bam文件中获取的染色体深度、覆盖度信息
    Chromosome        DATA(%)       Avg depth          Median       Coverage%        Cov 4x %       Cov 10x %       Cov 30x %
    chrM       100.00            0.26              0.0           5.66            2.83            0.00            0.00
    $ less coverage.report#信息太多了,我目前觉得比较重要的有
    [Total] Raw Reads (All reads)    15
    [Total] Mapped Reads    15
    [Total] Properly paired    0#Paired reads with properly insert size
    [Total] Fraction of Properly paired    0.00%
    [Total] Read and mate paired    0#read1 and read2 paired.
    [Total] Fraction of Read and mate paired    0.00%
    [Total] Map quality cutoff value    20
    [Total] MapQuality above cutoff reads    10
    [Total] Fraction of MapQ reads in all reads    66.67%
    [Total] Fraction of MapQ reads in mapped reads    66.67%
    [Target] Average depth    0.26
    [Target] Average depth(rmdup)    0.06
    [Target] Coverage (>0x)    5.66%
    [Target] Coverage (>=4x)    2.83%
    [Target] Coverage (>=10x)    0.00%
    [Target] Coverage (>=30x)    0.00%
    $ less depth_distribution.plot#结合R可以做出目标区域的深度分布图
    0       900     0.943396        54      0.056604
    1       0       0.000000        54      0.056604
    2       0       0.000000        54      0.056604
    3       27      0.028302        27      0.028302
    4       4       0.004193        23      0.024109
    #左边三列分别代表:覆盖深度,对应该深度的碱基数,碱基占比(以目标区域的碱基数为分母);
    #右边两列是高深度向低深度累加的值,分别是碱基数及其占比,累加到X=1为止。
    $ less depth.tsv
    #Chr    Pos     Raw Depth       Rmdup depth     Cover depth
    chrM    650     8       6       8
    chrM    651     8       6       8
    chrM    652     8       6       8
    chrM    653     9       6       9
    chrM    654     9       6       9
    #Raw Depth从输入bam文件中得到,没有任何限制;
    #Rmdup depth去除了PCR重复的reads, 次优比对reads, 低比对质量的reads(mapQ < 20), 该值与samtools depth的输出深度类似;
    #默认使用raw depth来统计coverage.report文件中的覆盖率信息。 如果要使用rmdup depth计算覆盖率,可使用参数"--use_rmdup";
    #The coverage depth is the raw depth with consider of deletion region, so this value should be equal to or greated than the raw depth.
    $ less insertsize.plot#做出两个接头之间的插入片段大小的图,理论上是根据read1和read2在染色体上的坐标信息求得,这时read1和read2要求比对到一条染色体上。
    $ less region.tsv
    #Chr    Start   Stop    Avg depth       Median  Coverage        Coverage(FIX)
    chrM    649     1603    0.26    0.0     5.66    5.66
    #目标区域文件每一个区域的统计,其中Coverage以X>0来统计
    $ less uncover.bed
    chrM    672     1603
    #--uncover的值默认是5,从前面depth_distribution.plot文件中也能看出来,深度小于5的碱基数就是931;
    #包含低覆盖或者是未覆盖的,通过"--uncover"规定“未覆盖”的阈值。
    

    REFERENCE   简书作者:hsy_hzauer 链接:https://www.jianshu.com/p/ac7b8e4d76e4

  • 相关阅读:
    浅谈Java中的栈和堆
    Java运行时内存划分
    浅谈Static
    浅谈同一家公司多个系统,共用登录用户名和密码
    浅谈Final
    浅谈StringBuffer
    浅谈加密算法BCrypt
    序列表 批量的含义
    安装activemq和java代码实现生产和消费
    Restful
  • 原文地址:https://www.cnblogs.com/yellow-hgy/p/10264275.html
Copyright © 2011-2022 走看看