  • Variation calling and annotation

    Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean

    This text is excerpted from "Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean".

    Variation calling and annotation.

    Mapping.

    SAMtools (version 0.1.18) was used to convert the mapping results into BAM format and to filter out unmapped and non-uniquely mapped reads.
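The filtering criterion can be illustrated with a minimal sketch that works on SAM text records directly. The excerpt does not give the exact SAMtools options used, so two assumptions are made here: "unmapped" means the 0x4 bit is set in the FLAG column, and "non-unique" means a mapping quality of 0. The function names `keep_read` and `filter_sam` are hypothetical.

```python
def keep_read(sam_line: str, min_mapq: int = 1) -> bool:
    """Return True for a mapped read whose MAPQ is at least min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    is_unmapped = bool(flag & 0x4)
    # Assumption: MAPQ 0 is treated as "non-unique"; the paper does not
    # state the threshold it used.
    return (not is_unmapped) and mapq >= min_mapq


def filter_sam(lines):
    """Keep header lines (starting with '@') and reads that pass keep_read."""
    return [line for line in lines if line.startswith("@") or keep_read(line)]


example = [
    "@SQ\tSN:Gm01\tLN:1000",
    "read1\t0\tGm01\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",   # mapped, MAPQ 60
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",          # unmapped
    "read3\t16\tGm01\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",   # mapped, MAPQ 0
]
kept = filter_sam(example)
```

In this toy input only the header and `read1` survive. A real pipeline would apply the equivalent filter to the BAM file with SAMtools itself rather than parsing text.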

    Duplicate reads were removed with the Picard package (picard.sourceforge.net, version 1.87).

    The BEDtools (Version: 2.17.0) coverageBed program was used to compute the coverage of sequence alignments. (A sequence was defined as absent if coverage was lower than 90% and present if coverage was greater than 90%.)
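As a rough illustration of the presence/absence rule, the sketch below computes the fraction of a target interval covered by at least one alignment and applies the 90% cutoff. It is not the coverageBed implementation. Coordinates are assumed to be half-open, the function names are hypothetical, and because the text does not say how a coverage of exactly 90% is handled, that boundary case is treated as absent here.

```python
def covered_fraction(target_start, target_end, alignments):
    """Fraction of [target_start, target_end) covered by >= 1 alignment.

    alignments is an iterable of (start, end) half-open intervals.
    """
    length = target_end - target_start
    if length <= 0:
        raise ValueError("target interval must have positive length")
    # Clip each alignment to the target and sort by start position.
    clipped = sorted(
        (max(s, target_start), min(e, target_end))
        for s, e in alignments
        if e > target_start and s < target_end
    )
    covered = 0
    cursor = target_start  # rightmost position already counted
    for s, e in clipped:
        if e > cursor:
            covered += e - max(s, cursor)
            cursor = e
    return covered / length


def classify_presence(fraction, threshold=0.9):
    # Assumption: exactly 90% falls into "absent"; the source leaves it open.
    return "present" if fraction > threshold else "absent"


frac = covered_fraction(0, 100, [(0, 50), (40, 95)])
call = classify_presence(frac)
```

Overlapping alignments are counted once, so the two example reads cover 95 of 100 bases.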

    SNP calling.

    SNP detection was performed using the Genome Analysis Toolkit (GATK, version 2.4-7-g5e89f01) and SAMtools. Only the SNPs detected by both methods were analyzed further.
    The detailed processes were as follows:
    (1) After BWA alignment, the reads around indels were realigned.
    Realignment was performed with GATK in two steps.
    The first step used the RealignerTargetCreator tool to identify regions where realignment was needed;
    the second step used IndelRealigner to realign the regions found in the first step, which produced a realigned BAM file for each accession.
    (2) SNPs were called at the population level with GATK and SAMtools. For GATK, SNPs were required to have a confidence score greater than 30, with the parameter -stand_call_conf set to 30. The same realigned BAM files were used for SNP calling with the SAMtools mpileup command.
    (3) In the filter step, we chose the common sites identified by both GATK and SAMtools with the SelectVariants tool; SNPs with allele frequencies lower than 1% in the population were discarded.
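The logic of the filter step (3) can be sketched as a set intersection followed by a frequency cutoff. This is an illustration only, not how SelectVariants works internally. It assumes each site is keyed by a `(chrom, pos, ref, alt)` tuple, that genotypes are stored as alternate-allele counts (0, 1, 2, or `None` for missing) per diploid accession, and that "allele frequency" means minor allele frequency. All function names are hypothetical.

```python
def concordant_snps(gatk_sites, samtools_sites):
    """Sites reported by both callers."""
    return set(gatk_sites) & set(samtools_sites)


def minor_allele_frequency(genotypes):
    """MAF from per-accession alt-allele counts; None entries are missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(site_genotypes, shared_sites, min_maf=0.01):
    """Keep shared sites whose MAF is at least min_maf (1% by default)."""
    kept = {}
    for site in shared_sites:
        maf = minor_allele_frequency(site_genotypes[site])
        if maf is not None and maf >= min_maf:
            kept[site] = maf
    return kept


rare = ("Gm01", 100, "A", "G")
common = ("Gm02", 50, "G", "A")
gatk = {rare, common, ("Gm01", 200, "C", "T")}
samtools = {rare, common, ("Gm03", 7, "T", "C")}

genotypes = {
    rare: [0] * 300 + [1, 1],      # 2 alt alleles among 604 chromosomes
    common: [0] * 200 + [2] * 102, # 204 alt alleles among 604 chromosomes
}
shared = concordant_snps(gatk, samtools)
passed = filter_by_maf(genotypes, shared)
```

With 302 toy accessions, the rare site is shared by both callers but falls below 1% and is dropped, while the common site is kept.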

    Indel calling.

    Indel calling was similar to SNP calling, but the UnifiedGenotyper parameter -glm INDEL was used so that only indels were reported. Only insertions and deletions of 6 bp or shorter were taken into account.
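The length cutoff can be expressed in a few lines. This sketch assumes VCF-style alleles, where REF and ALT share a leading anchor base so the indel size is the difference in allele lengths. The helper names are hypothetical.

```python
def indel_size(ref: str, alt: str) -> int:
    """Number of inserted or deleted bases for VCF-style REF/ALT alleles."""
    return abs(len(alt) - len(ref))


def keep_indel(ref: str, alt: str, max_len: int = 6) -> bool:
    """True for insertions or deletions of at most max_len bp."""
    size = indel_size(ref, alt)
    return 0 < size <= max_len


calls = [("A", "ATTT"), ("ACGTACG", "A"), ("ACGTACGT", "A"), ("A", "G")]
kept_indels = [c for c in calls if keep_indel(*c)]
```

Here the 3 bp insertion and the 6 bp deletion pass, the 7 bp deletion is rejected, and the SNP is ignored because its size is 0.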

    Annotation.

    SNPs were annotated against the reference genome annotation using ANNOVAR (version 2013-08-23).
    Based on the genome annotation, SNPs were categorized into exonic regions (overlapping with a coding exon), splicing sites (within 2 bp of a splicing junction), 5′UTRs and 3′UTRs, intronic regions (overlapping with an intron), upstream and downstream regions (within 1 kb upstream of the transcription start site or 1 kb downstream of the transcription end site), and intergenic regions.

    SNPs in coding exons were further grouped into synonymous SNPs (did not cause amino acid changes) or nonsynonymous SNPs (caused amino acid changes; mutations causing stop gain and stop loss were also classified into this group).
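To make the grouping concrete, the following sketch classifies a single-base change within one codon using the standard genetic code. It is a simplified illustration rather than ANNOVAR's own procedure: it assumes the reference codon and the position of the SNP within it (0, 1, or 2) are already known, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon: str, offset: int, alt_base: str) -> str:
    """Classify a SNP at position `offset` (0-2) of `ref_codon`."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    # Stop gain and stop loss are reported as subtypes of nonsynonymous,
    # following the grouping described in the text.
    if alt_aa == "*":
        return "nonsynonymous (stop gain)"
    if ref_aa == "*":
        return "nonsynonymous (stop loss)"
    return "nonsynonymous"


results = [
    classify_coding_snp("CTT", 2, "C"),  # Leu to Leu
    classify_coding_snp("GAA", 1, "T"),  # Glu to Val
    classify_coding_snp("GAA", 0, "T"),  # Glu to stop
    classify_coding_snp("TAA", 0, "C"),  # stop to Gln
]
```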

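    Indels in the exonic regions were classified by whether they caused frameshift mutations, that is, whether the inserted or deleted length was not a multiple of 3 bp (indels whose length is a multiple of 3 bp keep the reading frame).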
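A coding indel shifts the reading frame only when the number of inserted or deleted bases is not a multiple of 3. The sketch below applies that rule to VCF-style REF/ALT alleles. The function name is hypothetical, and the check ignores complications such as indels that span an exon boundary.

```python
def classify_exonic_indel(ref: str, alt: str) -> str:
    """Return 'frameshift' or 'in-frame' for a coding insertion/deletion."""
    size = abs(len(alt) - len(ref))
    if size == 0:
        raise ValueError("REF and ALT have equal length: not an indel")
    # A length divisible by 3 adds or removes whole codons only.
    return "in-frame" if size % 3 == 0 else "frameshift"


labels = {
    "1 bp insertion": classify_exonic_indel("A", "AT"),
    "3 bp deletion": classify_exonic_indel("ACGT", "A"),
    "5 bp insertion": classify_exonic_indel("A", "ACCCCC"),
    "6 bp deletion": classify_exonic_indel("ACGTACG", "A"),
}
```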

  • Original article: https://www.cnblogs.com/adawong/p/7429871.html