  • Variation calling and annotation

    Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean

    This text is excerpted from "Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean".

    Variation calling and annotation.

    Mapping.

    SAMtools (Version: 0.1.18) software was used to convert mapping results into the BAM format and to filter the unmapped and non-unique reads.
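As a rough illustration of the read filter described above, the sketch below keeps SAM records that are mapped and have a non-zero mapping quality. The source does not say how "non-unique" was defined or which SAMtools options were used; treating MAPQ 0 as non-unique is an assumption made here, and `keep_read` and `filter_sam` are hypothetical helper names.

```python
def keep_read(sam_line, min_mapq=1):
    """Return True for a mapped read whose MAPQ is at least min_mapq.

    Assumption: reads with MAPQ 0 are treated as non-uniquely mapped.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # SAM column 2: bitwise FLAG
    mapq = int(fields[4])  # SAM column 5: mapping quality
    is_unmapped = bool(flag & 0x4)  # bit 0x4 marks an unmapped read
    return not is_unmapped and mapq >= min_mapq


def filter_sam(lines, min_mapq=1):
    """Yield header lines unchanged and only the alignment lines that pass."""
    for line in lines:
        if line.startswith("@") or keep_read(line, min_mapq):
            yield line
```

In practice the same effect is usually obtained with SAMtools itself; the Python version only makes the two criteria explicit.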

    Duplicated reads were filtered with the Picard package (picard.sourceforge.net, Version: 1.87).

    The BEDtools (Version: 2.17.0) coverageBed program was used to compute the coverage of sequence alignments. (A sequence was defined as absent if coverage was lower than 90% and present if coverage was greater than 90%.)
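The presence/absence rule can be written as a small function. This is a minimal sketch that assumes the number of covered bases and the sequence length have already been taken from the coverage output; `classify_presence` is a hypothetical name. The text does not say how a sequence with exactly 90% coverage is treated, so that case is returned as "undefined" rather than guessed.

```python
def classify_presence(covered_bases, length, threshold_pct=90):
    """Classify a sequence as present or absent from its coverage.

    Integer arithmetic is used so the comparison against the
    threshold is exact.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    covered_scaled = covered_bases * 100
    threshold_scaled = threshold_pct * length
    if covered_scaled > threshold_scaled:
        return "present"
    if covered_scaled < threshold_scaled:
        return "absent"
    return "undefined"  # exactly at the threshold: not specified in the text
```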

    SNP calling.

    SNP detection was performed using the Genome Analysis Toolkit (GATK, version 2.4-7-g5e89f01) and SAMtools. Only the SNPs detected by both methods were analyzed further.
    The detailed processes were as follows:
    (1) After BWA alignment, the reads around indels were realigned.
    Realignment was performed with GATK in two steps.
    The first step used the RealignerTargetCreator tool to identify regions where realignment was needed;
    the second step used the IndelRealigner tool to realign those regions, which produced a realigned BAM file for each accession.
    (2) SNPs were called at a population level with GATK and SAMtools. For GATK, SNPs were required to have a confidence score greater than 30 (the parameter -stand_call_conf was set to 30). The same realigned BAM files were used for SNP calling with the SAMtools mpileup command.
    (3) In the filter step, the sites identified by both GATK and SAMtools were selected with the GATK SelectVariants tool; SNPs with allele frequencies lower than 1% in the population were discarded.
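The filter in step (3) amounts to a set intersection followed by a frequency cutoff. The sketch below is not the SelectVariants implementation; it assumes each site is a `(chrom, pos, ref, alt)` tuple and that alternate-allele counts are available per site. Interpreting "allele frequency" as minor allele frequency is also an assumption, and the function names are hypothetical.

```python
def minor_allele_frequency(alt_count, total_alleles):
    """Frequency of the rarer of the two alleles at a biallelic site."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    freq = alt_count / total_alleles
    return min(freq, 1 - freq)


def filter_snps(gatk_sites, samtools_sites, allele_counts, min_freq=0.01):
    """Keep sites called by both tools whose frequency is at least min_freq.

    allele_counts maps a site to (alt_count, total_alleles).
    """
    shared = set(gatk_sites) & set(samtools_sites)
    kept = []
    for site in sorted(shared):
        alt_count, total = allele_counts[site]
        if minor_allele_frequency(alt_count, total) >= min_freq:
            kept.append(site)
    return kept
```

With 302 diploid accessions and no missing genotypes there would be 604 alleles per site, so a site with 2 alternate alleles (about 0.3%) would be discarded while one with 60 (about 10%) would be kept.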

    Indel calling.

    Indel calling was similar to SNP calling, but the UnifiedGenotyper parameter -glm INDEL was used so that only indels were reported. Only insertions and deletions of 6 bp or shorter were taken into account.
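The length cutoff can be sketched as follows, assuming VCF-style alleles in which the reference and alternate strings share an anchor base, so that the indel length is the difference in string lengths. `indel_length` and `keep_indel` are hypothetical names, not part of GATK.

```python
def indel_length(ref, alt):
    """Number of inserted or deleted bases for VCF-style REF/ALT alleles."""
    return abs(len(ref) - len(alt))


def keep_indel(ref, alt, max_len=6):
    """True for an insertion or deletion of at most max_len bases."""
    size = indel_length(ref, alt)
    return 0 < size <= max_len  # size 0 is a substitution, not an indel
```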

    Annotation.

    SNP annotation was performed against the reference genome annotation using the ANNOVAR package (Version: 2013-08-23).
    Based on the genome annotation, SNPs were categorized as located in exonic regions (overlapping with a coding exon), splicing sites (within 2 bp of a splicing junction), 5′UTRs and 3′UTRs, intronic regions (overlapping with an intron), upstream and downstream regions (within 1 kb upstream of the transcription start site or 1 kb downstream of the transcription end site), or intergenic regions.

    SNPs in coding exons were further grouped into synonymous SNPs (did not cause amino acid changes) or nonsynonymous SNPs (caused amino acid changes; mutations causing stop gain and stop loss were also classified into this group).
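The synonymous/nonsynonymous grouping can be illustrated by translating the reference and mutated codons with the standard genetic code. This is a minimal sketch, not ANNOVAR's implementation: it assumes the codon and the SNP's 0-based position within it are already known, and `classify_coding_snp` is a hypothetical name. As in the text, stop-gain and stop-loss changes fall into the nonsynonymous group.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, offset, alt_base):
    """Classify a SNP at 0-based position `offset` of `ref_codon`."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"  # includes stop gain and stop loss
```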

    Indels in the exonic regions were classified by whether they caused frame-shift mutations (insertions or deletions whose length is not a multiple of 3 bp).
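By the usual definition, an exonic indel shifts the reading frame when its length is not a multiple of 3 bp. The sketch below applies that definition to VCF-style alleles; it ignores indels that span exon boundaries, and `classify_exonic_indel` is a hypothetical name.

```python
def classify_exonic_indel(ref, alt):
    """Label an exonic indel by its effect on the reading frame."""
    size = abs(len(ref) - len(alt))
    if size == 0:
        raise ValueError("REF and ALT have equal length: not an indel")
    # A length that is a multiple of 3 adds or removes whole codons.
    return "non-frameshift" if size % 3 == 0 else "frameshift"
```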

  • Original article: https://www.cnblogs.com/adawong/p/7429871.html