  • Assembling the resurrection grass genome from PacBio third-generation reads alone

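    For eukaryotic genomes such as those of plants, features like repetitive sequences, polyploidy, and high heterozygosity cause serious problems when assembling from second-generation (short-read) data.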

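    Genomes assembled from second-generation data mostly fall short of finished quality. They usually cover only the protein-coding gene regions and leave many other regions uncovered, and those regions are precisely the non-coding regions that carry regulatory functions. Research on non-coding function has grown in recent years, and if the assembled genome lacks these sequences, follow-up studies cannot be carried out.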

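    In addition, because of limited read length and the behavior of assembly algorithms, repetitive sequences and regions with abnormal GC content are assembled incorrectly or not assembled at all.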

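    Third-generation sequencing, with its long reads and lack of GC bias, lowers the difficulty of genome assembly. It can assemble repetitive and GC-abnormal sequences that are very hard to assemble from second-generation data, which makes it well suited to genome assembly.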

    The researchers sequenced the resurrection grass (Oropetium) on the PacBio RS II platform, using 32 SMRT cells for a sequencing depth of 72X.
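The depth figure can be turned into a rough data volume. The sketch below is only back-of-the-envelope arithmetic: it assumes that the 72X depth is measured against the estimated genome size of 245 Mb quoted in this post and that yield is spread evenly over the 32 SMRT cells. The function names are made up for illustration.

```python
# Rough yield arithmetic from the numbers quoted in the post.
# Assumption: depth = total sequenced bases / estimated genome size.
GENOME_SIZE_BP = 245_000_000  # estimated genome size (245 Mb)
DEPTH = 72                    # reported sequencing depth (72X)
SMRT_CELLS = 32               # number of SMRT cells used


def total_bases(genome_size_bp: int, depth: int) -> int:
    """Total bases implied by a given depth over a genome of this size."""
    return genome_size_bp * depth


def bases_per_cell(total: int, cells: int) -> float:
    """Average yield per SMRT cell, assuming an even split across cells."""
    return total / cells


total = total_bases(GENOME_SIZE_BP, DEPTH)
per_cell = bases_per_cell(total, SMRT_CELLS)
print(f"{total / 1e9:.2f} Gb total, {per_cell / 1e6:.0f} Mb per cell")
# prints: 17.64 Gb total, 551 Mb per cell
```

Under these assumptions the run corresponds to roughly 17.6 Gb of sequence, or about 0.55 Gb per SMRT cell on average.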

    The final assembly contains 650 contigs and covers 99% of the genome (estimated genome size 245 Mb, total contig length 244 Mb), with a contig N50 of 2.4 Mb.

    A complete chloroplast genome was also assembled. It is 125,324 bp long, of which about 25 kb is repetitive sequence.
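For readers unfamiliar with the contig statistics quoted above, here is a minimal sketch of how N50 and the assembled fraction are usually computed. It uses the common definition of N50 (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly). The toy contig lengths are invented; the post does not list the real ones.

```python
def n50(lengths):
    """Contig length at which the cumulative sum (longest first)
    first reaches half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def assembled_fraction(total_contig_bp, genome_size_bp):
    """Share of the estimated genome size represented by contigs."""
    return total_contig_bp / genome_size_bp


# Toy example with invented lengths: total 30, half 15.
# Cumulative sums are 10, then 18, so N50 is 8.
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))  # prints: 8

# Figures from the post: 244 Mb of contigs over a 245 Mb estimate.
print(f"{assembled_fraction(244_000_000, 245_000_000):.1%}")  # prints: 99.6%
```

With the figures in the post, 244 Mb over 245 Mb comes to about 99.6%, which is consistent with the reported 99% coverage.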

    The analysis used the HGAP assembly pipeline, with the following parameters:

    The Oropetium genome was assembled using the
    RS_HGAP_Assembly.3 protocol for assembly and Quiver for genome polishing in SMRT Analysis v2.3.0 (ref. 12). This consisted of a three-step process involving
    (1) generation of preassembled reads with improved consensus accuracy;
    (2) assembly of the genome through overlap consensus accuracy using Celera; and
    (3) one round of genome polishing with Quiver.

    For HGAP, the following parameters were used:
    PreAssembler Filter v1 (
    minimum sub-read length = 3,000 bp,
    minimum polymerase read quality = 0.80,
    minimum polymerase read length = 3,000 bp
    );

    PreAssembler v2 (
    minimum seed length = 16,000 bp,
    number of seed read chunks = 6,
    alignment candidates per chunk = 10,
    total alignment candidates = 24,
    min coverage for correction = 6
    );

    AssembleUnitig v1 (
    target genome coverage = 30,
    overlap error rate = 0.06,
    minimum overlap = 40 bp,
    overlap k-mer = 14
    );

    BLASR v1 mapping of reads for genome polishing with Quiver (
    max divergence percentage = 30,
    minimum anchor size = 12
    ).

    A second round of genome polishing was performed using Quiver (SMRT Analysis v2.3.0) to
    further improve the site-specific consensus accuracy of the assembly.
    The following Quiver parameters were used for genome polishing:
    filtering (
    minimum sub-read length = 3,000 bp,
    minimum polymerase read quality = 0.80,
    minimum polymerase read length = 3,000 bp
    );

    mapping (
    maximum divergence percentage = 30,
    minimum anchor size = 12
    ).

    Default parameters were otherwise employed for both HGAP assembly and Quiver protocols.
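To make the two length thresholds above concrete, the sketch below shows how a 3,000 bp minimum sub-read length and a 16,000 bp minimum seed length would partition a set of reads. This is an illustration only, not SMRT Analysis code: the function name is hypothetical, the read lengths are invented, treating the thresholds as inclusive is an assumption, and the read-quality filter (0.80) is left out. In HGAP the longest reads act as seeds and the remaining reads are aligned to them to build the preassembled (corrected) reads.

```python
# Thresholds taken from the parameter list quoted in the post.
MIN_SUBREAD_LENGTH = 3_000   # PreAssembler Filter: minimum sub-read length
MIN_SEED_LENGTH = 16_000     # PreAssembler: minimum seed length


def split_reads(lengths,
                min_subread=MIN_SUBREAD_LENGTH,
                min_seed=MIN_SEED_LENGTH):
    """Partition read lengths into seed reads and shorter supporting reads.

    Hypothetical helper: reads below min_subread are dropped, reads at or
    above min_seed are treated as seeds, and the rest would be aligned to
    the seeds for error correction. Inclusive thresholds are an assumption.
    """
    kept = [n for n in lengths if n >= min_subread]
    seeds = [n for n in kept if n >= min_seed]
    others = [n for n in kept if n < min_seed]
    return seeds, others


# Invented read lengths for illustration.
seeds, others = split_reads([2_000, 3_000, 15_999, 16_000, 20_000])
print(seeds)   # prints: [16000, 20000]
print(others)  # prints: [3000, 15999]
```

A high seed cutoff like 16,000 bp only works when enough long reads exist to cover the genome, which fits the 72X depth reported for this dataset.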

  • Original article: https://www.cnblogs.com/xudongliang/p/6873249.html