  • 4. Brief primer and lexicon for PacBio SMRT sequencing

    Reposted from: http://pacbiofileformats.readthedocs.io/en/5.1/Primer.html

    Reposted from: http://pacbiofileformats.readthedocs.io/en/5.1/#legacy-formats

    PacBio SMRT sequencing operates within a silicon chip (a SMRTcell) fabricated to contain a large number of microscopic holes (ZMWs, or zero-mode waveguides), each assigned a hole number.

    Within a ZMW, PacBio SMRT sequencing is performed on a circularized molecule called a SMRTbell. The SMRTbell, depicted below, consists of:

    • the customer’s double-stranded DNA insert (with sequence $I$, read following the arrow)
    • (optional) double-stranded DNA barcodes (sequences $B_L, B_R$) used for multiplexing DNA samples. While the barcodes are optional, they must be present at both ends if present at all. Barcodes may or may not be symmetric, where symmetric means $B_L = B_R^{RC}$.
    • SMRTbell adapters (sequences $A_L, A_R$), each consisting of a double-stranded stem and a single-stranded hairpin loop. Adapters may or may not be symmetric, where symmetric means $A_L = A_R$.
    Figure: A schematic drawing of a SMRTbell
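    The symmetry conditions in the list above can be checked mechanically. The following is a minimal Python sketch; the helper names (`complement`, `reverse_complement`, `barcodes_symmetric`, `adapters_symmetric`) are hypothetical and not part of any PacBio tool, and it assumes sequences are plain DNA strings without IUPAC ambiguity codes.

```python
# Hypothetical helpers for illustration; not a PacBio API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    # Superscript C: complement each base, keep the order.
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    # Superscript RC: complement each base and reverse the order.
    return complement(seq)[::-1]


def barcodes_symmetric(b_left: str, b_right: str) -> bool:
    # Symmetric barcodes satisfy B_L == B_R^RC.
    return b_left == reverse_complement(b_right)


def adapters_symmetric(a_left: str, a_right: str) -> bool:
    # Symmetric adapters satisfy A_L == A_R.
    return a_left == a_right
```

    For example, `barcodes_symmetric("ACGTT", "AACGT")` is true because the reverse complement of `AACGT` is `ACGTT`.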

    SMRT sequencing interrogates the incorporated bases in the product strand of a replication reaction. Assuming the sequencing of the template above began at START, the following sequence of bases would be incorporated (where we are using the superscripts C, R, and RC to denote sequence complementation, reversal, and reverse-complementation):

    $$A_L^C\,B_L^C\,I^C\,B_R^C\,A_R^C\,B_R^R\,I^R\,B_L^R\,A_L^C \ldots$$

    (note the identity $(x^{RC})^C = x^R$).
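    To make the notation concrete, here is a small Python sketch that builds one lap of incorporated bases from toy segment sequences, in the order shown above. The function `incorporated_lap` and its arguments are hypothetical. The second half of the lap complements the segments of the opposite strand, which carries $B_R^{RC}, I^{RC}, B_L^{RC}$ in that order; this is exactly where the identity $(x^{RC})^C = x^R$ comes in.

```python
# Toy model of the incorporation order; names are hypothetical.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    # Superscript C: complement each base, keep the order.
    return seq.translate(_COMPLEMENT)


def reverse(seq: str) -> str:
    # Superscript R: reverse the order, keep the bases.
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    # Superscript RC: both operations.
    return reverse(complement(seq))


def incorporated_lap(a_l: str, b_l: str, insert: str, b_r: str, a_r: str) -> str:
    """One lap A_L^C B_L^C I^C B_R^C A_R^C B_R^R I^R B_L^R (toy model, no errors)."""
    # First pass: the product strand complements the top strand.
    first_pass = [complement(s) for s in (a_l, b_l, insert, b_r, a_r)]
    # Second pass: the template is now the opposite strand, which carries
    # B_R^RC, I^RC, B_L^RC; complementing each gives B_R^R, I^R, B_L^R.
    second_pass = [complement(reverse_complement(s)) for s in (b_r, insert, b_l)]
    return "".join(first_pass + second_pass)
```

    Repeating the lap (the next one starts again with $A_L^C$) gives the full sequence; the sketch ignores basecalling errors and the point at which the polymerase stops.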

    The ZMW read is the full output of the instrument/basecaller upon observing this series of incorporations, subject to errors due to optical and other limitations. Adapter regions and barcode regions are the spans of the ZMW read corresponding to the adapter and barcode DNA. The subreads are the spans of the ZMW read corresponding to the DNA insert.

    One complication arises because a ZMW might not contain exactly one sequencing reaction. It could contain zero, in which case the ensuing basecalls are a product of background noise, or it could contain more than one, in which case the basecall sequence represents multiple intercalated reads and effectively appears as noise. To remove such noisy sequence, the high-quality (HQ) region finder in PostPrimary algorithmically detects a maximal interval of the ZMW read where it appears that a single sequencing reaction is taking place. This region is designated the HQ region, and in the standard mode of operation, PostPrimary will only output the subreads detected within the HQ region.
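    Restricting output to the HQ region can be sketched with half-open intervals. This is a hypothetical illustration, not PostPrimary's implementation: the function name is made up, and it assumes that a subread straddling an HQ boundary is clipped to the HQ region while one lying entirely outside it is dropped.

```python
from typing import List, Tuple

# Half-open [start, end) interval in ZMW-read coordinates.
Interval = Tuple[int, int]


def subreads_in_hq(subreads: List[Interval], hq: Interval) -> List[Interval]:
    """Keep the part of each subread that falls inside the HQ region (assumed policy)."""
    hq_start, hq_end = hq
    kept = []
    for start, end in subreads:
        # Intersect the two half-open intervals.
        clipped_start, clipped_end = max(start, hq_start), min(end, hq_end)
        if clipped_start < clipped_end:  # non-empty intersection
            kept.append((clipped_start, clipped_end))
    return kept
```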

    Figure: A schematic of the regions designated within a ZMW read

    Note

    Our coordinate system begins at the first basecall in the ZMW read (deemed base 0); that is, it is not relative to the HQ region. Intervals in PacBio reads are given in end-exclusive (“half-open”) coordinates. This style of coordinate system should be familiar to Python or C++ STL programmers.
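    With half-open coordinates, the length of an interval is simply `end - start`, adjacent intervals share a boundary without overlapping, and an interval maps directly onto a Python slice. A small sketch with a toy ZMW read; the helpers `extract_region`, `interval_length`, and `to_zmw_coordinate` are hypothetical. Because coordinates are not relative to the HQ region, an offset measured from the HQ start must be shifted by that start before use.

```python
from typing import Tuple


def extract_region(zmw_read: str, interval: Tuple[int, int]) -> str:
    """Return the bases in [start, end) of a ZMW read (0-based, end-exclusive)."""
    start, end = interval
    if not 0 <= start <= end <= len(zmw_read):
        raise ValueError(f"interval {interval} outside read of length {len(zmw_read)}")
    return zmw_read[start:end]


def interval_length(interval: Tuple[int, int]) -> int:
    # No +1 correction is needed with end-exclusive coordinates.
    start, end = interval
    return end - start


def to_zmw_coordinate(hq_start: int, offset_in_hq: int) -> int:
    # Convert an HQ-relative offset into the ZMW-read coordinate system.
    return hq_start + offset_in_hq
```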

    BAM everywhere

    Unaligned BAM files representing the subreads will be produced natively by the PacBio instrument. The subreads BAM will be the starting point for secondary analysis. In addition, the scraps arising from cutting out adapter and barcode sequences will be retained in a scraps.bam file, to enable reconstruction of HQ regions of the ZMW reads, in case the customer needs to rerun barcode finding with a different option.
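    If the subreads and scraps from one ZMW together tile its HQ region in half-open coordinates, reconstructing the region amounts to sorting the pieces by start position and concatenating them. The sketch below uses a hypothetical `Segment` tuple instead of real BAM records (as we understand the PacBio BAM specification, per-record query coordinates are carried in the `qs`/`qe` tags; check the spec for your version), and it assumes the segments passed in are exactly the pieces of a single ZMW's HQ region.

```python
from typing import Iterable, NamedTuple, Tuple


class Segment(NamedTuple):
    """Hypothetical stand-in for one subreads or scraps record."""
    start: int  # 0-based, ZMW-read coordinates
    end: int    # exclusive
    seq: str
    kind: str   # e.g. "subread", "adapter", "barcode"


def reconstruct_hq_region(segments: Iterable[Segment]) -> Tuple[int, int, str]:
    """Stitch subreads and scraps back into one contiguous (start, end, seq)."""
    ordered = sorted(segments, key=lambda seg: seg.start)
    if not ordered:
        raise ValueError("no segments to stitch")
    cursor = ordered[0].start
    pieces = []
    for seg in ordered:
        # Half-open intervals tile exactly: each segment starts where the last ended.
        if seg.start != cursor:
            raise ValueError(f"gap or overlap at position {cursor}")
        if len(seg.seq) != seg.end - seg.start:
            raise ValueError(f"length mismatch in segment {seg}")
        pieces.append(seg.seq)
        cursor = seg.end
    return ordered[0].start, cursor, "".join(pieces)
```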

    The circular consensus tool/workflow (CCS) will take as input an unaligned subreads BAM file and produce an output BAM file containing unaligned consensus reads.
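    The circular consensus idea can be illustrated with a deliberately simplified toy. Because consecutive passes over the insert read $I^C$ and then $I^R = (I^C)^{RC}$, neighboring subreads are reverse complements of each other, so every other subread is flipped before voting. This is not PacBio's CCS algorithm, which must align subreads containing insertions and deletions; the sketch assumes equal-length subreads with substitution errors only, and `toy_consensus` is a hypothetical name.

```python
from collections import Counter
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    # Reverse complement of an uppercase DNA string.
    return seq.translate(_COMPLEMENT)[::-1]


def toy_consensus(subreads: List[str]) -> str:
    """Per-column majority vote over strand-oriented subreads (toy, no indels)."""
    # Flip every other pass onto the strand of the first subread.
    oriented = [s if i % 2 == 0 else revcomp(s) for i, s in enumerate(subreads)]
    if len({len(s) for s in oriented}) != 1:
        raise ValueError("toy version requires equal-length subreads")
    # zip(*oriented) yields one tuple of bases per column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*oriented))
```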

    Alignment (mapping) programs take these unaligned BAM files as input and will produce aligned BAM files, faithfully retaining all tags and headers.

  • Original post: https://www.cnblogs.com/renping/p/9061999.html