  • Chromosome coordinate systems: 0-based, 1-based

    From:

    https://arnaudceol.wordpress.com/2014/09/18/chromosome-coordinate-systems-0-based-1-based/

    I had a hard time figuring out that different websites and file formats use different systems to represent genome coordinates.

    Basically, the bases can be numbered in two ways: starting at 0 or starting at 1. These are the 0-based and 1-based coordinate systems.

    0-based:

    ACTGACTG
    01234567

    1-based:

    ACTGACTG
    12345678

    A range is then called inclusive if its end index is part of the sequence, or exclusive if it is not.

    For instance, to represent the subsequence TGAC of ACTGACTG:

    0-based inclusive: 2-5
    1-based inclusive: 3-6
    1-based exclusive: 3-7
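
As a quick check of the three notations above, here is a minimal Python sketch. The helper name `extract` and its parameters are made up for illustration; the only fact it relies on is that Python slices are themselves 0-based with an exclusive end.

```python
def extract(seq, start, end, base=0, end_inclusive=True):
    """Return the part of seq described by start-end in the given convention.

    base is 0 or 1 (where numbering starts); end_inclusive says whether
    the end index itself belongs to the range.
    """
    # Shift the start so that it becomes a 0-based Python index.
    py_start = start - base
    # Python slices exclude their end, so an inclusive end needs one more position.
    py_end = end - base + (1 if end_inclusive else 0)
    return seq[py_start:py_end]


seq = "ACTGACTG"
print(extract(seq, 2, 5, base=0, end_inclusive=True))   # 0-based inclusive -> TGAC
print(extract(seq, 3, 6, base=1, end_inclusive=True))   # 1-based inclusive -> TGAC
print(extract(seq, 3, 7, base=1, end_inclusive=False))  # 1-based exclusive -> TGAC
```

All three calls return the same four bases, which is the point: the numbers differ only because the conventions differ.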

    I’ve tried to figure out which websites and applications use each coordinate system. The results can be found below. For each source, I provide the URL of the reference website where I found the information and an excerpt in which the system is described.

    I found most of those links on Biostar (https://www.biostars.org/p/6373/) and on the blog of Casey M. Bergman (http://bergmanlab.smith.man.ac.uk/?p=36), who also wrote an article on this topic: https://www.landesbioscience.com/journals/mge/article/19479/.

    Question:
    “I am confused about the start coordinates for items in the refGene table. It looks like you need to add “1” to the starting point in order to get the same start coordinate as is shown by the Genome Browser. Why is this the case?”
    Response:
    Our internal database representations of coordinates always have a zero-based start and a one-based end. We add 1 to the start before displaying coordinates in the Genome Browser. Therefore, they appear as one-based start, one-based end in the graphical display. The refGene.txt file is a database file, and consequently is based on the internal representation.

    We use this particular internal representation because it simplifies coordinate arithmetic, i.e. it eliminates the need to add or subtract 1 at every step. Unfortunately, it does create some confusion when the internal representation is exposed or when we forget to add 1 before displaying a start coordinate. However, it saves us from much trickier bugs. If you use a database dump file but would prefer to see the one-based start coordinates, you will always need to add 1 to each start coordinate.

    If you submit data to the browser in position format (chr#:##-##), the browser assumes this information is 1-based. If you submit data in any other format (BED (chr# ## ##) or otherwise), the browser will assume it is 0-based. You can see this both in our liftOver utility and in our search bar, by entering the same numbers in position or BED format and observing the results. Similarly, any data returned by the browser in position format is 1-based, while data returned in BED format is 0-based.
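
The rule described in this response (position format is 1-based, BED is 0-based at the start, and the end number is the same in both) can be sketched as a pair of conversions. This is only an illustration of the arithmetic, not the browser's own code; the function names `position_to_bed` and `bed_to_position` are hypothetical.

```python
def position_to_bed(position):
    """Convert 'chr#:start-end' (1-based start, inclusive end) to a BED-style tuple."""
    chrom, span = position.rsplit(":", 1)
    # Tolerate thousands separators such as 'chr1:1,000-2,000'.
    start, end = span.replace(",", "").split("-")
    # Only the start moves: 1-based start -> 0-based start.
    return chrom, int(start) - 1, int(end)


def bed_to_position(chrom, chrom_start, chrom_end):
    """Convert BED-style coordinates back to a 'chr#:start-end' position string."""
    # Add 1 to the start before displaying it; the end stays as it is.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


print(position_to_bed("chr1:11-20"))      # ('chr1', 10, 20)
print(bed_to_position("chr1", 10, 20))    # chr1:11-20
```

Both forms describe the same ten bases; only the start coordinate changes between them.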

     

    BED format uses zero-based, half-open coordinates, so the first 25 bases of a sequence are in the range 0-25 (those bases being numbered 0 to 24).

    The first three required BED fields are:

    chrom – The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
    chromStart – The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
    chromEnd – The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
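
The practical benefit of this half-open definition is that lengths and adjacency need no "+1" or "-1". A small sketch, assuming intervals are plain `(chromStart, chromEnd)` integer pairs; the helper names are invented for this example.

```python
def bed_length(chrom_start, chrom_end):
    """Number of bases in a BED interval: simply end minus start."""
    return chrom_end - chrom_start


def bed_base_numbers(chrom_start, chrom_end):
    """The 0-based base numbers covered by a BED interval."""
    # range() is also half-open, so it matches BED semantics directly.
    return range(chrom_start, chrom_end)


def are_adjacent(first, second):
    """True if the second interval starts exactly where the first one ends."""
    return first[1] == second[0]


bases = bed_base_numbers(0, 100)
print(bed_length(0, 100))              # 100
print(bases[0], bases[-1])             # 0 99
print(are_adjacent((0, 25), (25, 50))) # True: the two intervals share no base
```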
     
    start – Lowest numeric position of the reported variant on the genomic reference sequence. Mutation start coordinate (1-based coordinate system).
    end – Highest numeric genomic position of the reported variant on the genomic reference sequence. Mutation end coordinate (inclusive, 1-based coordinate system).
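
With a 1-based, inclusive start and end such as these, a single-base variant has start equal to end, and the length needs a "+1". A minimal sketch of that arithmetic, with hypothetical helper names and made-up example coordinates:

```python
def span_length(start, end):
    """Number of bases covered by a 1-based, inclusive start/end pair."""
    return end - start + 1


def to_half_open(start, end):
    """Convert 1-based inclusive coordinates to 0-based, end-exclusive ones."""
    return start - 1, end


print(span_length(100, 100))   # 1: a single-base variant has start == end
print(span_length(100, 102))   # 3
print(to_half_open(100, 102))  # (99, 102)
```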
  • Original post: https://www.cnblogs.com/emanlee/p/6848523.html