  • PSL format

    PSL lines represent alignments, and are typically taken from files generated by BLAT or psLayout. See the BLAT documentation for more details. All of the following fields are required on each data line within a PSL file:

    1. matches - Number of bases that match that aren't repeats
    2. misMatches - Number of bases that don't match
    3. repMatches - Number of bases that match but are part of repeats
    4. nCount - Number of 'N' bases
    5. qNumInsert - Number of inserts in query
    6. qBaseInsert - Number of bases inserted in query
    7. tNumInsert - Number of inserts in target
    8. tBaseInsert - Number of bases inserted in target
    9. strand - '+' or '-' for query strand. For translated alignments, the second '+' or '-' is for the genomic strand
    10. qName - Query sequence name
    11. qSize - Query sequence size
    12. qStart - Alignment start position in query
    13. qEnd - Alignment end position in query
    14. tName - Target sequence name
    15. tSize - Target sequence size
    16. tStart - Alignment start position in target
    17. tEnd - Alignment end position in target
    18. blockCount - Number of blocks in the alignment (a block contains no gaps)
    19. blockSizes - Comma-separated list of sizes of each block
    20. qStarts - Comma-separated list of starting positions of each block in query
    21. tStarts - Comma-separated list of starting positions of each block in target
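
    As an illustration of the field list above, here is a minimal Python sketch that splits one PSL data line into its 21 fields. The names PslRecord and parse_psl_line are hypothetical and are not part of BLAT or any UCSC tool. The sketch assumes the fields are separated by whitespace and that sequence names contain no spaces.

```python
from dataclasses import dataclass
from typing import List

PSL_FIELD_COUNT = 21  # every data line carries all 21 fields


@dataclass
class PslRecord:
    """One PSL data line; attribute order follows the numbered field list."""
    matches: int
    mis_matches: int
    rep_matches: int
    n_count: int
    q_num_insert: int
    q_base_insert: int
    t_num_insert: int
    t_base_insert: int
    strand: str
    q_name: str
    q_size: int
    q_start: int
    q_end: int
    t_name: str
    t_size: int
    t_start: int
    t_end: int
    block_count: int
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def _int_list(text: str) -> List[int]:
    """Parse a comma-separated list such as '48,20,' (trailing comma allowed)."""
    return [int(item) for item in text.split(",") if item]


def parse_psl_line(line: str) -> PslRecord:
    """Split one whitespace-separated PSL data line into a PslRecord."""
    f = line.split()
    if len(f) != PSL_FIELD_COUNT:
        raise ValueError(f"expected {PSL_FIELD_COUNT} fields, got {len(f)}")
    return PslRecord(
        *map(int, f[0:8]),                       # fields 1-8: counts
        f[8], f[9],                              # strand, qName
        int(f[10]), int(f[11]), int(f[12]),      # qSize, qStart, qEnd
        f[13],                                   # tName
        int(f[14]), int(f[15]), int(f[16]),      # tSize, tStart, tEnd
        int(f[17]),                              # blockCount
        _int_list(f[18]), _int_list(f[19]), _int_list(f[20]),
    )


# First data row of the example track in this document, joined onto one line.
record = parse_psl_line(
    "59 9 0 0 1 823 1 96 +- FS_CONTIG_48080_1 1955 171 1062 chr22 "
    "47748585 13073589 13073753 2 48,20, 171,1042, 34674832,34674976,"
)
print(record.q_name, record.strand, record.block_sizes)
```

    A dataclass is used here only to give each column a readable name; a plain tuple or dictionary would work just as well.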

    Example:
    Here is an example of an annotation track in PSL format. Note that line breaks have been inserted into the PSL lines in this example for documentation display purposes. This example can be pasted into the browser without editing.

    browser position chr22:13073000-13074000
    browser hide all
    track name=fishBlats description="Fish BLAT" visibility=2 useScore=1
    59 9 0 0 1 823 1 96 +- FS_CONTIG_48080_1 1955 171 1062 chr22
        47748585 13073589 13073753 2 48,20,  171,1042,  34674832,34674976,
    59 7 0 0 1 55 1 55 +- FS_CONTIG_26780_1 2825 2456 2577 chr22
        47748585 13073626 13073747 2 21,45,  2456,2532,  34674838,34674914,
    59 7 0 0 1 55 1 55 -+ FS_CONTIG_26780_1 2825 2455 2676 chr22
        47748585 13073727 13073848 2 45,21,  249,349,  13073727,13073827,
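
    Because the data rows above are wrapped for display, a script has to re-join them before reading the fields. The Python sketch below does this by grouping whitespace-separated tokens into sets of 21, then checks each row against the field definitions. The helper names are hypothetical. The second check (target span equals the sum of blockSizes plus tBaseInsert) is an observation about these three rows, not a rule stated by the format description.

```python
from typing import Dict, List

PSL_FIELD_COUNT = 21

# Data rows copied from the example track, display line breaks included.
WRAPPED_ROWS = """
59 9 0 0 1 823 1 96 +- FS_CONTIG_48080_1 1955 171 1062 chr22
    47748585 13073589 13073753 2 48,20,  171,1042,  34674832,34674976,
59 7 0 0 1 55 1 55 +- FS_CONTIG_26780_1 2825 2456 2577 chr22
    47748585 13073626 13073747 2 21,45,  2456,2532,  34674838,34674914,
59 7 0 0 1 55 1 55 -+ FS_CONTIG_26780_1 2825 2455 2676 chr22
    47748585 13073727 13073848 2 45,21,  249,349,  13073727,13073827,
"""


def rejoin_rows(text: str) -> List[List[str]]:
    """Group all tokens into rows of 21 fields, ignoring where lines break."""
    tokens = text.split()
    if len(tokens) % PSL_FIELD_COUNT != 0:
        raise ValueError("token count is not a multiple of 21")
    return [tokens[i:i + PSL_FIELD_COUNT]
            for i in range(0, len(tokens), PSL_FIELD_COUNT)]


def comma_list(field: str) -> List[int]:
    """Parse a comma-separated list; the trailing comma is ignored."""
    return [int(item) for item in field.split(",") if item]


def check_row(row: List[str]) -> Dict[str, bool]:
    """Run two consistency checks on one re-joined row."""
    t_base_insert = int(row[7])                # field 8
    t_start, t_end = int(row[15]), int(row[16])  # fields 16 and 17
    block_count = int(row[17])                 # field 18
    block_sizes = comma_list(row[18])
    q_starts = comma_list(row[19])
    t_starts = comma_list(row[20])
    return {
        "lists_match_block_count":
            len(block_sizes) == len(q_starts) == len(t_starts) == block_count,
        "target_span_adds_up":
            t_end - t_start == sum(block_sizes) + t_base_insert,
    }


rows = rejoin_rows(WRAPPED_ROWS)
for row in rows:
    print(row[9], row[8], check_row(row))
```

    Grouping by token count works only for the data rows; the browser and track lines have to be left out before calling rejoin_rows.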
    


    Be aware that the coordinates for a negative strand in a PSL line are handled in a special way. In the qStart and qEnd fields, the coordinates indicate the position where the query matches from the point of view of the forward strand, even when the match is on the reverse strand. However, in the qStarts list, the coordinates are reversed.

    Example:
    Here is a 61-mer containing 2 blocks that align on the minus strand and 2 blocks that align on the plus strand (this sometimes happens due to assembly errors):

    0         1         2         3         4         5         6 tens position in query  
    0123456789012345678901234567890123456789012345678901234567890 ones position in query   
                          ++++++++++++++                    +++++ plus strand alignment on query   
        ------------------              --------------------      minus strand alignment on query   
    0987654321098765432109876543210987654321098765432109876543210 ones position in query negative strand coordinates
    6         5         4         3         2         1         0 tens position in query negative strand coordinates
    
    Plus strand:   
         qStart=22
         qEnd=61 
         blockSizes=14,5 
         qStarts=22,56 
                      
    Minus strand:   
         qStart=4 
         qEnd=56 
         blockSizes=20,18 
         qStarts=5,39   
    

    Essentially, the minus strand blockSizes and qStarts are what you would get if you reverse-complemented the query. However, the qStart and qEnd are not reversed. Use the following formulas to convert one to the other:

         Negative-strand-coordinate-qStart = qSize - qEnd   = 61 - 56 =  5
         Negative-strand-coordinate-qEnd   = qSize - qStart = 61 -  4 = 57
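
    The two formulas can also be applied block by block to map the minus-strand qStarts back onto the forward strand. The following Python sketch does this for the 61-mer example; the function names are hypothetical, and all ranges are treated as 0-based and half-open, [start, end).

```python
from typing import List, Tuple


def reverse_range(start: int, end: int, size: int) -> Tuple[int, int]:
    """Map a 0-based half-open range onto the opposite strand.

    new_start = size - end, new_end = size - start (the formulas above).
    """
    return size - end, size - start


def blocks_to_forward(q_size: int, block_sizes: List[int],
                      q_starts: List[int]) -> List[Tuple[int, int]]:
    """Convert minus-strand blocks to forward-strand ranges, sorted by start."""
    forward = [reverse_range(start, start + length, q_size)
               for start, length in zip(q_starts, block_sizes)]
    return sorted(forward)


Q_SIZE = 61

# qStart/qEnd of the minus-strand alignment, expressed in reversed coordinates.
print(reverse_range(4, 56, Q_SIZE))                    # (5, 57)

# Minus-strand blocks from the example, mapped back to forward coordinates.
print(blocks_to_forward(Q_SIZE, [20, 18], [5, 39]))    # [(4, 22), (36, 56)]
```

    The forward-strand ranges 4-22 and 36-56 line up with the two runs of '-' in the diagram, and applying reverse_range twice returns the original range.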
    

    BLAT this actual sequence against hg19 for a real-world example:


    CCCC
    GGGTAAAATGAGTTTTTT
    GGTCCAATCTTTTA
    ATCCACTCCCTACCCTCCTA
    GCAAG


    Look for the alignment on the negative strand (-) of chr21, which conveniently aligns to the window chr21:10,000,001-10,000,061.

    Browser window coordinates are 1-based and closed, [start,end], while PSL coordinates are 0-based and half-open, [start,end), so a start of 10,000,001 in the browser corresponds to a start of 10,000,000 in the PSL. Subtracting 10,000,000 from the target (chromosome) position in the PSL gives the query negative strand coordinate above.
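
    The coordinate conversion described above can be sketched in Python as follows. The function name parse_browser_position is hypothetical, and the target start of 10,000,005 used at the end is only what the paragraph implies for the first minus-strand block (qStarts=5); it is not copied from real BLAT output.

```python
from typing import Tuple


def parse_browser_position(position: str) -> Tuple[str, int, int]:
    """Turn 'chr21:10,000,001-10,000,061' into (chrom, start, end).

    The browser string is 1-based and closed; the returned pair is 0-based
    and half-open, so only the start shifts by one.
    """
    chrom, _, span = position.rpartition(":")
    start_text, end_text = span.replace(",", "").split("-")
    return chrom, int(start_text) - 1, int(end_text)


chrom, window_start, window_end = parse_browser_position(
    "chr21:10,000,001-10,000,061"
)
print(chrom, window_start, window_end)   # chr21 10000000 10000061
print(window_end - window_start)         # 61, the length of the query

# Assumed target start of the first block, implied by the text rather than
# taken from actual output: subtracting the window start gives the
# negative-strand query coordinate.
implied_t_start = 10_000_005
print(implied_t_start - window_start)    # 5
```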

    The 4, 14, and 5 bases at the beginning, middle, and end of the sequence were chosen so that they do not match the genome at the corresponding positions.

  • Original article: https://www.cnblogs.com/pennyy/p/4260934.html