  • FASTQ format

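    Each sequence in a FASTQ file is normally stored as four lines:
    1: Begins with an '@' character, followed immediately by the sequence identifier and an optional description (like a FASTA title line).
    2: The sequence itself.
    3: Begins with a '+' character, optionally followed by the same identifier and description again.
    4: The quality values for the sequence on line 2, one character per base.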
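As a minimal sketch of the four-line layout, the snippet below builds one made-up record with printf and lets awk label each line by its position within the record. The read name `read1`, the bases, and the quality string are invented for illustration, as are the variable names.

```shell
# One made-up FASTQ record (name, bases, and qualities are illustrative only).
record=$(printf '@read1 sample description\nGATTACA\n+\nIIIIHHG\n')

# NR % 4 gives the role of each line within a four-line record.
labeled=$(printf '%s\n' "$record" | awk '
    NR % 4 == 1 { print "title=" substr($0, 2) }   # drop the leading @
    NR % 4 == 2 { print "sequence=" $0 }
    NR % 4 == 3 { print "plus=" $0 }
    NR % 4 == 0 { print "quality=" $0 }
')
printf '%s\n' "$labeled"
```

This relies on every record occupying exactly four lines; files with sequences wrapped over several lines would need a different approach.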

    Illumina sequence identifiers

    @HWUSI-EAS100R:6:73:941:1973#0/1

    Field            Description
    HWUSI-EAS100R    the unique instrument name
    6                flowcell lane
    73               tile number within the flowcell lane
    941              'x'-coordinate of the cluster within the tile
    1973             'y'-coordinate of the cluster within the tile
    #0               index number for a multiplexed sample (0 for no indexing)
    /1               the member of a pair, /1 or /2 (paired-end or mate-pair reads only)

    Versions of the Illumina pipeline since 1.4 appear to use #NNNNNN instead of #0 for the multiplex ID, where NNNNNN is the sequence of the multiplex tag.
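The older identifier layout can be taken apart with awk's split(). This is only a sketch: the output labels (`instrument`, `lane`, and so on) and the variable names are chosen here for illustration, and it assumes the identifier has exactly the seven fields described above.

```shell
id='@HWUSI-EAS100R:6:73:941:1973#0/1'

fields=$(printf '%s\n' "$id" | awk '{
    # Drop the leading @, then split on any of the separators #, / and :
    n = split(substr($0, 2), f, "[#/:]")
    printf "instrument=%s lane=%s tile=%s x=%s y=%s index=%s pair=%s\n",
           f[1], f[2], f[3], f[4], f[5], f[6], f[7]
}')
printf '%s\n' "$fields"
```

For the example identifier this prints `instrument=HWUSI-EAS100R lane=6 tile=73 x=941 y=1973 index=0 pair=1`. With the newer #NNNNNN form, the `index` value would be the tag sequence instead of 0.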

    With Casava 1.8 the format of the '@' line has changed:

    @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

    Field      Description
    EAS139     the unique instrument name
    136        the run id
    FC706VJ    the flowcell id
    2          flowcell lane
    2104       tile number within the flowcell lane
    15343      'x'-coordinate of the cluster within the tile
    197393     'y'-coordinate of the cluster within the tile
    1          the member of a pair, 1 or 2 (paired-end or mate-pair reads only)
    Y          Y if the read is filtered, N otherwise
    18         0 when none of the control bits are on, otherwise an even number
    ATCACG     index sequence
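The Casava 1.8 header consists of two space-separated words, each colon-delimited, so it can be split in two passes. The following is a sketch under that assumption; the output labels and variable names are made up for illustration.

```shell
id='@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG'

fields=$(printf '%s\n' "$id" | awk '{
    split(substr($1, 2), a, ":")   # first word without the leading @
    split($2, b, ":")              # second word: pair, filter flag, control bits, index
    printf "instrument=%s run=%s flowcell=%s lane=%s tile=%s x=%s y=%s\n",
           a[1], a[2], a[3], a[4], a[5], a[6], a[7]
    printf "pair=%s filtered=%s control=%s index=%s\n", b[1], b[2], b[3], b[4]
}')
printf '%s\n' "$fields"
```

For the example header, the second output line reads `pair=1 filtered=Y control=18 index=ATCACG`.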

    Converting FASTQ to FASTA format:

    zcat input_file.fastq.gz | awk 'NR%4==1{printf ">%s\n", substr($0,2)} NR%4==2{print}' > output_file.fa
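To see what the conversion does without a gzipped file at hand, the same awk program can be run on a small in-memory input. The two records below are invented for illustration, and the sketch assumes each record occupies exactly four lines.

```shell
# Two made-up records; names, bases, and qualities are illustrative only.
fastq=$(printf '@read1 first\nGATTACA\n+\nIIIIHHG\n@read2\nACGT\n+read2\nFFFF\n')

# Line 1 of each record becomes the ">" title line, line 2 is copied,
# and lines 3 and 4 (the "+" line and the qualities) are dropped.
fasta=$(printf '%s\n' "$fastq" | awk 'NR%4==1{printf ">%s\n", substr($0,2)} NR%4==2{print}')
printf '%s\n' "$fasta"
```

The result is four lines: `>read1 first`, `GATTACA`, `>read2`, `ACGT`.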
    
    
    # printf syntax: format-string is the format control string, arguments is the list of values to format.
    printf  format-string  [arguments...]
    
    
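    # substr(s,p) returns the suffix of string s that starts at position p (positions are counted from 1).
    # substr(s,p,n) returns the substring of s that starts at position p and is n characters long.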
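A short sketch of both awk functions in isolation, run from a BEGIN block so no input file is needed. The string `@read1` and the variable name `demo` are made up for illustration.

```shell
demo=$(awk 'BEGIN {
    s = "@read1"
    printf "%s\n", substr(s, 2)       # from position 2 to the end
    printf "%s\n", substr(s, 2, 4)    # 4 characters starting at position 2
    printf "%-6s|%03d\n", "ab", 7     # left-justify to width 6, zero-pad to width 3
}')
printf '%s\n' "$demo"
```

The first two lines printed are `read1` and `read`, which shows why `substr($0,2)` removes the leading '@' from a FASTQ title line.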
    
  • Original article: https://www.cnblogs.com/adawong/p/8032871.html