  • Working with FASTA/FASTQ files using the FASTX-Toolkit (1)

    Prepare a test file, test.fq, containing 4 FASTQ records whose base qualities are encoded as Phred+64:

    @FC12044_91407_8_200_406_24
    NTTAGCTCCCACCTTAAGATGTTTA
    +FC12044_91407_8_200_406_24
    SXXTXXXXXXXXXTTSUXSSXKTMQ
    @FC12044_91407_8_200_720_610
    CTCTGTGGCACCCCATCCCTCACTT
    +FC12044_91407_8_200_720_610
    OXXXXXXXXXXXXXXXXXTSXQTXU
    @FC12044_91407_8_200_345_133
    GATTTTTTAACAATAAACGTACATA
    +FC12044_91407_8_200_345_133
    OQTOOSFORTFFFIIOFFFFFFFFF
    @FC12044_91407_8_200_106_131
    GTTGCCCAGGCTCGTCTTGAACTCC
    +FC12044_91407_8_200_106_131
    XXXXXXXXXXXXXXSXXXXISTXQS 
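To make the Phred+64 encoding concrete, here is a minimal Python sketch that decodes a quality string into integer scores by subtracting 64 from each character code. The function name `decode_phred64` is hypothetical and is not part of the FASTX-Toolkit.

```python
from typing import List


def decode_phred64(quality: str) -> List[int]:
    """Convert a Phred+64 quality string into a list of integer scores."""
    # Each character's ASCII code minus the offset 64 gives the score.
    return [ord(ch) - 64 for ch in quality]


# First four quality characters of the first read in test.fq.
print(decode_phred64("SXXT"))  # [19, 24, 24, 20]
```

Under this encoding, `X` (ASCII 88) is the highest-scoring character in test.fq and decodes to 24.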

    1) fastq_to_fasta: converts a FASTQ file to a FASTA file

    Command:

    fastq_to_fasta -i test.fq -o test.fa

    Output (the first read contains an N, and fastq_to_fasta discards such reads unless -n is given):

    cat test.fa
    >FC12044_91407_8_200_720_610
    CTCTGTGGCACCCCATCCCTCACTT
    >FC12044_91407_8_200_345_133
    GATTTTTTAACAATAAACGTACATA
    >FC12044_91407_8_200_106_131
    GTTGCCCAGGCTCGTCTTGAACTCC
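The conversion above can be sketched in Python as follows. This is only an illustration of the behaviour seen in the output, assuming strict 4-line FASTQ records and that any read containing an N is dropped by default; `read_fastq`, `fastq_to_fasta`, and `keep_n` are hypothetical names, not the tool's implementation.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        # Line i is "@name", i+1 the sequence, i+2 "+name", i+3 the quality.
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def fastq_to_fasta(lines: List[str], keep_n: bool = False) -> List[str]:
    """Return FASTA lines; reads containing N are skipped unless keep_n is set."""
    out = []
    for name, seq, _qual in read_fastq(lines):
        if "N" in seq and not keep_n:
            continue  # assumption: mirrors the default discard of N-containing reads
        out.append(">" + name)
        out.append(seq)
    return out


fastq_lines = [
    "@read_a", "NTTAG", "+read_a", "SXXTX",
    "@read_b", "CTCTG", "+read_b", "OXXXX",
]
print("\n".join(fastq_to_fasta(fastq_lines)))
```

Passing `keep_n=True` corresponds to what the `-n` option is for: keeping reads with unknown nucleotides.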
    

    2) fastx_trimmer: trims FASTQ sequences, keeping only the bases between the specified first (-f) and last (-l) positions

    Command: trim each sequence to 10 bp

    fastx_trimmer -f 1 -l 10 -i test.fq -o test.trim.fq

    Output:

    cat test.trim.fq 
    @FC12044_91407_8_200_406_24
    NTTAGCTCCC
    +FC12044_91407_8_200_406_24
    SXXTXXXXXX
    @FC12044_91407_8_200_720_610
    CTCTGTGGCA
    +FC12044_91407_8_200_720_610
    OXXXXXXXXX
    @FC12044_91407_8_200_345_133
    GATTTTTTAA
    +FC12044_91407_8_200_345_133
    OQTOOSFORT
    @FC12044_91407_8_200_106_131
    GTTGCCCAGG
    +FC12044_91407_8_200_106_131
    XXXXXXXXXX
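The trimming step amounts to slicing the sequence and its quality string with the same 1-based, inclusive coordinates. A minimal Python sketch, with the hypothetical function `trim_record` standing in for the tool:

```python
from typing import Optional, Tuple


def trim_record(
    name: str, seq: str, qual: str, first: int = 1, last: Optional[int] = None
) -> Tuple[str, str, str]:
    """Keep bases first..last (1-based, inclusive) in both sequence and quality."""
    end = len(seq) if last is None else last
    # Convert the 1-based start to a 0-based slice index; the end is exclusive.
    return name, seq[first - 1:end], qual[first - 1:end]


record = (
    "FC12044_91407_8_200_406_24",
    "NTTAGCTCCCACCTTAAGATGTTTA",
    "SXXTXXXXXXXXXTTSUXSSXKTMQ",
)
print(trim_record(*record, first=1, last=10))
```

Slicing the sequence and quality together keeps the two strings the same length, which a valid FASTQ record requires.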

    3) fastx_renamer

    Command: rename the sequence identifiers; here they are replaced with sequential numbers

    fastx_renamer -n COUNT -i test.fq -o test.renamer.fq

    Output:

    cat test.renamer.fq 
    @1
    NTTAGCTCCCACCTTAAGATGTTTA
    +1
    SXXTXXXXXXXXXTTSUXSSXKTMQ
    @2
    CTCTGTGGCACCCCATCCCTCACTT
    +2
    OXXXXXXXXXXXXXXXXXTSXQTXU
    @3
    GATTTTTTAACAATAAACGTACATA
    +3
    OQTOOSFORTFFFIIOFFFFFFFFF
    @4
    GTTGCCCAGGCTCGTCTTGAACTCC
    +4
    XXXXXXXXXXXXXXSXXXXISTXQS
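Renaming by count simply replaces each identifier with the record's position in the file, starting at 1. A minimal Python sketch of that idea, where `rename_by_count` is a hypothetical helper working on (name, sequence, quality) tuples:

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality)


def rename_by_count(records: List[Record]) -> List[Record]:
    """Replace each record name with its 1-based position in the input."""
    return [
        (str(index), seq, qual)
        for index, (_old_name, seq, qual) in enumerate(records, start=1)
    ]


records = [
    ("FC12044_91407_8_200_406_24", "NTTAGCTCCCACCTTAAGATGTTTA", "SXXTXXXXXXXXXTTSUXSSXKTMQ"),
    ("FC12044_91407_8_200_720_610", "CTCTGTGGCACCCCATCCCTCACTT", "OXXXXXXXXXXXXXXXXXTSXQTXU"),
]
for name, seq, qual in rename_by_count(records):
    # Emit the record back in 4-line FASTQ form.
    print(f"@{name}\n{seq}\n+{name}\n{qual}")
```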

    4) fasta_formatter: reformats a FASTA file with a maximum number of characters per sequence line

    Command: set the maximum line width to 10 characters

    fasta_formatter  -w 10 -i test.fa -o test.formatter.fa

    Output:

    cat test.formatter.fa 
    >FC12044_91407_8_200_720_610
    CTCTGTGGCA
    CCCCATCCCT
    CACTT
    >FC12044_91407_8_200_345_133
    GATTTTTTAA
    CAATAAACGT
    ACATA
    >FC12044_91407_8_200_106_131
    GTTGCCCAGG
    CTCGTCTTGA
    ACTCC
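The line wrapping shown above can be reproduced by cutting the sequence into fixed-width chunks, with the last chunk holding the remainder. A minimal Python sketch; `wrap_sequence` is a hypothetical name and only the sequence lines are wrapped, not the header:

```python
from typing import List


def wrap_sequence(seq: str, width: int) -> List[str]:
    """Split a sequence into lines of at most `width` characters."""
    if width <= 0:
        raise ValueError("width must be a positive integer")
    return [seq[i:i + width] for i in range(0, len(seq), width)]


print(">FC12044_91407_8_200_720_610")
for line in wrap_sequence("CTCTGTGGCACCCCATCCCTCACTT", 10):
    print(line)
```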
    

    5) fastq_masker: masks bases whose quality falls below a threshold

    Command: replace bases with quality below 40 with N

    fastq_masker -q 40 -i test.fq -o test.masker.fq

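    Output (every base in test.fq scores below 40 under Phred+64, where the highest character, X, decodes to 24, so all bases are masked):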

    cat test.masker.fq 
    @FC12044_91407_8_200_406_24
    NNNNNNNNNNNNNNNNNNNNNNNNN
    +FC12044_91407_8_200_406_24
    SXXTXXXXXXXXXTTSUXSSXKTMQ
    @FC12044_91407_8_200_720_610
    NNNNNNNNNNNNNNNNNNNNNNNNN
    +FC12044_91407_8_200_720_610
    OXXXXXXXXXXXXXXXXXTSXQTXU
    @FC12044_91407_8_200_345_133
    NNNNNNNNNNNNNNNNNNNNNNNNN
    +FC12044_91407_8_200_345_133
    OQTOOSFORTFFFIIOFFFFFFFFF
    @FC12044_91407_8_200_106_131
    NNNNNNNNNNNNNNNNNNNNNNNNN
    +FC12044_91407_8_200_106_131
    XXXXXXXXXXXXXXSXXXXISTXQS  
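The masking rule can be sketched in Python as a per-base comparison between the decoded quality and the threshold. This is an illustration under stated assumptions rather than the tool's code: `mask_low_quality` is a hypothetical name, the offset defaults to 64 to match the test file, and bases scoring strictly below the threshold are replaced.

```python
def mask_low_quality(
    seq: str, qual: str, threshold: int, offset: int = 64, mask_char: str = "N"
) -> str:
    """Replace bases whose decoded quality is below `threshold` with `mask_char`."""
    return "".join(
        base if ord(q) - offset >= threshold else mask_char
        for base, q in zip(seq, qual)
    )


# X decodes to 24, S to 19, T to 20 and I to 9 under Phred+64.
print(mask_low_quality("ACGT", "XSTI", threshold=20))  # ANGN
print(mask_low_quality("ACGT", "XSTI", threshold=40))  # NNNN
```

With a threshold of 40 nothing in test.fq survives, which is why a lower threshold is more informative on this particular file.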

     

     

  • Original article: https://www.cnblogs.com/xudongliang/p/5081518.html