  • Working with FASTA/FASTQ files using the FASTX-Toolkit (1)

    Prepare a test file, test.fq, containing 4 FASTQ records whose quality scores are encoded as Phred+64:

    @FC12044_91407_8_200_406_24
    NTTAGCTCCCACCTTAAGATGTTTA
    +FC12044_91407_8_200_406_24
    SXXTXXXXXXXXXTTSUXSSXKTMQ
    @FC12044_91407_8_200_720_610
    CTCTGTGGCACCCCATCCCTCACTT
    +FC12044_91407_8_200_720_610
    OXXXXXXXXXXXXXXXXXTSXQTXU
    @FC12044_91407_8_200_345_133
    GATTTTTTAACAATAAACGTACATA
    +FC12044_91407_8_200_345_133
    OQTOOSFORTFFFIIOFFFFFFFFF
    @FC12044_91407_8_200_106_131
    GTTGCCCAGGCTCGTCTTGAACTCC
    +FC12044_91407_8_200_106_131
    XXXXXXXXXXXXXXSXXXXISTXQS 
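
    Since the quality lines above are Phred+64 encoded, each character maps to a score of its ASCII code minus 64. The following is a minimal Python sketch of that decoding; the function name phred64_scores is hypothetical and is not part of the FASTX-Toolkit.

```python
PHRED64_OFFSET = 64  # offset stated for test.fq (Phred+64 encoding)

def phred64_scores(quality: str) -> list[int]:
    """Convert a Phred+64 quality string into integer quality scores."""
    return [ord(ch) - PHRED64_OFFSET for ch in quality]

# Quality line of the first record in test.fq
first = phred64_scores("SXXTXXXXXXXXXTTSUXSSXKTMQ")
print(first[:4])   # [19, 24, 24, 20]
print(max(first))  # 24
```

    Under this encoding, no base in test.fq scores above 24, which matters for the quality-based masking step later on.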

    1) fastq_to_fasta: convert a FASTQ file to FASTA

    Command:

    fastq_to_fasta -i test.fq -o test.fa

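    Output (only 3 records are written: by default fastq_to_fasta discards sequences containing N, so the first read is dropped; pass -n to keep such reads):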

    cat test.fa
    >FC12044_91407_8_200_720_610
    CTCTGTGGCACCCCATCCCTCACTT
    >FC12044_91407_8_200_345_133
    GATTTTTTAACAATAAACGTACATA
    >FC12044_91407_8_200_106_131
    GTTGCCCAGGCTCGTCTTGAACTCC
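
    To illustrate why the first read is missing from test.fa, here is a rough Python sketch of the conversion. It only approximates the tool's behaviour; the function name and the keep_n parameter are illustrative stand-ins for the default N filtering and the -n option.

```python
def fastq_to_fasta(fastq_text: str, keep_n: bool = False) -> str:
    """Convert 4-line FASTQ records to FASTA, skipping reads with N unless keep_n."""
    lines = fastq_text.strip().splitlines()
    out = []
    for i in range(0, len(lines), 4):
        header, seq = lines[i], lines[i + 1]
        if "N" in seq and not keep_n:
            continue  # mimic the default: drop reads with unknown bases
        out.append(">" + header[1:])  # '@id' becomes '>id'
        out.append(seq)
    return "\n".join(out)

# First two records of test.fq
sample = """@FC12044_91407_8_200_406_24
NTTAGCTCCCACCTTAAGATGTTTA
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
"""
print(fastq_to_fasta(sample))
```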
    

    2) fastx_trimmer: trim FASTQ sequences by specifying the first (-f) and last (-l) base positions to keep

    Command (trim each read to its first 10 bp):

    fastx_trimmer -f 1 -l 10 -i test.fq -o test.trim.fq

    Output:

    cat test.trim.fq 
    @FC12044_91407_8_200_406_24
    NTTAGCTCCC
    +FC12044_91407_8_200_406_24
    SXXTXXXXXX
    @FC12044_91407_8_200_720_610
    CTCTGTGGCA
    +FC12044_91407_8_200_720_610
    OXXXXXXXXX
    @FC12044_91407_8_200_345_133
    GATTTTTTAA
    +FC12044_91407_8_200_345_133
    OQTOOSFORT
    @FC12044_91407_8_200_106_131
    GTTGCCCAGG
    +FC12044_91407_8_200_106_131
    XXXXXXXXXX

    3) fastx_renamer: rename sequence identifiers

    Command (replace each identifier with a running sequence number):

    fastx_renamer -n COUNT -i test.fq -o test.renamer.fq

    Output:

    cat test.renamer.fq 
    @1
    NTTAGCTCCCACCTTAAGATGTTTA
    +1
    SXXTXXXXXXXXXTTSUXSSXKTMQ
    @2
    CTCTGTGGCACCCCATCCCTCACTT
    +2
    OXXXXXXXXXXXXXXXXXTSXQTXU
    @3
    GATTTTTTAACAATAAACGTACATA
    +3
    OQTOOSFORTFFFIIOFFFFFFFFF
    @4
    GTTGCCCAGGCTCGTCTTGAACTCC
    +4
    XXXXXXXXXXXXXXSXXXXISTXQS
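
    The renaming shown above amounts to replacing both identifier lines of each record with a counter. A minimal Python sketch, assuming strict 4-line records; the function name is hypothetical:

```python
def rename_with_count(fastq_text: str) -> str:
    """Replace the '@' and '+' identifier lines of each record with 1, 2, 3, ..."""
    lines = fastq_text.strip().splitlines()
    out = []
    for n, i in enumerate(range(0, len(lines), 4), start=1):
        # Sequence (i + 1) and quality (i + 3) lines are kept unchanged
        out += [f"@{n}", lines[i + 1], f"+{n}", lines[i + 3]]
    return "\n".join(out)

sample = """@FC12044_91407_8_200_406_24
NTTAGCTCCCACCTTAAGATGTTTA
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
"""
print(rename_with_count(sample))
```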

    4) fasta_formatter: reformat a FASTA file by setting the maximum number of characters per sequence line

    Command (wrap sequence lines at 10 characters):

    fasta_formatter  -w 10 -i test.fa -o test.formatter.fa

    Output:

    cat test.formatter.fa 
    >FC12044_91407_8_200_720_610
    CTCTGTGGCA
    CCCCATCCCT
    CACTT
    >FC12044_91407_8_200_345_133
    GATTTTTTAA
    CAATAAACGT
    ACATA
    >FC12044_91407_8_200_106_131
    GTTGCCCAGG
    CTCGTCTTGA
    ACTCC
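
    The wrapping done by -w 10 can be sketched in Python as fixed-width slicing; wrap_sequence is a hypothetical helper, not a toolkit function:

```python
def wrap_sequence(seq: str, width: int) -> list[str]:
    """Split a sequence into consecutive chunks of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]

for line in wrap_sequence("CTCTGTGGCACCCCATCCCTCACTT", 10):
    print(line)
# CTCTGTGGCA
# CCCCATCCCT
# CACTT
```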
    

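    5) fastq_masker: mask bases whose quality falls below a threshold

    Command (replace bases with quality below 40 by N; under Phred+64 every base in test.fq scores below 40, so each read is masked completely):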

    fastq_masker -q 40 -i test.fq -o test.masker.fq

    Output:

    cat test.masker.fq 
    @FC12044_91407_8_200_406_24
    NNNNNNNNNNNNNNNNNNNNNNNNN
    +FC12044_91407_8_200_406_24
    SXXTXXXXXXXXXTTSUXSSXKTMQ
    @FC12044_91407_8_200_720_610
    NNNNNNNNNNNNNNNNNNNNNNNNN
    +FC12044_91407_8_200_720_610
    OXXXXXXXXXXXXXXXXXTSXQTXU
    @FC12044_91407_8_200_345_133
    NNNNNNNNNNNNNNNNNNNNNNNNN
    +FC12044_91407_8_200_345_133
    OQTOOSFORTFFFIIOFFFFFFFFF
    @FC12044_91407_8_200_106_131
    NNNNNNNNNNNNNNNNNNNNNNNNN
    +FC12044_91407_8_200_106_131
    XXXXXXXXXXXXXXSXXXXISTXQS  
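
    The all-N output follows from the threshold being higher than any score in the file. Below is a rough Python sketch of the masking rule, assuming Phred+64 and a strict "lower than" comparison; the function name is hypothetical, and the tool's exact handling of scores equal to the threshold is not verified here.

```python
def mask_low_quality(seq: str, qual: str, threshold: int, offset: int = 64) -> str:
    """Replace each base whose quality score is below `threshold` with 'N'."""
    return "".join(
        base if ord(q) - offset >= threshold else "N"
        for base, q in zip(seq, qual)
    )

# Last record of test.fq: its best score is 24 ('X'), so a threshold of 40 masks everything
masked = mask_low_quality(
    "GTTGCCCAGGCTCGTCTTGAACTCC",
    "XXXXXXXXXXXXXXSXXXXISTXQS",
    threshold=40,
)
print(masked)  # NNNNNNNNNNNNNNNNNNNNNNNNN
```

    A lower threshold such as 20 would leave most of this read intact, since its 'X' positions decode to 24.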

     

     

  • Original article: https://www.cnblogs.com/xudongliang/p/5081518.html