Prepare a test file, test.fq, containing four FASTQ records whose quality scores use Phred+64 encoding:
@FC12044_91407_8_200_406_24
NTTAGCTCCCACCTTAAGATGTTTA
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
@FC12044_91407_8_200_345_133
GATTTTTTAACAATAAACGTACATA
+FC12044_91407_8_200_345_133
OQTOOSFORTFFFIIOFFFFFFFFF
@FC12044_91407_8_200_106_131
GTTGCCCAGGCTCGTCTTGAACTCC
+FC12044_91407_8_200_106_131
XXXXXXXXXXXXXXSXXXXISTXQS
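To make the Phred+64 encoding concrete, here is a minimal Python sketch that decodes a quality string from test.fq into integer scores. The function name `phred64_scores` is hypothetical and is not part of FASTX-Toolkit; it only illustrates that each score is the character's ASCII code minus 64.

```python
def phred64_scores(quality):
    """Decode a Phred+64 quality string into a list of integer scores."""
    # Each quality character encodes (score + 64) as its ASCII code.
    return [ord(ch) - 64 for ch in quality]


# Quality line of the first read in test.fq.
scores = phred64_scores("SXXTXXXXXXXXXTTSUXSSXKTMQ")
print(scores[:4])   # [19, 24, 24, 20]
print(max(scores))  # 24
```

The highest score in this read is 24 (the character X).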
1) fastq_to_fasta: convert a FASTQ file to a FASTA file
Command:
fastq_to_fasta -i test.fq -o test.fa
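Output (the first read contains an N, and fastq_to_fasta discards such reads unless -n is given, so only three records remain):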
cat test.fa
>FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
>FC12044_91407_8_200_345_133
GATTTTTTAACAATAAACGTACATA
>FC12044_91407_8_200_106_131
GTTGCCCAGGCTCGTCTTGAACTCC
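The conversion above can be sketched in Python as follows. This is an illustration of the behaviour seen in the output, not the toolkit's actual implementation; the function `fastq_to_fasta` and its `keep_n` parameter are hypothetical stand-ins for the command and its -n option.

```python
def fastq_to_fasta(records, keep_n=False):
    """Return FASTA text for an iterable of (name, sequence, quality) tuples.

    With keep_n=False, reads containing an unknown base (N) are skipped,
    which is why the first test read does not appear in test.fa.
    """
    lines = []
    for name, seq, _qual in records:
        if not keep_n and "N" in seq.upper():
            continue  # read has an unknown base, so drop it
        lines.append(">" + name)
        lines.append(seq)
    return "\n".join(lines)


# First two reads of test.fq.
records = [
    ("FC12044_91407_8_200_406_24", "NTTAGCTCCCACCTTAAGATGTTTA",
     "SXXTXXXXXXXXXTTSUXSSXKTMQ"),
    ("FC12044_91407_8_200_720_610", "CTCTGTGGCACCCCATCCCTCACTT",
     "OXXXXXXXXXXXXXXXXXTSXQTXU"),
]
print(fastq_to_fasta(records))
```

Only the second read is printed; calling `fastq_to_fasta(records, keep_n=True)` would keep both.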
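2) fastx_trimmer: trim FASTQ sequences by specifying the first and last base positions to keep
Command (keep bases 1 through 10, so every read becomes 10 bp long):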
fastx_trimmer -f 1 -l 10 -i test.fq -o test.trim.fq
Output:
cat test.trim.fq
@FC12044_91407_8_200_406_24
NTTAGCTCCC
+FC12044_91407_8_200_406_24
SXXTXXXXXX
@FC12044_91407_8_200_720_610
CTCTGTGGCA
+FC12044_91407_8_200_720_610
OXXXXXXXXX
@FC12044_91407_8_200_345_133
GATTTTTTAA
+FC12044_91407_8_200_345_133
OQTOOSFORT
@FC12044_91407_8_200_106_131
GTTGCCCAGG
+FC12044_91407_8_200_106_131
XXXXXXXXXX
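The -f and -l positions are 1-based and inclusive, which maps onto a Python slice as sketched below. The function `trim_record` and its parameter names are hypothetical; the point is only that the sequence and the quality string must be cut identically so they stay aligned.

```python
def trim_record(seq, qual, first=1, last=None):
    """Keep bases first..last (1-based, inclusive) of a read.

    The sequence and quality strings are sliced with the same bounds.
    last=None keeps everything up to the end of the read.
    """
    start = first - 1  # convert the 1-based position to a 0-based index
    return seq[start:last], qual[start:last]


# First read of test.fq, trimmed like `-f 1 -l 10`.
seq, qual = trim_record("NTTAGCTCCCACCTTAAGATGTTTA",
                        "SXXTXXXXXXXXXTTSUXSSXKTMQ",
                        first=1, last=10)
print(seq)   # NTTAGCTCCC
print(qual)  # SXXTXXXXXX
```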
3) fastx_renamer: rename sequence identifiers
Command (replace each identifier with a running counter):
fastx_renamer -n COUNT -i test.fq -o test.renamer.fq
Output:
cat test.renamer.fq
@1
NTTAGCTCCCACCTTAAGATGTTTA
+1
SXXTXXXXXXXXXTTSUXSSXKTMQ
@2
CTCTGTGGCACCCCATCCCTCACTT
+2
OXXXXXXXXXXXXXXXXXTSXQTXU
@3
GATTTTTTAACAATAAACGTACATA
+3
OQTOOSFORTFFFIIOFFFFFFFFF
@4
GTTGCCCAGGCTCGTCTTGAACTCC
+4
XXXXXXXXXXXXXXSXXXXISTXQS
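Renaming by counter amounts to numbering the records in file order, starting at 1. A minimal Python sketch, with the hypothetical helper `rename_with_counter` operating on (name, sequence, quality) tuples:

```python
def rename_with_counter(records):
    """Replace each record name with its 1-based position in the input."""
    return [
        (str(index), seq, qual)
        for index, (_old_name, seq, qual) in enumerate(records, start=1)
    ]


# First two reads of test.fq.
records = [
    ("FC12044_91407_8_200_406_24", "NTTAGCTCCCACCTTAAGATGTTTA",
     "SXXTXXXXXXXXXTTSUXSSXKTMQ"),
    ("FC12044_91407_8_200_720_610", "CTCTGTGGCACCCCATCCCTCACTT",
     "OXXXXXXXXXXXXXXXXXTSXQTXU"),
]
renamed = rename_with_counter(records)
print([name for name, _seq, _qual in renamed])  # ['1', '2']
```

Sequences and quality strings are left unchanged; only the identifiers differ.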
4) fasta_formatter: reformat a FASTA file with a maximum number of characters per sequence line
Command (wrap sequence lines at 10 characters):
fasta_formatter -w 10 -i test.fa -o test.formatter.fa
Output:
cat test.formatter.fa
>FC12044_91407_8_200_720_610
CTCTGTGGCA
CCCCATCCCT
CACTT
>FC12044_91407_8_200_345_133
GATTTTTTAA
CAATAAACGT
ACATA
>FC12044_91407_8_200_106_131
GTTGCCCAGG
CTCGTCTTGA
ACTCC
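Wrapping at a fixed width is simply cutting the sequence into consecutive chunks, with a shorter final chunk if the length is not a multiple of the width. A small Python sketch, where `wrap_sequence` is a hypothetical helper:

```python
def wrap_sequence(seq, width=10):
    """Split a sequence into consecutive lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


# First record of test.fa, wrapped like `-w 10`.
for line in wrap_sequence("CTCTGTGGCACCCCATCCCTCACTT", width=10):
    print(line)
# CTCTGTGGCA
# CCCCATCCCT
# CACTT
```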
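5) fastq_masker: mask bases whose quality falls below a threshold
Command (mask bases with quality below 40; masked bases are replaced with N):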
fastq_masker -q 40 -i test.fq -o test.masker.fq
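Output (every base in test.fq has a Phred+64 score below 40, the highest being X = 24, so all bases are masked):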
cat test.masker.fq
@FC12044_91407_8_200_406_24
NNNNNNNNNNNNNNNNNNNNNNNNN
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
NNNNNNNNNNNNNNNNNNNNNNNNN
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
@FC12044_91407_8_200_345_133
NNNNNNNNNNNNNNNNNNNNNNNNN
+FC12044_91407_8_200_345_133
OQTOOSFORTFFFIIOFFFFFFFFF
@FC12044_91407_8_200_106_131
NNNNNNNNNNNNNNNNNNNNNNNNN
+FC12044_91407_8_200_106_131
XXXXXXXXXXXXXXSXXXXISTXQS
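The all-N result can be reproduced with a short Python sketch of quality masking. The function `mask_low_quality` and its parameters are hypothetical. It assumes Phred+64 input and masks scores strictly below the threshold; whether the real tool also masks a score exactly equal to the threshold is not verified here.

```python
def mask_low_quality(seq, qual, threshold, offset=64, mask_char="N"):
    """Replace every base whose quality score is below `threshold`.

    Scores are decoded as ord(char) - offset (offset=64 for Phred+64).
    Assumption: only scores strictly below the threshold are masked.
    """
    return "".join(
        mask_char if ord(q) - offset < threshold else base
        for base, q in zip(seq, qual)
    )


# First read of test.fq with the threshold used above: all scores are <= 24.
print(mask_low_quality("NTTAGCTCCCACCTTAAGATGTTTA",
                       "SXXTXXXXXXXXXTTSUXSSXKTMQ", 40))
# NNNNNNNNNNNNNNNNNNNNNNNNN

# With a lower threshold only the weakest base (K = 11) is masked.
print(mask_low_quality("ACGT", "SXKT", 15))  # ACNT
```

A threshold of 40 is above every score in this file, so a lower value would be needed to see partial masking on the test reads.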