  • Extracting or filtering out multiple sequences from a FASTA file

    A quick Google search turned up few ready-made tools.

    Writing your own code works too, but it is unlikely to be fast, and rewriting it every time is a hassle.
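    For reference, a hand-written extractor might look roughly like the sketch below. It is only an illustration: the names filter_fasta and filter_fasta_file are hypothetical, and it assumes that a sequence ID is the header text between ">" and the first whitespace, and that the name list holds one ID per line.

```python
def filter_fasta(lines, keep_ids):
    """Yield the lines of every FASTA record whose ID is in keep_ids."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Assumption: the ID is the header text up to the first whitespace.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in keep_ids
        if keep:
            yield line


def filter_fasta_file(fasta_path, list_path, out_path):
    """Stream fasta_path to out_path, keeping only the IDs listed in list_path."""
    with open(list_path) as fh:
        # Assumption: one sequence ID per line; blank lines are ignored.
        ids = {line.strip() for line in fh if line.strip()}
    with open(fasta_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, ids))
```

    Keeping the IDs in a set makes each lookup constant time, and streaming line by line avoids loading the whole FASTA file into memory.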

    I happened to notice that QIIME's filter_fasta.py has this feature: it extracts multiple sequences given a list of names.

    filter_fasta.py -f extract_no_N_200.fasta -o remain.fasta -s out.list
    

      

    [REQUIRED]
    
    -f, --input_fasta_fp
    Path to the input fasta file
    -o, --output_fasta_fp
    The output fasta filepath
    [OPTIONAL]
    
    -m, --otu_map
    An OTU map where sequences ids are those which should be retained.
    -s, --seq_id_fp
    A list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field) which should be retained.
    -b, --biom_fp
    A biom file where otu identifiers should be retained.
    -a, --subject_fasta_fp
    A fasta file where the seq ids should be retained.
    -p, --seq_id_prefix
    Keep seqs where seq_id starts with this prefix.
    --sample_id_fp
    Keep seqs where seq_id starts with a sample id listed in this file. Must be newline delimited and may not contain a header.
    -n, --negate
    Discard passed seq ids rather than keep passed seq ids. [default: False]
    --mapping_fp
    Mapping file path (for use with --valid_states). [default: None]
    --valid_states
    Description of sample ids to retain (for use with --mapping_fp). [default: None]
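
    The selection rules described for -s, -p and -n can be modelled with a few lines of Python. This is a sketch of the documented behaviour only, not QIIME's actual code; read_seq_ids and keep_record are hypothetical names, and treating the ID list and the prefix as alternatives is an assumption of the sketch.

```python
def read_seq_ids(lines):
    """Collect IDs from a list file: one per line, or the first tab-delimited field."""
    ids = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():
            ids.add(line.split("\t")[0].strip())
    return ids


def keep_record(seq_id, ids=None, prefix=None, negate=False):
    """Decide whether one record is written to the output.

    ids    models -s: keep records whose ID is in the list.
    prefix models -p: keep records whose ID starts with the prefix.
    negate models -n: invert the decision, discarding matches instead.
    """
    if ids is not None:
        matched = seq_id in ids
    elif prefix is not None:
        matched = seq_id.startswith(prefix)
    else:
        raise ValueError("give either ids or prefix")
    return matched != negate
```

    With negate=True, the same ID list removes the listed sequences and keeps everything else, which is the "filter out" half of the task.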
    

    600,000 sequences were processed almost instantly.

  • Original article: https://www.cnblogs.com/leezx/p/8619051.html