  • Extracting or filtering out multiple sequences from a FASTA file

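    A quick Google search turned up few ready-made tools.

    Writing your own code works too, but it is unlikely to be fast, and rewriting it every time is a hassle.

    I happened to notice that QIIME's filter_fasta.py has this feature: it extracts multiple sequences named in an ID list.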

    filter_fasta.py -f extract_no_N_200.fasta -o remain.fasta -s out.list
    

      

    [REQUIRED]
    
    -f, --input_fasta_fp
    Path to the input fasta file
    -o, --output_fasta_fp
    The output fasta filepath
    [OPTIONAL]
    
    -m, --otu_map
    An OTU map where sequences ids are those which should be retained.
    -s, --seq_id_fp
    A list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field) which should be retained.
    -b, --biom_fp
    A biom file where otu identifiers should be retained.
    -a, --subject_fasta_fp
    A fasta file where the seq ids should be retained.
    -p, --seq_id_prefix
    Keep seqs where seq_id starts with this prefix.
    --sample_id_fp
    Keep seqs where seq_id starts with a sample id listed in this file. Must be newline delimited and may not contain a header.
    -n, --negate
    Discard passed seq ids rather than keep passed seq ids. [default: False]
    --mapping_fp
    Mapping file path (for use with --valid_states). [default: None]
    --valid_states
    Description of sample ids to retain (for use with --mapping_fp). [default: None]
    

    It processed 600,000 sequences almost instantly.
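
    For comparison, the hand-written approach mentioned earlier can be sketched in a few lines of Python. This is only an illustrative sketch, not QIIME's implementation: the function names read_ids and filter_fasta are hypothetical, and it assumes a record's ID is the first whitespace-delimited token after ">" and that the ID list holds one ID per line (only the first tab-delimited field is used).

```python
def read_ids(lines):
    """Collect IDs: the first tab-delimited field of each non-empty line."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(line.split("\t")[0])
    return ids


def filter_fasta(fasta_lines, ids, negate=False):
    """Yield the lines of records whose ID is in ids.

    With negate=True, yield the records whose ID is NOT in ids instead.
    Assumption: the record ID is the first whitespace-delimited token
    of the header line (the text right after '>').
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            # Keep on membership, or on non-membership when negated.
            keep = (seq_id in ids) != negate
        if keep:
            yield line
```

    With open file handles this could be used as out.writelines(filter_fasta(open("extract_no_N_200.fasta"), read_ids(open("out.list")))). It streams line by line and looks IDs up in a set, so memory use depends on the size of the ID list rather than on the FASTA file.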

  • Original post: https://www.cnblogs.com/leezx/p/8619051.html