  • Extracting or filtering out multiple sequences from a FASTA file

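    A quick Google search turned up few ready-made tools for this.

    Writing the code yourself works, but a hand-rolled script is unlikely to be fast, and rewriting it every time is tedious.

    I then happened to notice that QIIME's filter_fasta.py can do this: it extracts multiple sequences named in a list of sequence IDs.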

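    filter_fasta.py -f extract_no_N_200.fasta -o remain.fasta -s out.list

    Without -n, the sequences whose IDs are listed in out.list are the ones written to remain.fasta. The script's options are: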

    [REQUIRED]
    
    -f, --input_fasta_fp
    Path to the input fasta file
    -o, --output_fasta_fp
    The output fasta filepath
    [OPTIONAL]
    
    -m, --otu_map
    An OTU map where sequences ids are those which should be retained.
    -s, --seq_id_fp
    A list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field) which should be retained.
    -b, --biom_fp
    A biom file where otu identifiers should be retained.
    -a, --subject_fasta_fp
    A fasta file where the seq ids should be retained.
    -p, --seq_id_prefix
    Keep seqs where seq_id starts with this prefix.
    --sample_id_fp
    Keep seqs where seq_id starts with a sample id listed in this file. Must be newline delimited and may not contain a header.
    -n, --negate
    Discard passed seq ids rather than keep passed seq ids. [default: False]
    --mapping_fp
    Mapping file path (for use with --valid_states). [default: None]
    --valid_states
    Description of sample ids to retain (for use with --mapping_fp). [default: None]
    

    About 600,000 sequences were processed almost instantly.
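When QIIME is not installed, the keep-or-discard-by-ID logic that -s and -n describe can be approximated in plain Python. The following is only a minimal sketch, not QIIME's implementation: the function names read_ids, parse_fasta and filter_fasta are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token of each header line, which may differ from how QIIME matches IDs. It also joins wrapped sequence lines into one line per record.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect IDs from the first tab-delimited field of each non-empty line."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(line.split("\t")[0])
    return ids


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta(
    fasta_lines: Iterable[str], ids: Set[str], negate: bool = False
) -> Iterator[Tuple[str, str]]:
    """Keep records whose ID is in `ids`; with negate=True, discard them instead."""
    for header, seq in parse_fasta(fasta_lines):
        tokens = header.split()
        # Assumption of this sketch: the ID is the first token of the header.
        seq_id = tokens[0] if tokens else ""
        if (seq_id in ids) != negate:
            yield header, seq


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file handles instead.
    demo = [">seq1 sample A", "ACGT", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA"]
    wanted = read_ids(["seq1\tnote", "seq3"])
    for header, seq in filter_fasta(demo, wanted):
        print(f">{header}\n{seq}")
```

Because both the ID list and the FASTA input are streamed line by line and the IDs sit in a set, memory use stays proportional to the ID list rather than to the FASTA file. Passing negate=True mirrors the intent of -n by dropping the listed IDs instead of keeping them.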

  • Original article: https://www.cnblogs.com/leezx/p/8619051.html