  • Filter FASTA files

    Filter sequences from a FASTA file by id using a regular expression, e.g. to keep just certain chromosomes from a genome. The alternatives are tools bundled in bigger packages that must be installed (and that lack regex support), mostly awk-based, awkward (sorry for the pun) bash solutions, and scripts that depend on extra packages and still do not support regular expressions. This, in contrast, is a simple, straightforward little Python script for a simple task. It does nothing else and needs nothing but a stock Python installation. It is based on the FASTA reader snippet.

    Download: http://dm516.user.srcf.net/fastafilter/FASTAfilter.zip

    Usage:

    python FASTAfilter.py [-h] regex infile outfile

    From a FASTA-file with multiple >entries, filter by sequence ids using a
    regex.

    positional arguments:
      regex       Regex to filter entry ids, e.g. 'chr[1-4]'. Note that the id
                  does not contain the initial > character.
      infile      A FASTA input file, usually with multiple entries.
      outfile     The new file with only the matching entries.

    optional arguments:
      -h, --help  show this help message and exit
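
    The script's source is not reproduced on this page, so the following is only a minimal sketch of how such a filter could work, not the actual code of FASTAfilter.py. The function name filter_fasta, the use of re.search, and the choice to take the id as the header text up to the first whitespace are all assumptions made for illustration.

```python
import re


def filter_fasta(lines, pattern):
    """Yield the lines of every FASTA entry whose id matches `pattern`.

    Hypothetical helper: the real FASTAfilter.py may parse ids differently.
    """
    regex = re.compile(pattern)
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Assumption: the id is the header without the leading '>',
            # cut at the first whitespace.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            keep = regex.search(seq_id) is not None
        if keep:
            yield line


# Small in-memory example instead of real files.
sample = [">chr1 test\n", "ACGT\n", ">chr5\n", "GGCC\n", ">chr3\n", "TTAA\n"]
kept = list(filter_fasta(sample, r"chr[1-4]"))
print("".join(kept), end="")
```

    Under these assumptions an unanchored pattern such as chr[1-4] would also keep an entry named chr10, because re.search accepts a match anywhere in the id. Anchoring the pattern, e.g. '^chr[1-4]$', avoids that.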

    INSTALL:

    cd /data/software
    wget http://dm516.user.srcf.net/fastafilter/FASTAfilter.zip
    unzip FASTAfilter.zip
    easy_install argparse   # only needed if argparse is missing (Python older than 2.7)

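    EXAMPLE:

    python FASTAfilter.py '^([1-9]|1[0-8]|X)$' \
        /dat2/INPUT.fa \
        /dat2/OUTPUT.fa

    The pattern is quoted so that the shell passes it to the script unchanged. It assumes the entry ids are the bare chromosome names 1 to 18 and X.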
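
    A bracketed list such as [1-9,10,11,12,13,14,15,16,17,18,X] is a character class: it matches one single character out of the digits 0 to 9, the comma, and X, not the names 1 to 18 and X. The short check below illustrates the difference with Python's re module. The sample ids are made up, and the use of re.search is an assumption about how the script applies the pattern.

```python
import re

# Character class: one character out of 0-9, ',' or 'X'.
char_class = re.compile(r"[1-9,10,11,12,13,14,15,16,17,18,X]")
# Alternation: exactly the names 1..18 or X, anchored at both ends.
alternation = re.compile(r"^([1-9]|1[0-8]|X)$")

# Hypothetical ids, chosen only to show the difference.
ids = ["1", "18", "19", "X", "Y", "MT", "scaffold_20"]

loose = [i for i in ids if char_class.search(i)]
strict = [i for i in ids if alternation.search(i)]

print("character class keeps:", loose)
print("alternation keeps:    ", strict)
```

    With the character class, any id that contains a digit is kept, including 19 and scaffold_20. The anchored alternation keeps only 1, 18 and X from this list.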

    Error:

    Traceback (most recent call last):
      File "FASTAfilter.py", line 3, in <module>
        import argparse
    ImportError: No module named argparse


    Solution:

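    Run "easy_install argparse" as the root user. This is only necessary on old interpreters: argparse has been part of the standard library since Python 2.7 and 3.2.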
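
    Whether the extra install step is needed depends only on the interpreter version, since argparse joined the standard library in Python 2.7 and 3.2. A minimal sketch of that check follows. The helper name argparse_in_stdlib is hypothetical and not part of FASTAfilter.py.

```python
import sys


def argparse_in_stdlib(version_info=sys.version_info):
    """Return True if this Python version ships argparse (2.7+ or 3.2+)."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    return (major, minor) >= (3, 2)


if argparse_in_stdlib():
    print("argparse is in the standard library; no extra install needed")
else:
    print("argparse is missing; install it, e.g. with easy_install argparse")
```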

    http://dm516.user.srcf.net/?p=314

  • Original article: https://www.cnblogs.com/emanlee/p/4574884.html