  • Processing multi-line FASTA files

    1. Create a FASTA file with the content below and name it 1.fa
    >SOX2    
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1    
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG    
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGG
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC
    ACGAGGGACGCATCGGACGACTGCAGGACTGT

    2. Append a TAB to the end of every line that starts with >, so that the name is separated from the sequence

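    sed 's/^\(>.*\)/\1\t/' 1.fa > 2.fa
    cat -A 2.fa    # cat -A shows every character: a TAB appears as ^I and each line end as $

    ### \( \) captures the text it matches, and \1 in the replacement stands for whatever \( \) captured, so the header is written back unchanged with a TAB (\t, a GNU sed escape) appended. Run cat -A only to inspect the file; piping its output into 2.fa would write the ^I and $ markers into the data.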
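The capture-group substitution in this step can be tried on a tiny sample first. The sketch below is only an illustration: the two-record sample and the variable names `sample`, `tab`, and `tagged` are made up for the demo, and the TAB is passed in through a shell variable so that it does not rely on GNU sed's `\t` escape.

```shell
# Hypothetical two-record sample (not the real 1.fa).
sample=$(printf '>SOX2\nACGT\n>NANOG\nTTGA')

# A literal TAB character, stored in a variable.
tab=$(printf '\t')

# \( ... \) captures the whole header line; \1 writes it back,
# and ${tab} appends the TAB after it. Sequence lines do not match.
tagged=$(printf '%s\n' "$sample" | sed "s/^\(>.*\)/\1${tab}/")

# sed -n 'l' prints each line unambiguously: a TAB shows up as \t
# and the end of the line as $.
printf '%s\n' "$tagged" | sed -n 'l'
```

Only the header lines should show a `\t` before the closing `$`.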

    Result, as displayed by cat -A:

    >SOX2^I$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGGAC$
    >POU5F1^I$
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT$
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCC$
    >NANOG^I$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGG$
    ACGAGGGACGCATCGGACGACTGCAGGACTGTC$
    ACGAGGGACGCATCGGACGACTGCAGGACTGT$

    3. Replace every newline with a space, using tr

    cat 2.fa | tr '\n' ' ' > 3.fa

    >SOX2     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGAC >POU5F1     CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT CGGAAGGTAGTCGTCAGTGCAGCGAGTCC >NANOG     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT

    4. Replace the final space with a newline

    sed -e 's/ $/\n/' 3.fa > 4.fa

    5. Replace ' >' (a space followed by >) with a newline followed by >, so that each record starts on its own line

    sed -e 's/ >/\n>/g' 4.fa > 5.fa

    >SOX2     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1     CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGT CGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG     ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGG ACGAGGGACGCATCGGACGACTGCAGGACTGTC ACGAGGGACGCATCGGACGACTGCAGGACTGT
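Steps 3 to 5 can also be chained without the intermediate files. The following is a rough sketch on a made-up two-record input that stands in for 2.fa (headers already ending in a TAB); it assumes GNU sed, because `\n` in the replacement part is a GNU extension.

```shell
tab=$(printf '\t')

# Step 3: tr turns every newline into a space.
# Step 4: the first sed turns the trailing space back into a newline.
# Step 5: the second sed puts each header at the start of a new line.
joined=$(printf ">SOX2${tab}\nACGT\nTTGA\n>NANOG${tab}\nGGCC\n" |
    tr '\n' ' ' |
    sed -e 's/ $/\n/' |
    sed -e 's/ >/\n>/g')

# One record per line: name, TAB, then the sequence chunks separated by spaces.
printf '%s\n' "$joined"
```

At this point the chunks of each sequence are still separated by spaces, which is what step 6 cleans up.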

    6. Remove all spaces (the TAB after each name is not a space, so it is kept)

    sed 's/ //g' 5.fa > 6.fa

    >SOX2    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT
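Deleting single characters is also something `tr -d` can do, so it is a possible alternative to the sed substitution in this step. A small sketch with a made-up line (the variable names are only for illustration) shows that both commands give the same result and that the TAB survives:

```shell
tab=$(printf '\t')

# Hypothetical record in the shape produced by step 5.
line=">SOX2${tab} ACGT TTGA GGCC"

# sed removes every space; the TAB is a different character and is untouched.
with_sed=$(printf '%s\n' "$line" | sed 's/ //g')

# tr -d deletes every occurrence of the listed character.
with_tr=$(printf '%s\n' "$line" | tr -d ' ')

# Show both results with the TAB made visible as \t.
printf '%s\n%s\n' "$with_sed" "$with_tr" | sed -n 'l'
```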

    7. Convert the TAB into a newline, which puts each sequence on the line after its name

    sed 's/\t/\n/g' 6.fa > 7.fa

    >SOX2
    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGAC
    >POU5F1
    CGGAAGGTAGTCGTCAGTGCAGCGAGTCCGTCGGAAGGTAGTCGTCAGTGCAGCGAGTCC
    >NANOG
    ACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACGAGGGACGCATCGGACGACTGCAGGACTGTCACGAGGGACGCATCGGACGACTGCAGGACTGT
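The same end result can probably be reached in a single pass with awk instead of seven steps. This is an alternative technique, not the method used above; the function name `linearize_fasta` is hypothetical, and the sketch assumes that every record starts with a line beginning with `>`.

```shell
# linearize_fasta: print each header on its own line, followed by its
# sequence joined into one line. Reads the named files, or stdin if none.
linearize_fasta() {
    awk '
        /^>/ {
            if (seq != "") print seq    # flush the previous record
            sub(/[ \t]+$/, "")          # drop trailing blanks on the header
            print
            seq = ""
            next
        }
        {
            gsub(/[ \t\r]/, "")         # strip stray whitespace in the sequence
            seq = seq $0                # append this chunk to the record
        }
        END { if (seq != "") print seq }
    ' "$@"
}

# Demo on a made-up sample; with the real data: linearize_fasta 1.fa > 7.fa
printf '>SOX2\nACGA\nGGGA\n>NANOG\nTTGA\n' | linearize_fasta
```

Because nothing is joined into one giant line first, this version also avoids the temporary files 2.fa to 6.fa.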

  • Original article: https://www.cnblogs.com/lmt921108/p/7714906.html