  • Handling multi-line text in Python: a simple way to read multi-line FASTA files

    When working with FASTA sequences, you often find a single sequence wrapped across several lines, as shown below:

    $ cat test.fasta
    >test_1
    TGGGGAATCTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGCGATGAAGGCCTTAGGGTTGTAAAGCTCT
    TTCAGCTGGGAAGATAATGACGGTACCAGCAGAAGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGG
    GGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGATTGTTAAGTCGGGGGTGAAATCCCGGGGCTCAA
    CCCCGGAACTGCCTCCGATACTGGCAATCTTGAGATCGAGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTA
    GATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAA
    CAGG
    >test_3
    TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTGCGGGTTGTAAAGCACT
    TTCAGTAGAGAAGAAATGCCCATGGTTAATACCCGTGGGTCTTGACGTAACCTACAGAAGAAGCACCGGCTAACTCCGTG
    CCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGGTCAG
    TCGGATGTGAAAGCCCTAGGCTCAACCTGGGAATGGCATTCGATACTGCCTGACTAGAGTATGGTAGAGGGAAGTGGAAT
    TTCCGGTGTAGCGGTGAAATGCGTAGATATCGGAAGGAACACCAGTGGCGAAGGCGACTTCCTGGGCCAATACTGACGCT
    GAGGTGCGAAAGCGTGGGGAGCAAACAGG
    >test_4
    TGGGGAATTTTGGGCAATGGGCGAAAGCCTGACCCAGCAACGCCGCGTGGAGGATGAAGGCCCTCGGGTCGTAAACTCCT
    GTCCTAGGGGAAGAAAAAAATGACGGTACCCTTGGAGGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAAGACGGG
    GGGGGGGGAGCGGTGTTCGGAATTACTGGGCGTAAAGGGCGCGCAGGCGGCCTGGGAAGTCTTGGGTGAAAGCCCCCAGC
    TCAACTGGGGAATGGCCTGAGAAACCACTAGGCTGGAGTGCTGGAGAGGGAAGCGGAATTCCCGGTGGAGCGGTGAAATG
    CGTAGATATCGGGAGGAACACCAGAGGCGAAGGCGGCTTCCTGGACAGACACTGACGCTGAGGCGCGAAAGCTAGGGGAG
    CAAACGGG
    >test_5
    TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGATGAAGGCCCTAGGGTTGTAAAGCTCT
    TTCACCGGTGAAGATAATGACGGTAACCGGAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGG
    GGCTAGCGTTGTTCGGATTTACTGGGCGTAAAGCGCACGTAGGCGGACTATTAAGTCAGGGGTGAAATCCCGGGGCTCAA
    CCCCGGAACTGCCTTTGATACTGGTAGTCTTGAGTTCGAGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTA
    GATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAA
    CAGG
    >test_6
    GGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACACCGCGTGCGGGATGAAGGCCCTCGGGTTGTAAACCGCTTTC
    AGGAGGGACGAAAATGACGGTACCTCCAGAAGAAGGCCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGCC
    AAACGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGCGGTTCAACAAGTCGATCGTGAAAGCCCGGGGCTCAACCC
    CGGGACGCCGGTCGAAACTGTTGTGACTAGGGTCCGGTAGAGGTGAGTGGAATTCTCGGTGTAGCGGTGGAATGCGCAGA
    TATCGAGAGGAACACCAGTTGCGAAGGCGGCTCACTGGGCCGGTACCGACGCTAAGGAGCGAAAGCGTGGGGAGCAAACA
    GG

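    My simple approach has three steps: read the whole file at once, split it into a list of records on the ">" delimiter, then join each record's sequence lines back into a single string. The code is as follows: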

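    # Read the whole file, then split it into records on the ">" header marker.
    with open("test.fasta") as seq_file:
        seq_list = seq_file.read().split(">")
    for seq in seq_list:
        if seq:  # skip the empty string that precedes the first ">"
            lines = seq.split("\n")
            seq_name = lines[0]          # first line of a record is the header
            seq_fa = "".join(lines[1:])  # join the wrapped sequence lines
            print(">" + seq_name + "\n" + seq_fa)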
    

    The printed output is:

    >test_1
    TGGGGAATCTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGCGATGAAGGCCTTAGGGTTGTAAAGCTCTTTCAGCTGGGAAGATAATGACGGTACCAGCAGAAGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGATTGTTAAGTCGGGGGTGAAATCCCGGGGCTCAACCCCGGAACTGCCTCCGATACTGGCAATCTTGAGATCGAGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGG
    >test_3
    TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTGCGGGTTGTAAAGCACTTTCAGTAGAGAAGAAATGCCCATGGTTAATACCCGTGGGTCTTGACGTAACCTACAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGGTCAGTCGGATGTGAAAGCCCTAGGCTCAACCTGGGAATGGCATTCGATACTGCCTGACTAGAGTATGGTAGAGGGAAGTGGAATTTCCGGTGTAGCGGTGAAATGCGTAGATATCGGAAGGAACACCAGTGGCGAAGGCGACTTCCTGGGCCAATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGG
    >test_4
    TGGGGAATTTTGGGCAATGGGCGAAAGCCTGACCCAGCAACGCCGCGTGGAGGATGAAGGCCCTCGGGTCGTAAACTCCTGTCCTAGGGGAAGAAAAAAATGACGGTACCCTTGGAGGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAAGACGGGGGGGGGGGAGCGGTGTTCGGAATTACTGGGCGTAAAGGGCGCGCAGGCGGCCTGGGAAGTCTTGGGTGAAAGCCCCCAGCTCAACTGGGGAATGGCCTGAGAAACCACTAGGCTGGAGTGCTGGAGAGGGAAGCGGAATTCCCGGTGGAGCGGTGAAATGCGTAGATATCGGGAGGAACACCAGAGGCGAAGGCGGCTTCCTGGACAGACACTGACGCTGAGGCGCGAAAGCTAGGGGAGCAAACGGG
    >test_5
    TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGATGAAGGCCCTAGGGTTGTAAAGCTCTTTCACCGGTGAAGATAATGACGGTAACCGGAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGTTCGGATTTACTGGGCGTAAAGCGCACGTAGGCGGACTATTAAGTCAGGGGTGAAATCCCGGGGCTCAACCCCGGAACTGCCTTTGATACTGGTAGTCTTGAGTTCGAGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGG
    >test_6
    GGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACACCGCGTGCGGGATGAAGGCCCTCGGGTTGTAAACCGCTTTCAGGAGGGACGAAAATGACGGTACCTCCAGAAGAAGGCCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGCCAAACGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGCGGTTCAACAAGTCGATCGTGAAAGCCCGGGGCTCAACCCCGGGACGCCGGTCGAAACTGTTGTGACTAGGGTCCGGTAGAGGTGAGTGGAATTCTCGGTGTAGCGGTGGAATGCGCAGATATCGAGAGGAACACCAGTTGCGAAGGCGGCTCACTGGGCCGGTACCGACGCTAAGGAGCGAAAGCGTGGGGAGCAAACAGG
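
The approach above loads the entire file into memory, which is fine for small files like this one. For larger files, one possible alternative is to read line by line and emit each record as soon as the next header appears. The following is only a sketch of that idea, not part of the original method; the function name `read_fasta` is a hypothetical helper introduced here for illustration, and the demo input is a shortened made-up sample rather than the real test.fasta.

```python
import io


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper: it keeps only the current record in memory,
    so it can be fed an open file object directly.
    """
    name, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            # Sequence line belonging to the current record.
            chunks.append(line.strip())
    # Emit the last record, which has no following header.
    if name is not None:
        yield name, "".join(chunks)


if __name__ == "__main__":
    # Shortened, made-up sample standing in for a real FASTA file.
    demo = io.StringIO(">test_1\nTGGG\nCAGG\n>test_3\nTTCA\n")
    for name, seq in read_fasta(demo):
        print(">" + name + "\n" + seq)
```

With a real file, the same generator can be given the handle returned by `open("test.fasta")` inside a `with` block, since a file object iterates over its lines.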
  • Original article: https://www.cnblogs.com/xlij1205/p/10504418.html