zoukankan      html  css  js  c++  java
  • 【Python小试】计算目录下所有DNA序列的Kmer并过滤

    背景

    Kmer是基因组组装算法中经常接触到的概念,简单来说,Kmer就是长度为k的核苷酸序列。一般长短为m的reads可以分成m-k+1个Kmer。Kmer的长度和阈值直接影响到组装的效果。

    Denovo组装流程:原始数据——数据过滤——纠错——kmer分析——denovo组装

    组装测序策略:根据基因组大小和具体情况选择个大概的k值,构建contig所需的数据量以及所需的构建的文库数量。对于植物基因组一般考虑的是大kmer(>31),动物一般在27左右,具体根据基因组情况调整。需要在短片段数据量达到20X左右的时候进行kmer分析。Kmer分析正常后,继续加测数据以达到最后期望的数据量。

    编码

    import os
    import sys
    
    # convert command line arguments to variables
    kmer_size = int(sys.argv[1])
    count_cutoff = int(sys.argv[2])
    
    # define the function to split dna
    def split_dna(dna, kmer_size):
        kmers = []
        for start in range(0,len(dna)-(kmer_size-1),1):
            kmer = dna[start:start+kmer_size]
            kmers.append(kmer)
        return kmers
    
    # create an empty dictionary to hold the counts
    kmer_counts = {}
    
    # process each file with the right name
    for file_name in os.listdir("."):
        if file_name.endswith(".dna"):
            dna_file = open(file_name)
    
            # process each DNA sequence in a file
            for line in dna_file:
                dna = line.rstrip("
    ")
    
                # increase the count for each k-mer that we find
                for kmer in split_dna(dna, kmer_size):
                    current_count = kmer_counts.get(kmer, 0)
                    new_count = current_count + 1
                    kmer_counts[kmer] = new_count
    
    # print k-mers whose counts are above the cutoff
    for kmer, count in kmer_counts.items():
        if count > count_cutoff:
            print(kmer + " : " + str(count))
    

    Ref: https://www.cnblogs.com/leezx/p/5577600.html

  • 相关阅读:
    Linux-Zabbix 邮件报警设置
    CentOS6.7 防火墙规则(Iptables)
    CentOS7 防火墙规则 (firewalld)
    windows搭建代理服务器
    Linux服务器的远程IP限制
    利用shell脚本监控目录内文件改动
    CentOS 7, 升级python到3.x
    CentOS 7, Attempting to create directory /root/perl5
    变长参数表
    C语言预处理
  • 原文地址:https://www.cnblogs.com/jessepeng/p/12882606.html
Copyright © 2011-2022 走看看