zoukankan      html  css  js  c++  java
  • 如何获得FPKM/RPKM计算需要的基因长度(考虑exon之间的overlap)

    版权声明:本文为博主原创文章,转载请注明出处

    这里我们跟Cufflinks的原理一致,使用总的外显子长度,并且去除过多的重叠的外显子的部分。使用R语言,输入为基因的GTF文件

    包的安装

    依赖data.table, IRanges,rtracklayer

    install.packages("data.table")
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("rtracklayer")
    BiocManager::install("IRanges")
    
    
    

    代码

    library(data.table)
    library("IRanges")
    require("rtracklayer")
    
    hg19 <- readGFF("hg19.gencodev27.gtf")
    anno <- setDT(hg19)
    anno <- anno[type=="exon",]
    setnames(anno,c("seqid","start","end","gene_name","exon_number"),c("Chr","ExonStart","ExonEnd","Gene","Exon_number"))
    #mkdir bin and mean by bin
    Exon_region <- unique(anno[,.(Chr,ExonStart,ExonEnd,Exon_number,Gene)])
    Exon_region <- Exon_region[,{x <- IRanges(ExonStart,ExonEnd);y <- reduce(x); list(ExonStart=y@start,ExonEnd=y@start+y@width-1)},by=.(Gene,Chr)]
    Exon_region[,Exon_num:=1:.N,by=Gene]
    Exon_region <- Exon_region[,.(Chr,ExonStart,ExonEnd,Exon_num,Gene)]
    Exon_len <- Exon_region[,.(ExonLen = ExonEnd - ExonStart + 1),by=.(Exon_num,Gene)]
    gene_len <- Exon_len[,.(Length = sum(ExonLen)),by=Gene]
    
    
    # write out
    fwrite(Exon_region,file="All_hg19gene_exon.bed", sep = "	", col.names = T)
    fwrite(gene_len, file = "All_hg19gene_len.txt", sep = "	", col.names = T)
    ~
    
    

    结果文件

    1. 基因长度文件 链接:https://pan.baidu.com/s/1NtfM_ESyNyaT-kVaKu0MyQ
      提取码:gy88
      复制这段内容后打开百度网盘手机App,操作更方便哦

    2. 合并后的外显子区域文件
      链接:https://pan.baidu.com/s/1-IpuC_2N88Jx9m2g5fCqmA
      提取码:cevo
      复制这段内容后打开百度网盘手机App,操作更方便哦

    参考资料

    https://www.cureffi.org/2013/09/12/counts-vs-fpkms-in-rna-seq/

  • 相关阅读:
    poj 2251 Dungeon Master-搜索进阶-暑假集训
    棋盘问题-POJ 1321
    Popular Cows
    The Factor
    整数解 (hdu 2092
    Strange fuction hdu 2899
    Factors and Multiples
    Trailing Zeroes (III) -;lightoj 1138
    Swap——hdu 2819
    Arithmetic Sequence
  • 原文地址:https://www.cnblogs.com/ywliao/p/12522426.html
Copyright © 2011-2022 走看看