  • Motif enrichment analysis for alternative splicing factors | motif enrichment | FIMO | MEME

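    Related post: Transcription factor motif enrichment around the TSS | motif enrichment | HOMER | FIMO | MEME

    A new area for me: I am now looking at alternative splicing factors (ASFs) such as PTBP1. They have specific RNA-binding motifs, much like transcription factors (TFs).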

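    Similarities:

    • Both are sequence regions bound by a protein
    • Both have specific sequence motifs

    Differences:

    • TF motifs are mainly bound in promoters and enhancers and control gene transcription
    • ASF motifs are mainly bound in the intron regions of genes and control alternative splicing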

    PTBP1 is used as the example here.

    Inspired by this paper: 2018 - Cancer Cell - PTBP1-Mediated Alternative Splicing Regulates the Inflammatory Secretome and the Pro-tumorigenic Effects of Senescent Cells

    RNA-Binding Motif Analysis
    FIMO (Grant et al., 2011) was used to scan the human gene sequences for the PTBP1 RNA-binding motifs inferred by (Ray et al., 2013). The thereby predicted occurrences were mapped to the analyzed splicing events. To generate the RNA-maps (Figures 7B and S7D), for each comparison alternative exons were divided into those with PSIs significantly increasing upon PTBP1 knockdown (putatively repressed), those with PSIs significantly decreasing upon PTBP1 knockdown (putatively enhanced), and those with PSIs not altered upon PTBP1 knockdown (putatively not regulated). Statistical significance for local motif enrichment is associated with Fisher’s exact tests for differences in motif occurrences between groups of exons within 31 bp moving windows.
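    The windowed Fisher's exact test described in that paragraph can be sketched roughly as follows. This is my own reading, not the paper's code: I assume each exon is represented by a list of motif start positions within a flanking region, and that a window "has the motif" for an exon if at least one hit falls inside it. The function names are made up for illustration, and the test is implemented with the standard library only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def exons_with_hit(group, start, window):
    """Number of exons with at least one motif position in [start, start + window)."""
    return sum(any(start <= pos < start + window for pos in exon) for exon in group)

def windowed_pvalues(group_a, group_b, region_len, window=31):
    """One p-value per window start, comparing two groups of exons.

    group_a / group_b: lists of exons, each exon a list of motif positions
    (an assumed input layout, e.g. regulated vs. not regulated exons).
    """
    pvals = []
    for start in range(region_len - window + 1):
        a = exons_with_hit(group_a, start, window)
        c = exons_with_hit(group_b, start, window)
        pvals.append(fisher_exact_two_sided(a, len(group_a) - a, c, len(group_b) - c))
    return pvals
```

    With real data the two groups would be, for example, the putatively repressed exons and the not regulated exons, and the p-values would still need multiple-testing correction.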

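    Finding the RNA motif

    Look up Ray et al., 2013, A compendium of RNA-binding motifs for decoding gene regulation

    Following that lead turns up a database: CISBP-RNA Database: Catalog of Inferred Sequence Binding Preferences of RNA binding proteins

    Procedure: export the hg38 gene sequences (exons and introns) from the UCSC Table Browser

    http://www.genome.ucsc.edu/cgi-bin/hgTables

    Predict motif occurrences with FIMO: https://meme-suite.org/meme/tools/fimo

    For a short motif, the web version generates the MEME-format file for you; just download it.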

    MEME version 4
    
    ALPHABET= ACGT
    
    strands: + -
    
    Background letter frequencies (from unknown source):
    A 0.250 C 0.250 G 0.250 T 0.250
    
    MOTIF 1 HYTTTYT
    
    letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
    0.333333 0.333333 0.000000 0.333333
    0.000000 0.500000 0.000000 0.500000
    0.000000 0.000000 0.000000 1.000000
    0.000000 0.000000 0.000000 1.000000
    0.000000 0.000000 0.000000 1.000000
    0.000000 0.500000 0.000000 0.500000
    0.000000 0.000000 0.000000 1.000000
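
    If the web page is not at hand, a file like the one above can also be written from the IUPAC consensus. A minimal sketch, assuming each degenerate code is split evenly over its bases (which is what the matrix above looks like); `consensus_to_meme` is a made-up helper name, not part of the MEME Suite.

```python
# IUPAC nucleotide codes mapped to the bases they stand for (U treated as T)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_to_meme(consensus, name="1"):
    """Return minimal MEME-format text for an IUPAC consensus string."""
    consensus = consensus.upper()
    rows = []
    for code in consensus:
        bases = IUPAC[code]
        # Equal probability for every base the code allows, zero otherwise
        probs = [1.0 / len(bases) if b in bases else 0.0 for b in "ACGT"]
        rows.append(" ".join(f"{p:.6f}" for p in probs))
    header = [
        "MEME version 4", "",
        "ALPHABET= ACGT", "",
        "strands: + -", "",
        "Background letter frequencies (from unknown source):",
        "A 0.250 C 0.250 G 0.250 T 0.250", "",
        f"MOTIF {name} {consensus}", "",
        f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 1 E= 0e+0",
    ]
    return "\n".join(header + rows) + "\n"
```

    Calling `consensus_to_meme("HYTTTYT")` gives the same seven matrix rows as the file above.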
    

      

    fimo --alpha 1 --max-strand -oc target PTBP1.motif.meme hg38_gene.fasta
    

      

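    A small DNA / RNA / protein conversion tool: http://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm

    Notes:

    The motif and the sequences must use the same alphabet: T for DNA, U for RNA. Otherwise nothing will match.

    For an RNA motif, make a reverse-complement DNA motif. (FIMO scans both strands unless --norc is given, so check the strand column to see which orientation each hit is in.)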

    MEME version 4
    
    ALPHABET= ACGT
    
    strands: + -
    
    Background letter frequencies (from unknown source):
    A 0.250 C 0.250 G 0.250 T 0.250
    
    MOTIF 1 ARAAARD
    
    letter-probability matrix: alength= 4 w= 7 nsites= 1 E= 0e+0
    1.000000 0.000000 0.000000 0.000000
    0.500000 0.000000 0.500000 0.000000
    1.000000 0.000000 0.000000 0.000000
    1.000000 0.000000 0.000000 0.000000
    1.000000 0.000000 0.000000 0.000000
    0.500000 0.000000 0.500000 0.000000
    0.333333 0.000000 0.333333 0.333333
    
    fimo --alpha 1 --max-strand -oc target PTBP1.DNA.motif.meme hg38_gene.fasta --max-stored-scores 1000000 --thresh 1e-4
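
    The reverse-complement matrix used in this command does not have to be typed by hand. A small sketch of the conversion, assuming the rows are in A C G T column order as in the MEME files here; both function names are my own.

```python
def rna_to_dna(seq):
    """Rewrite an RNA sequence or consensus in the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")

def reverse_complement_rows(rows):
    """Reverse-complement a letter-probability matrix.

    rows: one [A, C, G, T] probability list per motif position.
    Reversing a row swaps A<->T and C<->G; reversing the row order
    flips the motif end to end.
    """
    return [list(reversed(row)) for row in reversed(rows)]

# The HYTTTYT matrix from the first MEME file
hytttyt = [
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
araaard = reverse_complement_rows(hytttyt)
```

    `araaard` then holds the same rows as the ARAAARD matrix above.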
    

      

    Next time, test on a small dataset first; otherwise a whole overnight run is wasted.
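
    One way to get such a small dataset is to cut the first few records out of the big FASTA before running FIMO on everything. A minimal sketch; `head_fasta` and the output file name in the usage note are made up for illustration.

```python
def head_fasta(src, dst, n_records=100):
    """Copy the first n_records FASTA entries from src to dst.

    Returns the number of records actually copied.
    """
    seen = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                if seen == n_records:
                    break  # enough records, stop reading the big file
                seen += 1
            if seen:  # ignore anything before the first header line
                fout.write(line)
    return seen
```

    For example, `head_fasta("hg38_gene.fasta", "hg38_gene.head.fasta", 100)` writes a file that can be passed to the fimo command in place of the full `hg38_gene.fasta` to check the options in a minute or two.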

    --max-strand

    If matches on both strands at a given position satisfy the output threshold, only report the match for the strand with the higher score. If the scores are tied, the matching strand is chosen at random.

    Resource usage

    --max-stored-scores 1000000 used 1.48 GB of memory and 1 CPU

    --max-stored-scores 10000000: memory (not recorded), CPU (not recorded)

    Latest commands:

    fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --text --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output.tsv
    
    fimo --max-stored-scores 10000000 --thresh 1e-4 --alpha 1 -oc target2 --skip-matched-sequence --max-strand PTBP1.DNA.motif.meme hg38_gene.fasta > output2.tsv
    

      

    --skip-matched-sequence [much faster output: run time cut from an hour and a half to 10 minutes]

    Like the --text option, this limits output to tab-separated values (TSV) sent to standard out, but in addition, turns off output of the sequence of motif matches. This speeds up processing considerably.

      

    --text [results go to standard output]

    Limits output to TSV (tab-separated values) formatted results sent to standard output. The results are unsorted and no q-values are output, allowing very large files to be searched.
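
    Once the TSV is on disk, a quick per-gene hit count is a handy sanity check. A rough sketch, assuming the header names used by recent MEME Suite releases (`sequence_name` and so on); older releases use a different header line, so pass the right column name if yours differs. `count_hits` is my own helper, not a FIMO feature.

```python
import csv
from collections import Counter

def count_hits(tsv_lines, seq_col="sequence_name"):
    """Count FIMO hits per sequence from an iterable of TSV lines.

    Blank lines and '#' comment lines are skipped; the first remaining
    line is taken as the header.
    """
    rows = (ln for ln in tsv_lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return Counter(row[seq_col] for row in reader)
```

    Usage would be something like `with open("output.tsv") as fh: counts = count_hits(fh)`, followed by `counts.most_common(10)` to see which sequences carry the most hits.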

    Reference:

    ~/project/scPipeline/motifEnrichment/ASF_motif/

  • Original post: https://www.cnblogs.com/leezx/p/15116660.html