zoukankan      html  css  js  c++  java
  • SNPsnap | 筛选最佳匹配的SNP | 富集分析 | CP loci

    一个矛盾:

    GWAS得到的SNP做富集分析的话,通常都会有强的偏向性。

    co-localization of GWAS signals to gene-dense and high linkage disequilibrium (LD) regions, and correlations of gene size, location and function

    数据库使用注意:

    • 一次最多只能输入200-300个SNP
    • SNP必须以rs id格式输入,否则基本不识别

    SNPsnap: a Web-based tool for identification and annotation of matched SNPs 

    providing matched sets of SNPs that can be used to calibrate background expectations.

    基于:allele frequency, number of SNPs in LD, distance to nearest gene and gene density

    根据条件,选出类似的SNP:

    1. Minor allele frequency : we partitioned SNPs into minor allele frequency bins (using 1–2, 2–3, … , 49–50% strata).
    2. LD buddies : for each SNP, we counted the number of ‘buddy’ SNPs in LD at various thresholds (r 2 > 0.1, 0.2, … , 0.9) [using PLINK v.1.07 ( Purcell et al. , 2007 ) to compute LD].
    3. Distance to nearest gene : we computed the distance to the nearest 5′ start site using Ensembl gene coordinates ( Flicek et al. , 2014 ). If the SNP was within a gene, we used the distance to that gene’s start site.
    4. Gene density : we counted the number of genes in loci around the SNP, using LD (r 2 > 0.1, 0.2, … , 0.9) and physical distance (100, 200, … , 1000 kb) to define loci.

    这里我们就要根据这个工具来筛选T0的SNP。

    a) the number of T0 loci was set to be the same as that of the T1 loci (associated with a single trait);

    b) the length distribution of T0 loci was set to be the same as that of the T1 loci;

    c) the T0 loci should not include the ENCODE blacklist regions and human leukocyte antigen (HLA) regions; and

    d) they should be randomly selected from autosomal regions.

    画这个图的脚本:

    head=T2
    bedfile=../sort.CP.region.T2.bed
    
    # cat CP.region.T0.bed | bedtools sort -g ../genome.txt > sort.CP.region.T0.bed
    # cat CP.region.T2.bed | bedtools sort -g ../genome.txt > sort.CP.region.T2.bed
    # cat CP.region.T3.bed | bedtools sort -g ../genome.txt > sort.CP.region.T3.bed
    
    bedtools intersect -a ../../UCSC.anno/CDS.bed -b $bedfile -wa | bedtools merge > $head.CDS.bed &&
    bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.CDS.bed -wa > $head.CDS.cons.bed &&
    
    bedtools intersect -a ../../UCSC.anno/UTR3.bed -b $bedfile -wa | bedtools merge > $head.UTR3.bed &&
    bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR3.bed -wa > $head.UTR3.cons.bed &&
    
    bedtools intersect -a ../../UCSC.anno/UTR5.bed -b $bedfile -wa | bedtools merge > $head.UTR5.bed &&
    bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR5.bed -wa > $head.UTR5.cons.bed &&
    
    bedtools intersect -a ../../UCSC.anno/Down2K.bed -b $bedfile -wa | bedtools merge > $head.Down2K.bed &&
    bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Down2K.bed -wa > $head.Down2K.cons.bed &&
    
    bedtools intersect -a ../../UCSC.anno/Up2K.bed -b $bedfile -wa | bedtools merge > $head.Up2K.bed &&
    bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Up2K.bed -wa > $head.Up2K.cons.bed &&
    
    bedtools intersect -a ../../UCSC.anno/Intron.bed -b $bedfile -wa | bedtools merge > $head.Intron.bed &&
    bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Intron.bed -wa > $head.Intron.cons.bed &&
    
    bedtools intersect -a ../../UCSC.anno/intergenic.bed -b $bedfile -wa | bedtools merge > $head.intergenic.bed &&
    bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.intergenic.bed -wa > $head.intergenic.cons.bed &&
    
    echo done!
    
    
    # awk '{ total += $4 } END { print total/NR }' T2.CDS.cons.bed
    

      

    批量求均值

    awk '{ total += $4 } END { print total/NR }' T*.CDS.cons.bed
    awk '{ total += $4 } END { print total/NR }' T*.UTR3.cons.bed
    awk '{ total += $4 } END { print total/NR }' T*.UTR5.cons.bed
    awk '{ total += $4 } END { print total/NR }' T*.Down2K.cons.bed
    awk '{ total += $4 } END { print total/NR }' T*.Up2K.cons.bed
    awk '{ total += $4 } END { print total/NR }' T*.Intron.cons.bed
    awk '{ total += $4 } END { print total/NR }' T*.intergenic.cons.bed
    

     

    按CP loci来分别统计平均分,bedtools的特殊功能

    for i in CDS UTR3 UTR5 Down2K Up2K Intron intergenic
    do
    # bedtools map -a sort.CP.region.T0.bed -b T0/T0.CDS.cons.bed -c 4 -o mean | cut -f4
    echo $i
    #
    # echo $i > CPmerge/$i.T0.score
    # bedtools map -a sort.CP.region.T0.bed -b T0/T0.$i.cons.bed -c 4 -o mean | cut -f4 >> CPmerge/$i.T0.score
    #
    echo $i > CPmerge/$i.T1.score
    bedtools map -a sort.CP.region.T1.bed -b T1/T1.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T1.score
    #
    echo $i > CPmerge/$i.T2.score
    bedtools map -a sort.CP.region.T2.bed -b T2/T2.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T2.score
    #
    echo $i > CPmerge/$i.T00.score
    bedtools map -a sort.SNPsnap.bed -b SNPsnap/SNPsnap.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T00.score
    #
    done
    
    #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T0.* > T0.score
    #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T1.* > T1.score
    #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T2.* > T2.score
    #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T00.* > T00.score
    

      

     

    待续

  • 相关阅读:
    量化投资:第3节 滑点策略与交易手续费
    量化投资:第2节 择时策略的优化
    量化投资: 第1节 择时策略的开发
    一步一步,完成sparkMLlib对日志文件的处理(1)
    JAVA接口与抽象类区别
    HDU1877 又一版 A+B
    HDU4548 美素数
    超级楼梯 HDU2041
    HDU2013 蟠桃记【递推】
    HDU1262 寻找素数对
  • 原文地址:https://www.cnblogs.com/leezx/p/11822163.html
Copyright © 2011-2022 走看看