一个矛盾:
GWAS得到的SNP做富集分析的话,通常都会有强的偏向性。
co-localization of GWAS signals to gene-dense and high linkage disequilibrium (LD) regions, and correlations of gene size, location and function
数据库使用注意:
- 一次最多只能输入200-300个SNP
- SNP必须以rs id格式输入,否则基本不识别
SNPsnap: a Web-based tool for identification and annotation of matched SNPs
providing matched sets of SNPs that can be used to calibrate background expectations.
基于:allele frequency, number of SNPs in LD, distance to nearest gene and gene density
根据条件,选出类似的SNP:
- Minor allele frequency : we partitioned SNPs into minor allele frequency bins (using 1–2, 2–3, … , 49–50% strata).
- LD buddies : for each SNP, we counted the number of ‘buddy’ SNPs in LD at various thresholds (r 2 > 0.1, 0.2, … , 0.9) [using PLINK v.1.07 ( Purcell et al. , 2007 ) to compute LD].
- Distance to nearest gene : we computed the distance to the nearest 5′ start site using Ensembl gene coordinates ( Flicek et al. , 2014 ). If the SNP was within a gene, we used the distance to that gene’s start site.
- Gene density : we counted the number of genes in loci around the SNP, using LD (r 2 > 0.1, 0.2, … , 0.9) and physical distance (100, 200, … , 1000 kb) to define loci.
这里我们就要根据这个工具来筛选T0的SNP。
a) the number of T0 loci was set to be the same as that of the T1 loci (associated with a single trait);
b) the length distribution of T0 loci was set to be the same as that of the T1 loci;
c) the T0 loci should not include the ENCODE blacklist regions and human leukocyte antigen (HLA) regions; and
d) they should be randomly selected from autosomal regions.
画这个图的脚本:
head=T2 bedfile=../sort.CP.region.T2.bed # cat CP.region.T0.bed | bedtools sort -g ../genome.txt > sort.CP.region.T0.bed # cat CP.region.T2.bed | bedtools sort -g ../genome.txt > sort.CP.region.T2.bed # cat CP.region.T3.bed | bedtools sort -g ../genome.txt > sort.CP.region.T3.bed bedtools intersect -a ../../UCSC.anno/CDS.bed -b $bedfile -wa | bedtools merge > $head.CDS.bed && bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.CDS.bed -wa > $head.CDS.cons.bed && bedtools intersect -a ../../UCSC.anno/UTR3.bed -b $bedfile -wa | bedtools merge > $head.UTR3.bed && bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR3.bed -wa > $head.UTR3.cons.bed && bedtools intersect -a ../../UCSC.anno/UTR5.bed -b $bedfile -wa | bedtools merge > $head.UTR5.bed && bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR5.bed -wa > $head.UTR5.cons.bed && bedtools intersect -a ../../UCSC.anno/Down2K.bed -b $bedfile -wa | bedtools merge > $head.Down2K.bed && bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Down2K.bed -wa > $head.Down2K.cons.bed && bedtools intersect -a ../../UCSC.anno/Up2K.bed -b $bedfile -wa | bedtools merge > $head.Up2K.bed && bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Up2K.bed -wa > $head.Up2K.cons.bed && bedtools intersect -a ../../UCSC.anno/Intron.bed -b $bedfile -wa | bedtools merge > $head.Intron.bed && bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Intron.bed -wa > $head.Intron.cons.bed && bedtools intersect -a ../../UCSC.anno/intergenic.bed -b $bedfile -wa | bedtools merge > $head.intergenic.bed && bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.intergenic.bed -wa > $head.intergenic.cons.bed && echo done! # awk '{ total += $4 } END { print total/NR }' T2.CDS.cons.bed
批量求均值
awk '{ total += $4 } END { print total/NR }' T*.CDS.cons.bed awk '{ total += $4 } END { print total/NR }' T*.UTR3.cons.bed awk '{ total += $4 } END { print total/NR }' T*.UTR5.cons.bed awk '{ total += $4 } END { print total/NR }' T*.Down2K.cons.bed awk '{ total += $4 } END { print total/NR }' T*.Up2K.cons.bed awk '{ total += $4 } END { print total/NR }' T*.Intron.cons.bed awk '{ total += $4 } END { print total/NR }' T*.intergenic.cons.bed
按CP loci来分别统计平均分,bedtools的特殊功能
for i in CDS UTR3 UTR5 Down2K Up2K Intron intergenic do # bedtools map -a sort.CP.region.T0.bed -b T0/T0.CDS.cons.bed -c 4 -o mean | cut -f4 echo $i # # echo $i > CPmerge/$i.T0.score # bedtools map -a sort.CP.region.T0.bed -b T0/T0.$i.cons.bed -c 4 -o mean | cut -f4 >> CPmerge/$i.T0.score # echo $i > CPmerge/$i.T1.score bedtools map -a sort.CP.region.T1.bed -b T1/T1.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T1.score # echo $i > CPmerge/$i.T2.score bedtools map -a sort.CP.region.T2.bed -b T2/T2.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T2.score # echo $i > CPmerge/$i.T00.score bedtools map -a sort.SNPsnap.bed -b SNPsnap/SNPsnap.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T00.score # done #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T0.* > T0.score #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T1.* > T1.score #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T2.* > T2.score #paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T00.* > T00.score
待续