zoukankan html css js c++ java

SNPsnap | 筛选最佳匹配的SNP | 富集分析 | CP loci

一个矛盾：

GWAS得到的SNP做富集分析的话，通常都会有强的偏向性。

co-localization of GWAS signals to gene-dense and high linkage disequilibrium (LD) regions, and correlations of gene size, location and function

数据库使用注意：

一次最多只能输入200-300个SNP
SNP必须以rs id格式输入，否则基本不识别

SNPsnap: a Web-based tool for identification and annotation of matched SNPs

providing matched sets of SNPs that can be used to calibrate background expectations.

基于：allele frequency, number of SNPs in LD, distance to nearest gene and gene density

根据条件，选出类似的SNP：

Minor allele frequency : we partitioned SNPs into minor allele frequency bins (using 1–2, 2–3, … , 49–50% strata).
LD buddies : for each SNP, we counted the number of ‘buddy’ SNPs in LD at various thresholds (r 2 > 0.1, 0.2, … , 0.9) [using PLINK v.1.07 ( Purcell et al. , 2007 ) to compute LD].
Distance to nearest gene : we computed the distance to the nearest 5′ start site using Ensembl gene coordinates ( Flicek et al. , 2014 ). If the SNP was within a gene, we used the distance to that gene’s start site.
Gene density : we counted the number of genes in loci around the SNP, using LD (r 2 > 0.1, 0.2, … , 0.9) and physical distance (100, 200, … , 1000 kb) to define loci.

这里我们就要根据这个工具来筛选T0的SNP。

a) the number of T0 loci was set to be the same as that of the T1 loci (associated with a single trait);

b) the length distribution of T0 loci was set to be the same as that of the T1 loci;

c) the T0 loci should not include the ENCODE blacklist regions and human leukocyte antigen (HLA) regions; and

d) they should be randomly selected from autosomal regions.

画这个图的脚本：

head=T2
bedfile=../sort.CP.region.T2.bed

# cat CP.region.T0.bed | bedtools sort -g ../genome.txt > sort.CP.region.T0.bed
# cat CP.region.T2.bed | bedtools sort -g ../genome.txt > sort.CP.region.T2.bed
# cat CP.region.T3.bed | bedtools sort -g ../genome.txt > sort.CP.region.T3.bed

bedtools intersect -a ../../UCSC.anno/CDS.bed -b $bedfile -wa | bedtools merge > $head.CDS.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.CDS.bed -wa > $head.CDS.cons.bed &&

bedtools intersect -a ../../UCSC.anno/UTR3.bed -b $bedfile -wa | bedtools merge > $head.UTR3.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR3.bed -wa > $head.UTR3.cons.bed &&

bedtools intersect -a ../../UCSC.anno/UTR5.bed -b $bedfile -wa | bedtools merge > $head.UTR5.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.UTR5.bed -wa > $head.UTR5.cons.bed &&

bedtools intersect -a ../../UCSC.anno/Down2K.bed -b $bedfile -wa | bedtools merge > $head.Down2K.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Down2K.bed -wa > $head.Down2K.cons.bed &&

bedtools intersect -a ../../UCSC.anno/Up2K.bed -b $bedfile -wa | bedtools merge > $head.Up2K.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Up2K.bed -wa > $head.Up2K.cons.bed &&

bedtools intersect -a ../../UCSC.anno/Intron.bed -b $bedfile -wa | bedtools merge > $head.Intron.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.Intron.bed -wa > $head.Intron.cons.bed &&

bedtools intersect -a ../../UCSC.anno/intergenic.bed -b $bedfile -wa | bedtools merge > $head.intergenic.bed &&
bedtools intersect -a ../../PhastCons.bed/all.chr.phastCons46way.primates.bed -b $head.intergenic.bed -wa > $head.intergenic.cons.bed &&

echo done!


# awk '{ total += $4 } END { print total/NR }' T2.CDS.cons.bed

批量求均值

awk '{ total += $4 } END { print total/NR }' T*.CDS.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.UTR3.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.UTR5.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Down2K.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Up2K.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.Intron.cons.bed
awk '{ total += $4 } END { print total/NR }' T*.intergenic.cons.bed

按CP loci来分别统计平均分，bedtools的特殊功能

for i in CDS UTR3 UTR5 Down2K Up2K Intron intergenic
do
# bedtools map -a sort.CP.region.T0.bed -b T0/T0.CDS.cons.bed -c 4 -o mean | cut -f4
echo $i
#
# echo $i > CPmerge/$i.T0.score
# bedtools map -a sort.CP.region.T0.bed -b T0/T0.$i.cons.bed -c 4 -o mean | cut -f4 >> CPmerge/$i.T0.score
#
echo $i > CPmerge/$i.T1.score
bedtools map -a sort.CP.region.T1.bed -b T1/T1.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T1.score
#
echo $i > CPmerge/$i.T2.score
bedtools map -a sort.CP.region.T2.bed -b T2/T2.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T2.score
#
echo $i > CPmerge/$i.T00.score
bedtools map -a sort.SNPsnap.bed -b SNPsnap/SNPsnap.$i.cons.bed -c 4 -o mean | cut -f6 >> CPmerge/$i.T00.score
#
done

#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T0.* > T0.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T1.* > T1.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T2.* > T2.score
#paste ~/project2/CPloci/evo/CP.region/CPmerge/*.T00.* > T00.score

待续

查看全文

相关阅读:
3.css3文字与字体
 2.css3表示颜色的几种方式
 测试ssh转发
 硬盘相关合集，以及LVM操作实践
 linux硬件驱动
 启动nfs清除端口占用过程
 postgres10.2时区研究
 git submodule临时分支；以及git reset使用
 sudo环境变量问题；程序库函数寻找
 linuxI/O重定向

原文地址：https://www.cnblogs.com/leezx/p/11822163.html