zoukankan      html  css  js  c++  java
  • 利用GEMMA进行GWAS分析

    1、软件下载,下载地址:https://github.com/genetics-statistics/GEMMA/releases

     下载过程:

    root@PC1:/home/gemma_test# ls
    root@PC1:/home/gemma_test# pwd
    /home/gemma_test
    root@PC1:/home/gemma_test# wget https://github.com/genetics-statistics/GEMMA/releases/download/v0.98.5/gemma-0.98.5-linux-static-AMD64.gz  ## 下载
    --2021-12-25 22:02:46--  https://github.com/genetics-statistics/GEMMA/releases/download/v0.98.5/gemma-0.98.5-linux-static-AMD64.gz
    Resolving github.com (github.com)... 20.205.243.166
    Connecting to github.com (github.com)|20.205.243.166|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/2880217/634e9bdc-e9f8-409d-8354-16874abd27ce?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20211225%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20211225T140247Z&X-Amz-Expires=300&X-Amz-Signature=33e6f15d35852f87e55df57d93955d7ed3474f3185cae7fa05edbe0347b39d5d&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=2880217&response-content-disposition=attachment%3B%20filename%3Dgemma-0.98.5-linux-static-AMD64.gz&response-content-type=application%2Foctet-stream [following]
    --2021-12-25 22:02:47--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/2880217/634e9bdc-e9f8-409d-8354-16874abd27ce?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20211225%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20211225T140247Z&X-Amz-Expires=300&X-Amz-Signature=33e6f15d35852f87e55df57d93955d7ed3474f3185cae7fa05edbe0347b39d5d&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=2880217&response-content-disposition=attachment%3B%20filename%3Dgemma-0.98.5-linux-static-AMD64.gz&response-content-type=application%2Foctet-stream
    Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
    Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.108.133|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 6394866 (6.1M) [application/octet-stream]
    Saving to: ‘gemma-0.98.5-linux-static-AMD64.gz’
    
    gemma-0.98.5-linux-static-AMD64.g 100%[==========================================================>]   6.10M   275KB/s    in 22s
    
    2021-12-25 22:03:10 (284 KB/s) - ‘gemma-0.98.5-linux-static-AMD64.gz’ saved [6394866/6394866]
    
    root@PC1:/home/gemma_test# ls
    gemma-0.98.5-linux-static-AMD64.gz
    root@PC1:/home/gemma_test# gunzip gemma-0.98.5-linux-static-AMD64.gz   ## 解压
    root@PC1:/home/gemma_test# ls
    gemma-0.98.5-linux-static-AMD64
    root@PC1:/home/gemma_test# chmod 755 gemma-0.98.5-linux-static-AMD64   ## 赋予执行权限
    root@PC1:/home/gemma_test# ls
    gemma-0.98.5-linux-static-AMD64

    2、调用测试

    root@PC1:/home/gemma_test# ls
    gemma-0.98.5-linux-static-AMD64
    root@PC1:/home/gemma_test# ./gemma-0.98.5-linux-static-AMD64
    GEMMA 0.98.5 (2021-08-25) by Xiang Zhou, Pjotr Prins and team (C) 2012-2021
    
     type ./gemma -h [num] for detailed help
     options:
      1: quick guide
      2: file I/O related
      3: SNP QC
      4: calculate relatedness matrix
      5: perform eigen decomposition
      6: perform variance component estimation
      7: fit a linear model
      8: fit a linear mixed model
      9: fit a multivariate linear mixed model
     10: fit a Bayesian sparse linear mixed model
     11: obtain predicted values
     12: calculate snp variance covariance
     13: note
     14: debug options
    
    The GEMMA software is distributed under the GNU General Public v3
       -license    show license information
       see also http://www.xzlab.org/software.html, https://github.com/genetics-statistics

    3、测试数据下载,地址:https://github.com/genetics-statistics/GEMMA/releases

     

    root@PC1:/home/gemma_test# ls
    2000.bed  2000.bim  2000.fam  gemma-0.98.5-linux-static-AMD64

    4、剔除没有表型记录的数据,gemma只分析有表型记录的数据

    root@PC1:/home/gemma_test# ls
    2000.bed  2000.bim  2000.fam  gemma-0.98.5-linux-static-AMD64
    root@PC1:/home/gemma_test# awk '$6 != "-9"' 2000.fam | awk '{print $1, $2}' > keep.txt   ## 提取fam文件第六列不为-9的数据,然后提取前两列
    root@PC1:/home/gemma_test# head keep.txt
    88 88
    108 108
    139 139
    159 159
    265 265
    350 350
    351 351
    403 403
    410 410
    424 424
    root@PC1:/home/gemma_test# plink --bfile 2000 --keep keep.txt --make-bed --out 2000new   ## 利用plink软件--keep命令提取没有缺失的数据
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to 2000new.log.
    Options in effect:
      --bfile 2000
      --keep keep.txt
      --make-bed
      --out 2000new
    
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    2000 variants loaded from .bim file.
    1008 people (0 males, 0 females, 1008 ambiguous) loaded from .fam.
    Ambiguous sex IDs written to 2000new.nosex .
    876 phenotype values loaded from .fam.
    --keep: 876 people remaining.
    Warning: Ignoring phenotypes of missing-sex samples.  If you don't want those
    phenotypes to be ignored, use the --allow-no-sex flag.
    Using 1 thread (no multithreaded calculations invoked).
    Before main variant filters, 876 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate in remaining samples is 0.973096.
    2000 variants and 876 people pass filters and QC.
    Phenotype data is quantitative.
    --make-bed to 2000new.bed + 2000new.bim + 2000new.fam ... done.
    root@PC1:/home/gemma_test# ls
    2000.bed  2000.fam     2000new.bim  2000new.log    gemma-0.98.5-linux-static-AMD64
    2000.bim  2000new.bed  2000new.fam  2000new.nosex  keep.txt
    root@PC1:/home/gemma_test# wc -l 2000new.*    
         0 2000new.bed
      2000 2000new.bim
       876 2000new.fam
        29 2000new.log
      1008 2000new.nosex
      3913 total
    ## 可见一共有2000个位点, 876个样本

    5、质控

    root@PC1:/home/gemma_test# ls
    2000.bed  2000.fam     2000new.bim  2000new.log    gemma-0.98.5-linux-static-AMD64
    2000.bim  2000new.bed  2000new.fam  2000new.nosex  keep.txt
    root@PC1:/home/gemma_test# plink --bfile 2000new --mind 0.1 --geno 0.05 --maf 0.01 --make-bed --out 2000new2  ## 样本缺失率大于10%提出,位点缺失率大于5%剔除,最小等位基因频率小于1%剔除
    ##gemma软件只会分析位点缺失率小于5%的位点,因此使用--geno 0.05的标准
    PLINK v1.90b6.
    24 64-bit (6 Jun 2021) www.cog-genomics.org/plink/1.9/ (C) 2005-2021 Shaun Purcell, Christopher Chang GNU General Public License v3 Logging to 2000new2.log. Options in effect: --bfile 2000new --geno 0.05 --maf 0.01 --make-bed --mind 0.1 --out 2000new2 15974 MB RAM detected; reserving 7987 MB for main workspace. 2000 variants loaded from .bim file. 876 people (0 males, 0 females, 876 ambiguous) loaded from .fam. Ambiguous sex IDs written to 2000new2.nosex . 876 phenotype values loaded from .fam. Warning: Ignoring phenotypes of missing-sex samples. If you don't want those phenotypes to be ignored, use the --allow-no-sex flag. 36 people removed due to missing genotype data (--mind). IDs written to 2000new2.irem . Using 1 thread (no multithreaded calculations invoked). Before main variant filters, 840 founders and 0 nonfounders present. Calculating allele frequencies... done. Total genotyping rate in remaining samples is 0.978071. 15 variants removed due to missing genotype data (--geno). 0 variants removed due to minor allele threshold(s) (--maf/--max-maf/--mac/--max-mac). 1985 variants and 840 people pass filters and QC. Phenotype data is quantitative. --make-bed to 2000new2.bed + 2000new2.bim + 2000new2.fam ... done. root@PC1:/home/gemma_test# ls 2000.bed 2000.fam 2000new2.bim 2000new2.irem 2000new2.nosex 2000new.bim 2000new.log gemma-0.98.5-linux-static-AMD64 2000.bim 2000new2.bed 2000new2.fam 2000new2.log 2000new.bed 2000new.fam 2000new.nosex keep.txt

    6、计算kinship矩阵

    root@PC1:/home/gemma_test# ls
    2000.bed  2000.fam      2000new2.bim  2000new2.irem  2000new2.nosex  2000new.bim  2000new.log    gemma-0.98.5-linux-static-AMD64
    2000.bim  2000new2.bed  2000new2.fam  2000new2.log   2000new.bed     2000new.fam  2000new.nosex  keep.txt
    root@PC1:/home/gemma_test# ./gemma-0.98.5-linux-static-AMD64 -bfile 2000new2 -gk 2 -o kin   ## -gk 2 表示生成的kinship进行标准化(scale),-o指定输出的名称
    GEMMA 0.98.5 (2021-08-25) by Xiang Zhou, Pjotr Prins and team (C) 2012-2021
    Reading Files ...
    ## number of total individuals = 840
    ## number of analyzed individuals = 840
    ## number of covariates = 1
    ## number of phenotypes = 1
    ## number of total SNPs/var        =     1985
    ## number of analyzed SNPs         =     1985
    Calculating Relatedness Matrix ...
    ================================================== 100%
    **** INFO: Done.
    root@PC1:/home/gemma_test# ls
    2000.bed  2000new2.bed  2000new2.irem   2000new.bed  2000new.log                      keep.txt
    2000.bim  2000new2.bim  2000new2.log    2000new.bim  2000new.nosex                    output
    2000.fam  2000new2.fam  2000new2.nosex  2000new.fam  gemma-0.98.5-linux-static-AMD64
    root@PC1:/home/gemma_test# ls output/
    kin.log.txt  kin.sXX.txt

    7、将kinship文件移动至当前目录

    root@PC1:/home/gemma_test# ls
    2000.bed  2000new2.bed  2000new2.irem   2000new.bed  2000new.log                      keep.txt
    2000.bim  2000new2.bim  2000new2.log    2000new.bim  2000new.nosex                    output
    2000.fam  2000new2.fam  2000new2.nosex  2000new.fam  gemma-0.98.5-linux-static-AMD64
    root@PC1:/home/gemma_test# mv ./output/kin.sXX.txt .
    root@PC1:/home/gemma_test# ls
    2000.bed  2000new2.bed  2000new2.irem   2000new.bed  2000new.log                      keep.txt
    2000.bim  2000new2.bim  2000new2.log    2000new.bim  2000new.nosex                    kin.sXX.txt
    2000.fam  2000new2.fam  2000new2.nosex  2000new.fam  gemma-0.98.5-linux-static-AMD64  output

    8、进行GWAS分析

    root@PC1:/home/gemma_test# ls
    2000.bed  2000new2.bed  2000new2.irem   2000new.bed  2000new.log                      keep.txt
    2000.bim  2000new2.bim  2000new2.log    2000new.bim  2000new.nosex                    kin.sXX.txt
    2000.fam  2000new2.fam  2000new2.nosex  2000new.fam  gemma-0.98.5-linux-static-AMD64  output
    root@PC1:/home/gemma_test# ./gemma-0.98.5-linux-static-AMD64 -bfile 2000new2 -k kin.sXX.txt -lmm 1 -o result ## -k 指定kinship文件,
    ##-lmm 进行混合线性模型GWAS分析。选项代表不同的检验方式:1为Wald检验;2为似然比检验;3为score检验;4为以上三个检验全部执行。一般使用wald检验即可
    GEMMA
    0.98.5 (2021-08-25) by Xiang Zhou, Pjotr Prins and team (C) 2012-2021 Reading Files ... ## number of total individuals = 840 ## number of analyzed individuals = 840 ## number of covariates = 1 ## number of phenotypes = 1 ## number of total SNPs/var = 1985 ## number of analyzed SNPs = 1985 Start Eigen-Decomposition... **** WARNING: Matrix G has 3 eigenvalues close to zero pve estimate =0.593312 se(pve) =0.0603374 ================================================== 100% **** INFO: Done. root@PC1:/home/gemma_test# ls 2000.bed 2000new2.bed 2000new2.irem 2000new.bed 2000new.log keep.txt 2000.bim 2000new2.bim 2000new2.log 2000new.bim 2000new.nosex kin.sXX.txt 2000.fam 2000new2.fam 2000new2.nosex 2000new.fam gemma-0.98.5-linux-static-AMD64 output root@PC1:/home/gemma_test# ls output/ kin.log.txt result.assoc.txt result.log.txt root@PC1:/home/gemma_test# head output/result.assoc.txt ## 查看结果文件的前10行,重要信息在 chr、rs、ps、p_wald列 chr rs ps n_miss allele1 allele0 af beta se logl_H1 l_remle p_wald 1 32 363 42 G C 0.065 -3.817377e+00 2.334342e+00 -3.477735e+03 1.462921e+00 1.023592e-01 1 118 1645 38 T G 0.175 -1.646150e+00 1.631679e+00 -3.478488e+03 1.445743e+00 3.133288e-01 1 119 1646 34 G A 0.171 -1.546425e+00 1.614092e+00 -3.478554e+03 1.449508e+00 3.383002e-01 1 134 2309 17 A T 0.281 1.132549e+00 1.905078e+00 -3.478484e+03 1.448278e+00 5.523452e-01 1 135 2310 21 A C 0.282 1.608435e-01 1.908279e+00 -3.478656e+03 1.470582e+00 9.328482e-01 1 144 2554 25 T C 0.058 -1.837053e+00 2.161237e+00 -3.478834e+03 1.461591e+00 3.955674e-01 1 148 2673 17 T C 0.083 -2.581828e+00 1.941161e+00 -3.478247e+03 1.451538e+00 1.838659e-01 1 162 3102 27 G A 0.311 3.694870e+00 1.470007e+00 -3.475745e+03 1.456267e+00 1.214033e-02 1 182 4648 14 A C 0.297 3.429882e+00 1.930098e+00 -3.477053e+03 1.436600e+00 7.592262e-02

    9、使用R绘制曼哈顿图

    ##install.packages("qqman")
    library(qqman)  ## 加载qqman包
    results_log <- read.table("result.assoc.txt", head=TRUE)   ## 读取结果文件
    manhattan(results_log,chr="chr",bp="ps",p="p_wald",snp="rs", ylim = c(0, 10), cex = 2, cex.axis = 1.5, 
      col = 1, suggestiveline = F, genomewideline = F, main = "Manhattan plot")   ## 绘制曼哈顿图

    10、绘制qq图

    qq(results_log$p_wald, main = "Q-Q plot of GWAS p-values", xlim = c(0, 5), ylim = c(0, 
                           6), pch = 18, col = "blue4", cex = 2, las = 1)

    参考:

    https://www.jianshu.com/p/d31404620c9b

    https://www.jianshu.com/p/372ec8585b49

  • 相关阅读:
    012 字典
    011 递归
    010 函数与闭包
    009 格式化
    000 机器学习的概念原理
    008 元组
    007 列表
    005 Numpy的基本操作
    071 SparkStreaming与SparkSQL集成
    070 DStream中的transform和foreachRDD函数
  • 原文地址:https://www.cnblogs.com/liujiaxin2018/p/15731881.html
Copyright © 2011-2022 走看看