zoukankan      html  css  js  c++  java
  • plink 阈性状(质量性状)GWAS分析

    plink进行阈性状(质量性状)关联分析有两种方法

    --assoc 和 --logistic,

    这两种方法又分别有分是否校正:

    --assoc --adjust 和 --logistic --adjust

    --logistic方法有可以选择是否添加协变量:

    --logistic --covar 和 --logistic --adjust --covar

    这么算的话一共有6种组合。

    1、测试数据

    root@PC1:/home/test3# ls
    qc3.map  qc3.ped
    root@PC1:/home/test3# wc -l *   ## 288样本, 417105位点
       417105 qc3.map
          288 qc3.ped
       426702 total
    root@PC1:/home/test3# head -n 5 qc3.map
    1       1:3174  0       3174
    1       1:32399 0       32399
    1       1:32402 0       32402
    1       1:32406 0       32406
    1       1:32443 0       32443
    root@PC1:/home/test3# cut -f 1-10 qc3.ped | head -n 5      ## ped文件的第5列性别,第6列是表型数据,1或者2
    1       1       0       0       1       1       T       C       T       C
    2       2       0       0       1       1       C       C       C       C
    3       3       0       0       1       1       C       C       C       C
    4       4       0       0       1       2       C       C       C       C
    5       5       0       0       1       1       C       C       C       C

    2、 --assoc

    root@PC1:/home/test3# ls
    qc3.map  qc3.ped
    root@PC1:/home/test3# plink --file qc3 --assoc --out test    ## 直接使用assoc命令
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to test.log.
    Options in effect:
      --assoc
      --file qc3
      --out test
    
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    .ped scan complete (for binary autoconversion).
    Performing single-pass .bed write (417105 variants, 288 people).
    --file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
    417105 variants loaded from .bim file.
    288 people (288 males, 0 females) loaded from .fam.
    288 phenotype values loaded from .fam.
    Using 1 thread (no multithreaded calculations invoked).
    Before main variant filters, 288 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate is 0.948564.
    417105 variants and 288 people pass filters and QC.
    Among remaining phenotypes, 75 are cases and 213 are controls.
    Writing C/C --assoc report to test.assoc ... done.
    root@PC1:/home/test3# ls
    qc3.map  qc3.ped  test.assoc  test.log
    root@PC1:/home/test3# wc -l *
       417105 qc3.map
          288 qc3.ped
       417106 test.assoc
           27 test.log
       834526 total
    root@PC1:/home/test3# head -n 3 test.assoc  ## 查看结果
     CHR         SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR
       1      1:3174       3174    T   0.1351   0.1332    C     0.003612       0.9521        1.017
       1     1:32399      32399    T   0.1324   0.1288    C       0.0114        0.915        1.032

    3、--assoc --adjust命令

    root@PC1:/home/test3# ls
    qc3.map  qc3.ped
    root@PC1:/home/test3# plink --file qc3 --assoc --adjust --out test   ## 直接使用--assoc  --adjust
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to test.log.
    Options in effect:
      --adjust
      --assoc
      --file qc3
      --out test
    
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    .ped scan complete (for binary autoconversion).
    Performing single-pass .bed write (417105 variants, 288 people).
    --file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
    417105 variants loaded from .bim file.
    288 people (288 males, 0 females) loaded from .fam.
    288 phenotype values loaded from .fam.
    Using 1 thread (no multithreaded calculations invoked).
    Before main variant filters, 288 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate is 0.948564.
    417105 variants and 288 people pass filters and QC.
    Among remaining phenotypes, 75 are cases and 213 are controls.
    Writing C/C --assoc report to test.assoc ... done.
    --adjust: Genomic inflation est. lambda (based on median chisq) = 1.18741.
    --adjust values (417105 variants) written to test.assoc.adjusted .
    root@PC1:/home/test3# ls
    qc3.map  qc3.ped  test.assoc  test.assoc.adjusted  test.log
    root@PC1:/home/test3# wc -l *
       417105 qc3.map
          288 qc3.ped
       417106 test.assoc
       417106 test.assoc.adjusted
           30 test.log
      1251635 total
    root@PC1:/home/test3# head -n 3 test.assoc    ## 查看结果
     CHR         SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR
       1      1:3174       3174    T   0.1351   0.1332    C     0.003612       0.9521        1.017
       1     1:32399      32399    T   0.1324   0.1288    C       0.0114        0.915        1.032
    root@PC1:/home/test3# head -n 3 test.assoc.adjusted   ## 查看结果
     CHR         SNP      UNADJ         GC       BONF       HOLM   SIDAK_SS   SIDAK_SD     FDR_BH     FDR_BY
      20 20:11254823   7.14e-48  1.357e-40  2.978e-42  2.978e-42        INF        INF  1.489e-42  2.013e-41
      20 20:11269211   7.14e-48  1.357e-40  2.978e-42  2.978e-42        INF        INF  1.489e-42  2.013e-41

    4、--logistic

    root@PC1:/home/test3# ls
    qc3.map  qc3.ped
    root@PC1:/home/test3# plink --file qc3 --logistic --out test    ## 直接使用logistic
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to test.log.
    Options in effect:
      --file qc3
      --logistic
      --out test
    
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    .ped scan complete (for binary autoconversion).
    Performing single-pass .bed write (417105 variants, 288 people).
    --file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
    417105 variants loaded from .bim file.
    288 people (288 males, 0 females) loaded from .fam.
    288 phenotype values loaded from .fam.
    Using 1 thread (no multithreaded calculations invoked).
    Before main variant filters, 288 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate is 0.948564.
    417105 variants and 288 people pass filters and QC.
    Among remaining phenotypes, 75 are cases and 213 are controls.
    Writing logistic model association results to test.assoc.logistic ... done.
    root@PC1:/home/test3# ls
    qc3.map  qc3.ped  test.assoc.logistic  test.log
    root@PC1:/home/test3# wc -l *
       417105 qc3.map
          288 qc3.ped
       417106 test.assoc.logistic
           27 test.log
       834526 total
    root@PC1:/home/test3# head test.assoc.logistic   ## 查看结果
     CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P
       1      1:3174       3174    T        ADD      273      1.019      0.06304       0.9497
       1     1:32399      32399    T        ADD      266      1.029       0.1025       0.9183
       1     1:32402      32402    C        ADD      267     0.9198      -0.4321       0.6657
       1     1:32406      32406    C        ADD      267     0.8815      -0.6626       0.5076
       1     1:32443      32443    C        ADD      268       1.16       0.6713        0.502
       1     1:32548      32548    T        ADD      269      1.038        0.147       0.8831
       1     1:45044      45044    A        ADD      286     0.8389      -0.7204       0.4713
       1     1:45047      45047    T        ADD      283     0.8288      -0.6577       0.5107
       1     1:45085      45085    C        ADD      283      1.007      0.03192       0.9745

    5、--logistic --adjust命令

    root@PC1:/home/test3# ls
    qc3.map  qc3.ped
    root@PC1:/home/test3#
    root@PC1:/home/test3#
    root@PC1:/home/test3# ls
    qc3.map  qc3.ped
    root@PC1:/home/test3# plink --file qc3 --logistic --adjust --out test   ## 直接运行
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to test.log.
    Options in effect:
      --adjust
      --file qc3
      --logistic
      --out test
    
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    .ped scan complete (for binary autoconversion).
    Performing single-pass .bed write (417105 variants, 288 people).
    --file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
    417105 variants loaded from .bim file.
    288 people (288 males, 0 females) loaded from .fam.
    288 phenotype values loaded from .fam.
    Using 1 thread (no multithreaded calculations invoked).
    Before main variant filters, 288 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate is 0.948564.
    417105 variants and 288 people pass filters and QC.
    Among remaining phenotypes, 75 are cases and 213 are controls.
    Writing logistic model association results to test.assoc.logistic ... done.
    --adjust: Genomic inflation est. lambda (based on median chisq) = 1.12802.
    --adjust values (417105 variants) written to test.assoc.logistic.adjusted .
    root@PC1:/home/test3# ls  ## 查看生成的结果
    qc3.map  qc3.ped  test.assoc.logistic  test.assoc.logistic.adjusted  test.log
    root@PC1:/home/test3# wc -l *
       417105 qc3.map
          288 qc3.ped
       417106 test.assoc.logistic
       417106 test.assoc.logistic.adjusted
           30 test.log
      1251635 total
    root@PC1:/home/test3# head -n 3 test.assoc.logistic   ## 查看结果
     CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P
       1      1:3174       3174    T        ADD      273      1.019      0.06304       0.9497
       1     1:32399      32399    T        ADD      266      1.029       0.1025       0.9183
    root@PC1:/home/test3# head -n 3 test.assoc.logistic.adjusted    ## 查看结果
     CHR         SNP      UNADJ         GC       BONF       HOLM   SIDAK_SS   SIDAK_SD     FDR_BH     FDR_BY
      20 20:11020547  1.411e-23  4.399e-21  5.887e-18  5.887e-18        INF        INF  5.887e-18  7.959e-17
      20 20:11403719   4.34e-22  9.205e-20   1.81e-16   1.81e-16   2.22e-16   2.22e-16  9.052e-17  1.224e-15

    6、加入协变量, 如何生成协变量

    step1:

    root@PC1:/home/test3# ls
    qc3.map  qc3.ped
    root@PC1:/home/test3# plink --file qc3 --genome --out cov   ## 直接使用--genome命令
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to cov.log.
    Options in effect:
      --file qc3
      --genome
      --out cov
    
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    .ped scan complete (for binary autoconversion).
    Performing single-pass .bed write (417105 variants, 288 people).
    --file: cov-temporary.bed + cov-temporary.bim + cov-temporary.fam written.
    417105 variants loaded from .bim file.
    288 people (288 males, 0 females) loaded from .fam.
    288 phenotype values loaded from .fam.
    Using up to 8 threads (change this with --threads).
    Before main variant filters, 288 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate is 0.948564.
    417105 variants and 288 people pass filters and QC.
    Among remaining phenotypes, 75 are cases and 213 are controls.
    IBD calculations complete.
    Finished writing cov.genome .
    root@PC1:/home/test3# ls
    cov.genome  cov.log  qc3.map  qc3.ped

    step2:

    root@PC1:/home/test3# ls
    cov.genome  cov.log  qc3.map  qc3.ped
    root@PC1:/home/test3# plink --file qc3 --read-genome cov.genome --cluster --mds-plot 10 --out cov2 
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to cov2.log.
    Options in effect:
      --cluster
      --file qc3
      --mds-plot 10
      --out cov2
      --read-genome cov.genome
    
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    .ped scan complete (for binary autoconversion).
    Performing single-pass .bed write (417105 variants, 288 people).
    --file: cov2-temporary.bed + cov2-temporary.bim + cov2-temporary.fam written.
    417105 variants loaded from .bim file.
    288 people (288 males, 0 females) loaded from .fam.
    288 phenotype values loaded from .fam.
    Using 1 thread (no multithreaded calculations invoked).
    Before main variant filters, 288 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate is 0.948564.
    417105 variants and 288 people pass filters and QC.
    Among remaining phenotypes, 75 are cases and 213 are controls.
    Clustering... done.
    Cluster solution written to cov2.cluster1 , cov2.cluster2 , and cov2.cluster3 .
    Performing multidimensional scaling analysis (SVD algorithm, 10
    dimensions)... done.
    MDS solution written to cov2.mds .
    root@PC1:/home/test3# ls  ## 查看结果 cov2.mds为下一步要使用的结果
    cov2.cluster1  cov2.cluster2  cov2.cluster3  cov2.log  cov2.mds  cov.genome  cov.log  qc3.map  qc3.ped

    step3:

    root@PC1:/home/test3# ls
    cov2.cluster1  cov2.cluster2  cov2.cluster3  cov2.log  cov2.mds  cov.genome  cov.log  qc3.map  qc3.ped
    root@PC1:/home/test3# awk '{print$1, $2, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13}' cov2.mds > cov3.txt  ## 整理格式
    root@PC1:/home/test3# ls
    cov2.cluster1  cov2.cluster2  cov2.cluster3  cov2.log  cov2.mds  cov3.txt  cov.genome  cov.log  qc3.map  qc3.ped
    root@PC1:/home/test3# wc -l cov3.txt
    289 cov3.txt
    root@PC1:/home/test3# head -n 3 cov3.txt  ## 查看格式
    FID IID C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
    1 1 0.0111045 -0.0260147 0.0157171 -0.0128374 -0.00871872 0.015319 -0.0206926 0.0229361 -0.0262635 0.0180982
    2 2 -0.00213807 0.0639865 -0.0298421 -0.0296622 0.0138335 0.057727 -0.0341887 -0.0108086 0.0270301 0.0114068

    8、--logistic --covar

    root@PC1:/home/test3# ls
    cov3.txt  qc3.map  qc3.ped
    root@PC1:/home/test3# plink --file qc3 --covar cov3.txt --logistic --hide-covar --out test  ##添加协变量
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to test.log.
    Options in effect:
      --covar cov3.txt
      --file qc3
      --hide-covar
      --logistic
      --out test
    
    Note: --hide-covar flag deprecated.  Use e.g. "--linear hide-covar".
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    .ped scan complete (for binary autoconversion).
    Performing single-pass .bed write (417105 variants, 288 people).
    --file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
    417105 variants loaded from .bim file.
    288 people (288 males, 0 females) loaded from .fam.
    288 phenotype values loaded from .fam.
    Using 1 thread (no multithreaded calculations invoked).
    --covar: 10 covariates loaded.
    Before main variant filters, 288 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate is 0.948564.
    417105 variants and 288 people pass filters and QC.
    Among remaining phenotypes, 75 are cases and 213 are controls.
    Writing logistic model association results to test.assoc.logistic ... done.
    root@PC1:/home/test3# ls  ## 查看生成结果
    cov3.txt  qc3.map  qc3.ped  test.assoc.logistic  test.log
    root@PC1:/home/test3# wc -l *
          289 cov3.txt
       417105 qc3.map
          288 qc3.ped
       417106 test.assoc.logistic
           31 test.log
       834819 total
    root@PC1:/home/test3# head test.assoc.logistic -n 3   ## 查看结果
     CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P
       1      1:3174       3174    T        ADD      273     0.9015      -0.3232       0.7465
       1     1:32399      32399    T        ADD      266     0.9416      -0.1934       0.8467

    9、--logistic --adjust --covar

    root@PC1:/home/test3# ls
    cov3.txt  qc3.map  qc3.ped
    root@PC1:/home/test3# plink --file qc3 --covar cov3.txt --adjust --logistic --hide-covar --out test
    PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
    (C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to test.log.
    Options in effect:
      --adjust
      --covar cov3.txt
      --file qc3
      --hide-covar
      --logistic
      --out test
    
    Note: --hide-covar flag deprecated.  Use e.g. "--linear hide-covar".
    15974 MB RAM detected; reserving 7987 MB for main workspace.
    .ped scan complete (for binary autoconversion).
    Performing single-pass .bed write (417105 variants, 288 people).
    --file: test-temporary.bed + test-temporary.bim + test-temporary.fam written.
    417105 variants loaded from .bim file.
    288 people (288 males, 0 females) loaded from .fam.
    288 phenotype values loaded from .fam.
    Using 1 thread (no multithreaded calculations invoked).
    --covar: 10 covariates loaded.
    Before main variant filters, 288 founders and 0 nonfounders present.
    Calculating allele frequencies... done.
    Total genotyping rate is 0.948564.
    417105 variants and 288 people pass filters and QC.
    Among remaining phenotypes, 75 are cases and 213 are controls.
    Writing logistic model association results to test.assoc.logistic ... done.
    --adjust: Genomic inflation est. lambda (based on median chisq) = 1.17784.
    --adjust values (417105 variants) written to test.assoc.logistic.adjusted .
    root@PC1:/home/test3# ls   ### 查看生成结果
    cov3.txt  qc3.map  qc3.ped  test.assoc.logistic  test.assoc.logistic.adjusted  test.log
    root@PC1:/home/test3# wc -l *
          289 cov3.txt
       417105 qc3.map
          288 qc3.ped
       417106 test.assoc.logistic
       417106 test.assoc.logistic.adjusted
           34 test.log
      1251928 total
    root@PC1:/home/test3# head test.assoc.logistic -n 3  ## 查看结果
     CHR         SNP         BP   A1       TEST    NMISS         OR         STAT            P
       1      1:3174       3174    T        ADD      273     0.9015      -0.3232       0.7465
       1     1:32399      32399    T        ADD      266     0.9416      -0.1934       0.8467
    root@PC1:/home/test3# head test.assoc.logistic.adjusted -n 3  ## 查看结果
     CHR         SNP      UNADJ         GC       BONF       HOLM   SIDAK_SS   SIDAK_SD     FDR_BH     FDR_BY
      20 20:11403719  7.848e-19  3.194e-16  3.274e-13  3.274e-13  3.274e-13  3.274e-13  3.274e-13  4.425e-12
      20 20:11020547  2.313e-18  8.011e-16  9.646e-13  9.646e-13  9.646e-13  9.646e-13   3.76e-13  5.082e-12

    参考:

    (1)https://github.com/MareesAT/GWA_tutorial/

    (2)https://blog.csdn.net/yijiaobani/article/details/105294261

  • 相关阅读:
    Compiler Warning C4150: deletion of pointer to incomplete type 'XXX'; no destructor called
    What happend: Exception throws in the .ctor()?
    FocusScope学习一: Logic Focus与Keyboard Focus
    线性筛prime,强大O(n)
    网络流24题方格取数
    splay(1区间翻转区间最值与区间修改)
    排列组合容斥原理
    错排思路
    splay2(区间修改+内存回收)
    DP_1d1d诗人小G
  • 原文地址:https://www.cnblogs.com/liujiaxin2018/p/15704168.html
Copyright © 2011-2022 走看看