zoukankan      html  css  js  c++  java
  • R in action读书笔记(12)第九章 方差分析

    第九章方差分析

    9.2 ANOVA 模型拟合

    9.2.1 aov()函数

    aov(formula, data = NULL, projections =FALSE, qr = TRUE,

    contrasts = NULL, ...)

    9.2.2 表达式中各项的顺序

    y ~ A + B + A:B

    有三种类型的方法可以分解等式右边各效应对y所解释的方差。R默认类型I

    类型I(序贯型)

    效应根据表达式中先出现的效应做调整。A不做调整,B根据A调整,A:B交互项根据A和

    B调整。

    类型II(分层型)

    效应根据同水平或低水平的效应做调整。A根据B调整,B依据A调整,A:B交互项同时根

    据A和B调整。

    类型III(边界型)

    每个效应根据模型其他各效应做相应调整。A根据B和A:B做调整,A:B交互项根据A和B

    调整。

    9.3 单因素方差分析

    > library(multcomp)

    > attach(cholesterol)

    > table(trt) #各组样本大小

    trt

    1time 2times4times drugD drugE

    10 10 10 10 10

    > aggregate(response,by=list(trt),FUN=mean)#各组均值

    Group.1 x

    1 1time 5.78197

    2 2times 9.22497

    3 4times 12.37478

    4 drugD 15.36117

    5 drugE 20.94752

    > aggregate(response,by=list(trt),FUN=sd) #各组标准差

    Group.1 x

    1 1time 2.878113

    2 2times 3.483054

    3 4times 2.923119

    4 drugD 3.454636

    5 drugE 3.345003

    > fit<-aov(response~trt)

    > summary(fit) #检验组间差异(ANOVA)

    Df SumSq Mean Sq F value Pr(>F)

    trt 41351.4 337.8 32.43 9.82e-13 ***

    Residuals 45 468.8 10.4

    ---

    Signif. codes:

    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    > library(gplots)

    > plotmeans(response~trt,xlab="treatment",ylab="response",main="meanplot with 95% CI")

    #绘制各组均值及其置信区间的图形

    > detach(cholesterol)

    9.3.1 多重比较

    TukeyHSD()函数提供了对各组均值差异的成对检验。但要注意TukeyHSD()函数与HH包存在兼容性问题:若载入HH包,TukeyHSD()函数将会失效。使用detach("package::HH")将它从搜寻路径中删除,然后再调用TukeyHSD()

    > detach("package:HH", unload=TRUE)

    > TukeyHSD(fit)

    Tukey multiplecomparisons of means

    95% family-wise confidence level

    Fit: aov(formula = response ~ trt)

    $trt

    diff lwr upr p adj

    2times-1time 3.44300 -0.6582817 7.5442820.1380949

    4times-1time 6.59281 2.4915283 10.6940920.0003542

    drugD-1time 9.57920 5.4779183 13.680482 0.0000003

    drugE-1time 15.16555 11.0642683 19.266832 0.0000000

    4times-2times 3.14981 -0.9514717 7.2510920.2050382

    drugD-2times 6.13620 2.0349183 10.2374820.0009611

    drugE-2times 11.72255 7.6212683 15.8238320.0000000

    drugD-4times 2.98639 -1.1148917 7.0876720.2512446

    drugE-4times 8.57274 4.4714583 12.6740220.0000037

    drugE-drugD 5.58635 1.4850683 9.687632 0.0030633

    > par(las=2)

    > par(mar=c(5,8,4,2))

    > plot(TukeyHSD(fit))

    multcomp包中的glht()函数提供了多重均值比较更为全面的方法,既适用于线性模型,也适用于广义线性模型.

    > library(multcomp)

    > par(mar=c(5,4,6,2))

    > tuk<-glht(fit,linfct=mcp(trt="Tukey"))

    > plot(cld(tuk,level=.5),col="lightgrey")

    9.3.2 评估检验的假设条件

    > library(car)

    > qqPlot(lm(response~trt,data=cholesterol),simulate=TRUE,main="Q-Qplot",labels=FALSE)

    Bartlett检验:

    > bartlett.test(response~trt,data=cholesterol)

    Bartlett test ofhomogeneity of variances

    data: response by trt

    Bartlett's K-squared = 0.5797, df = 4,

    p-value = 0.9653

    方差齐性分析对离群点非常敏感。可利用car包中的outlierTest()函数来检测离群点:

    > library(car)

    > outlierTest(fit)

    No Studentized residuals withBonferonni p < 0.05

    Largest |rstudent|:

    rstudent unadjusted p-value Bonferonni p

    19 2.251149 0.029422 NA

    9.4 单因素协方差分析

    单因素协方差分析(ANCOVA)扩展了单因素方差分析(ANOVA),包含一个或多个定量的

    协变量

    > data(litter,package="multcomp")

    > attach(litter)

    > table(dose)

    dose

    0 5 50 500

    20 19 18 17

    > aggregate(weight,by=list(dose),FUN=mean)

    Group.1 x

    1 0 32.30850

    2 5 29.30842

    3 50 29.86611

    4 500 29.64647

    > fit<-aov(weight~gesttime+dose)

    > summary(fit)

    Df Sum Sq Mean Sq Fvalue Pr(>F)

    gesttime 1 134.3 134.30 8.049 0.00597 **

    dose 3 137.1 45.71 2.739 0.04988 *

    Residuals 69 1151.3 16.69

    ---

    Signif. codes:

    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

    由于使用了协变量,要获取调整的组均值——即去除协变量效应后的组均值。可使

    用effects包中的effects()函数来计算调整的均值:

    > library(effects)

    > effect("dose",fit)

    dose effect

    dose

    0 5 50 500

    32.35367 28.87672 30.56614 29.33460

    对用户定义的对照的多重比较

    > library(multcomp)

    > contrast<-rbind("no drugvs. drug"=c(3,-1,-1,-1))

    > summary(glht(fit,linfct=mcp(dose=contrast)))

    Simultaneous Tests for General LinearHypotheses

    Multiple Comparisons of Means: User-definedContrasts

    Fit: aov(formula = weight ~ gesttime +dose)

    Linear Hypotheses:

    Estimate Std. Error tvalue

    no drug vs. drug == 0 8.284 3.209 2.581

    Pr(>|t|)

    no drug vs. drug == 0 0.012 *

    ---

    Signif. codes:

    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

    (Adjusted p values reported --single-step method)

    9.4.1 评估检验的假设条件

    检验回归斜率的同质性

    > library(multcomp)

    > fit2<-aov(weight~gesttime*dose,data=litter)

    > summary(fit2)

    Df Sum Sq Mean Sq F value Pr(>F)

    gesttime 1 134.3 134.30 8.289 0.00537 **

    dose 3 137.1 45.71 2.821 0.04556 *

    gesttime:dose 3 81.9 27.29 1.684 0.17889

    Residuals 66 1069.4 16.20

    ---

    Signif. codes:

    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

    9.4.2 结果可视化

    HH包中的ancova()函数可以绘制因变量、协变量和因子之间的关系图。

    > library(HH)

    > ancova(weight~gesttime+dose,data=litter)

    Analysis of Variance Table

    Response: weight

    Df Sum Sq Mean Sq F value Pr(>F)

    gesttime 1 134.30 134.304 8.0493 0.005971 **

    dose 3 137.12 45.708 2.7394 0.049883 *

    Residuals 69 1151.27 16.685

    ---

    Signif. codes:

    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

    9.5 双因素方差分析

    双因素ANOVA

    > attach(ToothGrowth)

    > table(supp,dose)

    dose

    supp 0.5 1 2

    OJ 10 10 10

    VC 10 10 10

    > aggregate(len,by=list(supp,dose),FUN=mean)

    Group.1 Group.2 x

    1 OJ 0.5 13.23

    2 VC 0.5 7.98

    3 OJ 1.0 22.70

    4 VC 1.0 16.77

    5 OJ 2.0 26.06

    6 VC 2.0 26.14

    > aggregate(len,by=list(supp,dose),FUN=sd)

    Group.1 Group.2 x

    1 OJ 0.5 4.459709

    2 VC 0.5 2.746634

    3 OJ 1.0 3.910953

    4 VC 1.0 2.515309

    5 OJ 2.0 2.655058

    6 VC 2.0 4.797731

    > fit<-aov(len~supp*dose)

    > summary(fit)

    Df Sum Sq Mean Sq F value Pr(>F)

    supp 1 205.3 205.3 12.317 0.000894 ***

    dose 1 2224.3 2224.3 133.415 < 2e-16 ***

    supp:dose 1 88.9 88.9 5.333 0.024631 *

    Residuals 56 933.6 16.7

    ---

    Signif. codes:

    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

    table语句的预处理表明该设计是均衡设计(各设计单元中样本大小都相同),aggregate

    语句处理可获得各单元的均值和标准差。用summary()函数得到方差分析表,可以看到主效应

    (supp和dose)和交互效应都非常显著。有多种方式对结果进行可视化处理。interaction.plot()函数来展示双因素方差分析的交互效应。

    > interaction.plot(dose,supp,len,type="b",col=c("red","blue"),pch=c(16,18),main="interactionbetween dose and supplement type")

    还可以用gplots包中的plotmeans()函数来展示交互效应。

    > library(gplots)

    > plotmeans(len~interaction(supp,dose,sep=""),connect=list(c(1,3,5),c(2,4,6)),col=c("red","darkgreen"),main="interactionplot with 95%CIs",xlab="treatment and dose combination")

    用HH包中的interaction2wt()函数来可视化结果,图形对任意顺序的因子

    设计的主效应和交互效应都会进行展示

    > library(HH)

    > interaction2wt(len~supp*dose)

    9.6 重复测量方差分析

    含一个组间因子和一个组内因子的重复测量方差分析

    w1b1<-subset(CO2,Treatment=="chilled")

    w1b1

    fit<-aov(uptake~conc*Type+Error(Plant/(conc)),w1b1)

    summary(fit)

    par(las=2)

    par(mar=c(10,4,4,2))

    with(w1b1,interaction.plot(conc,Type,uptake,type="b",col=c("red","blue"),pch=c(16,18),main="InteractionPlot for Plant Type and Concentration"))

    boxplot(uptake~Type*conc,data=w1b1,col=c("gold","green"),

    main="Chilled Quebec andMississippi Plants",

    ylab="Carbon dioxide uptake rateumol/m^2 sec")

    9.7 多元方差分析

    当因变量(结果变量)不止一个时,可用多元方差分析(MANOVA)对它们同时进行分析。

    library(MASS)

    head(UScereal)

    attach(UScereal)

    y<-cbind(calories,fat,sugars)

    aggregate(y,by=list(shelf),FUN=mean)

    cov(y)

    fit<-manova(y~shelf)

    summary(fit)

    summary.aov(fit)

    9.7.1 评估假设检验

    单因素多元方差分析有两个前提假设,一个是多元正态性,一个是方差—协方差矩阵同质性。

    理论补充

    若有一个p*1的多元正态随机向量x,均值为μ,协方差矩阵为Σ,那么x与μ的马氏距离

    的平方服从自由度为p的卡方分布。Q-Q图展示卡方分布的分位数,横纵坐标分别是样本量与

    马氏距离平方值。如果点全部落在斜率为1、截距项为0的直线上,则表明数据服从多元正态

    分布。

    检验多元正态性

    > center<-colMeans(y)

    > n<-nrow(y)

    > p<-ncol(y)

    > cov<-cov(y)

    > d<-mahalanobis(y,center,cov)

    > coord<-qqplot(qchisq(ppoints(n),df=p),d,main="q-qplot assessing multivariate normality",ylab="mahalanobis d2")

    > identify(coord$x,coord$y,labels=row.names(UScereal))

    9.7.2 稳健多元方差分析

    如果多元正态性或者方差—协方差均值假设都不满足,又或者你担心多元离群点,那么可以

    考虑用稳健或非参数版本的MANOVA检验。稳健单因素MANOVA可通过rrcov包中的

    Wilks.test()函数实现。vegan包中的adonis()函数则提供了非参数MANOVA的等同形式。

    > library(rrcov)

    > Wilks.test(y,shelf,method="mcd")

    9.8 用回归来做ANOVA

    > library(multcomp)

    > levels(cholesterol$trt)

    [1] "1time" "2times" "4times""drugD"

    [5] "drugE"

    > fit.aov<-aov(response~trt,data=cholesterol)

    > summary(fit.aov)

    Df Sum Sq Mean Sq F value Pr(>F)

    trt 4 1351.4 337.8 32.43 9.82e-13 ***

    Residuals 45 468.8 10.4

    ---

    Signif. codes:

    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

    用lm()函数拟合同样的模型

    > fit.lm<-lm(response~trt,data=cholesterol)

    > summary(fit.lm)

    Call:

    lm(formula = response ~ trt, data =cholesterol)

    Residuals:

    Min 1Q Median 3Q Max

    -6.5418 -1.9672 -0.0016 1.8901 6.6008

    Coefficients:

    Estimate Std. Error t valuePr(>|t|)

    (Intercept) 5.782 1.021 5.665 9.78e-07 ***

    trt2times 3.443 1.443 2.385 0.0213 *

    trt4times 6.593 1.443 4.568 3.82e-05 ***

    trtdrugD 9.579 1.443 6.637 3.53e-08 ***

    trtdrugE 15.166 1.443 10.507 1.08e-13 ***

    ---

    Signif. codes:

    0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

    Residual standard error: 3.227 on 45degrees of freedom

    Multiple R-squared: 0.7425, AdjustedR-squared: 0.7196

    F-statistic: 32.43 on 4 and 45DF, p-value: 9.819e-13


  • 相关阅读:
    BEM(Block–Element-Modifier)
    http://element.eleme.io/#/zh-CN/component/quickstart
    Commit message 的写法规范。本文介绍Angular 规范(
    好的commit应该长啥样 https://github.com/torvalds/linux/pull/17#issuecomment-5654674
    代码管理
    if you have content fetched asynchronously on pages where SEO is important, SSR might be necessary
    Martin Fowler’s Active Record design pattern.
    The Zen of Python
    Introspection in Python How to spy on your Python objects Guide to Python introspection
    Object-Oriented Metrics: LCOM 内聚性的度量
  • 原文地址:https://www.cnblogs.com/jpld/p/4459158.html
Copyright © 2011-2022 走看看