R in action读书笔记（12）第九章方差分析

zoukankan html css js c++ java

R in action读书笔记（12）第九章方差分析

第九章方差分析

9.2 ANOVA 模型拟合

9.2.1 aov()函数

aov(formula, data = NULL, projections =FALSE, qr = TRUE,

contrasts = NULL, ...)

9.2.2 表达式中各项的顺序

y ~ A + B + A:B

有三种类型的方法可以分解等式右边各效应对y所解释的方差。R默认类型I

类型I（序贯型）

效应根据表达式中先出现的效应做调整。A不做调整，B根据A调整，A:B交互项根据A和

B调整。

类型II（分层型）

效应根据同水平或低水平的效应做调整。A根据B调整，B依据A调整，A:B交互项同时根

据A和B调整。

类型III（边界型）

每个效应根据模型其他各效应做相应调整。A根据B和A:B做调整，A:B交互项根据A和B

调整。

9.3 单因素方差分析

> library(multcomp)

> attach(cholesterol)

> table(trt) #各组样本大小

trt

1time 2times4times drugD drugE

10 10 10 10 10

> aggregate(response,by=list(trt),FUN=mean)#各组均值

Group.1 x

1 1time 5.78197

2 2times 9.22497

3 4times 12.37478

4 drugD 15.36117

5 drugE 20.94752

> aggregate(response,by=list(trt),FUN=sd) #各组标准差

Group.1 x

1 1time 2.878113

2 2times 3.483054

3 4times 2.923119

4 drugD 3.454636

5 drugE 3.345003

> fit<-aov(response~trt)

> summary(fit) #检验组间差异（ANOVA）

Df SumSq Mean Sq F value Pr(>F)

trt 41351.4 337.8 32.43 9.82e-13 ***

Residuals 45 468.8 10.4

---

Signif. codes:

0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> library(gplots)

> plotmeans(response~trt,xlab="treatment",ylab="response",main="meanplot with 95% CI")

#绘制各组均值及其置信区间的图形

> detach(cholesterol)

9.3.1 多重比较

TukeyHSD()函数提供了对各组均值差异的成对检验。但要注意TukeyHSD()函数与HH包存在兼容性问题：若载入HH包，TukeyHSD()函数将会失效。使用detach("package::HH")将它从搜寻路径中删除，然后再调用TukeyHSD()

> detach("package:HH", unload=TRUE)

> TukeyHSD(fit)

Tukey multiplecomparisons of means

95% family-wise confidence level

Fit: aov(formula = response ~ trt)

$trt

diff lwr upr p adj

2times-1time 3.44300 -0.6582817 7.5442820.1380949

4times-1time 6.59281 2.4915283 10.6940920.0003542

drugD-1time 9.57920 5.4779183 13.680482 0.0000003

drugE-1time 15.16555 11.0642683 19.266832 0.0000000

4times-2times 3.14981 -0.9514717 7.2510920.2050382

drugD-2times 6.13620 2.0349183 10.2374820.0009611

drugE-2times 11.72255 7.6212683 15.8238320.0000000

drugD-4times 2.98639 -1.1148917 7.0876720.2512446

drugE-4times 8.57274 4.4714583 12.6740220.0000037

drugE-drugD 5.58635 1.4850683 9.687632 0.0030633

> par(las=2)

> par(mar=c(5,8,4,2))

> plot(TukeyHSD(fit))

multcomp包中的glht()函数提供了多重均值比较更为全面的方法，既适用于线性模型，也适用于广义线性模型.

> library(multcomp)

> par(mar=c(5,4,6,2))

> tuk<-glht(fit,linfct=mcp(trt="Tukey"))

> plot(cld(tuk,level=.5),col="lightgrey")

9.3.2 评估检验的假设条件

> library(car)

> qqPlot(lm(response~trt,data=cholesterol),simulate=TRUE,main="Q-Qplot",labels=FALSE)

Bartlett检验:

> bartlett.test(response~trt,data=cholesterol)

Bartlett test ofhomogeneity of variances

data: response by trt

Bartlett's K-squared = 0.5797, df = 4,

p-value = 0.9653

方差齐性分析对离群点非常敏感。可利用car包中的outlierTest()函数来检测离群点：

> library(car)

> outlierTest(fit)

No Studentized residuals withBonferonni p < 0.05

Largest |rstudent|:

rstudent unadjusted p-value Bonferonni p

19 2.251149 0.029422 NA

9.4 单因素协方差分析

单因素协方差分析（ANCOVA）扩展了单因素方差分析（ANOVA），包含一个或多个定量的

协变量

> data(litter,package="multcomp")

> attach(litter)

> table(dose)

dose

0 5 50 500

20 19 18 17

> aggregate(weight,by=list(dose),FUN=mean)

Group.1 x

1 0 32.30850

2 5 29.30842

3 50 29.86611

4 500 29.64647

> fit<-aov(weight~gesttime+dose)

> summary(fit)

Df Sum Sq Mean Sq Fvalue Pr(>F)

gesttime 1 134.3 134.30 8.049 0.00597 **

dose 3 137.1 45.71 2.739 0.04988 *

Residuals 69 1151.3 16.69

---

Signif. codes:

0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

由于使用了协变量，要获取调整的组均值——即去除协变量效应后的组均值。可使

用effects包中的effects()函数来计算调整的均值：

> library(effects)

> effect("dose",fit)

dose effect

dose

0 5 50 500

32.35367 28.87672 30.56614 29.33460

对用户定义的对照的多重比较

> library(multcomp)

> contrast<-rbind("no drugvs. drug"=c(3,-1,-1,-1))

> summary(glht(fit,linfct=mcp(dose=contrast)))

Simultaneous Tests for General LinearHypotheses

Multiple Comparisons of Means: User-definedContrasts

Fit: aov(formula = weight ~ gesttime +dose)

Linear Hypotheses:

Estimate Std. Error tvalue

no drug vs. drug == 0 8.284 3.209 2.581

Pr(>|t|)

no drug vs. drug == 0 0.012 *

---

Signif. codes:

0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

(Adjusted p values reported --single-step method)

9.4.1 评估检验的假设条件

检验回归斜率的同质性

> library(multcomp)

> fit2<-aov(weight~gesttime*dose,data=litter)

> summary(fit2)

Df Sum Sq Mean Sq F value Pr(>F)

gesttime 1 134.3 134.30 8.289 0.00537 **

dose 3 137.1 45.71 2.821 0.04556 *

gesttime:dose 3 81.9 27.29 1.684 0.17889

Residuals 66 1069.4 16.20

---

Signif. codes:

0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

9.4.2 结果可视化

HH包中的ancova()函数可以绘制因变量、协变量和因子之间的关系图。

> library(HH)

> ancova(weight~gesttime+dose,data=litter)

Analysis of Variance Table

Response: weight

Df Sum Sq Mean Sq F value Pr(>F)

gesttime 1 134.30 134.304 8.0493 0.005971 **

dose 3 137.12 45.708 2.7394 0.049883 *

Residuals 69 1151.27 16.685

---

Signif. codes:

0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

9.5 双因素方差分析

双因素ANOVA

> attach(ToothGrowth)

> table(supp,dose)

dose

supp 0.5 1 2

OJ 10 10 10

VC 10 10 10

> aggregate(len,by=list(supp,dose),FUN=mean)

Group.1 Group.2 x

1 OJ 0.5 13.23

2 VC 0.5 7.98

3 OJ 1.0 22.70

4 VC 1.0 16.77

5 OJ 2.0 26.06

6 VC 2.0 26.14

> aggregate(len,by=list(supp,dose),FUN=sd)

Group.1 Group.2 x

1 OJ 0.5 4.459709

2 VC 0.5 2.746634

3 OJ 1.0 3.910953

4 VC 1.0 2.515309

5 OJ 2.0 2.655058

6 VC 2.0 4.797731

> fit<-aov(len~supp*dose)

> summary(fit)

Df Sum Sq Mean Sq F value Pr(>F)

supp 1 205.3 205.3 12.317 0.000894 ***

dose 1 2224.3 2224.3 133.415 < 2e-16 ***

supp:dose 1 88.9 88.9 5.333 0.024631 *

Residuals 56 933.6 16.7

---

Signif. codes:

0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

table语句的预处理表明该设计是均衡设计（各设计单元中样本大小都相同），aggregate

语句处理可获得各单元的均值和标准差。用summary()函数得到方差分析表，可以看到主效应

（supp和dose）和交互效应都非常显著。有多种方式对结果进行可视化处理。interaction.plot()函数来展示双因素方差分析的交互效应。

> interaction.plot(dose,supp,len,type="b",col=c("red","blue"),pch=c(16,18),main="interactionbetween dose and supplement type")

还可以用gplots包中的plotmeans()函数来展示交互效应。

> library(gplots)

> plotmeans(len~interaction(supp,dose,sep=""),connect=list(c(1,3,5),c(2,4,6)),col=c("red","darkgreen"),main="interactionplot with 95%CIs",xlab="treatment and dose combination")

用HH包中的interaction2wt()函数来可视化结果，图形对任意顺序的因子

设计的主效应和交互效应都会进行展示

> library(HH)

> interaction2wt(len~supp*dose)

9.6 重复测量方差分析

含一个组间因子和一个组内因子的重复测量方差分析

w1b1<-subset(CO2,Treatment=="chilled")

w1b1

fit<-aov(uptake~conc*Type+Error(Plant/(conc)),w1b1)

summary(fit)

par(las=2)

par(mar=c(10,4,4,2))

with(w1b1,interaction.plot(conc,Type,uptake,type="b",col=c("red","blue"),pch=c(16,18),main="InteractionPlot for Plant Type and Concentration"))

boxplot(uptake~Type*conc,data=w1b1,col=c("gold","green"),

main="Chilled Quebec andMississippi Plants",

ylab="Carbon dioxide uptake rateumol/m^2 sec")

9.7 多元方差分析

当因变量（结果变量）不止一个时，可用多元方差分析（MANOVA）对它们同时进行分析。

library(MASS)

head(UScereal)

attach(UScereal)

y<-cbind(calories,fat,sugars)

aggregate(y,by=list(shelf),FUN=mean)

cov(y)

fit<-manova(y~shelf)

summary(fit)

summary.aov(fit)

9.7.1 评估假设检验

单因素多元方差分析有两个前提假设，一个是多元正态性，一个是方差—协方差矩阵同质性。

理论补充

若有一个p*1的多元正态随机向量x，均值为μ，协方差矩阵为Σ，那么x与μ的马氏距离

的平方服从自由度为p的卡方分布。Q-Q图展示卡方分布的分位数，横纵坐标分别是样本量与

马氏距离平方值。如果点全部落在斜率为1、截距项为0的直线上，则表明数据服从多元正态

分布。

检验多元正态性

> center<-colMeans(y)

> n<-nrow(y)

> p<-ncol(y)

> cov<-cov(y)

> d<-mahalanobis(y,center,cov)

> coord<-qqplot(qchisq(ppoints(n),df=p),d,main="q-qplot assessing multivariate normality",ylab="mahalanobis d2")

> identify(coord$x,coord$y,labels=row.names(UScereal))

9.7.2 稳健多元方差分析

如果多元正态性或者方差—协方差均值假设都不满足，又或者你担心多元离群点，那么可以

考虑用稳健或非参数版本的MANOVA检验。稳健单因素MANOVA可通过rrcov包中的

Wilks.test()函数实现。vegan包中的adonis()函数则提供了非参数MANOVA的等同形式。

> library(rrcov)

> Wilks.test(y,shelf,method="mcd")

9.8 用回归来做ANOVA

> library(multcomp)

> levels(cholesterol$trt)

[1] "1time" "2times" "4times""drugD"

[5] "drugE"

> fit.aov<-aov(response~trt,data=cholesterol)

> summary(fit.aov)

Df Sum Sq Mean Sq F value Pr(>F)

trt 4 1351.4 337.8 32.43 9.82e-13 ***

Residuals 45 468.8 10.4

---

Signif. codes:

0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

用lm()函数拟合同样的模型

> fit.lm<-lm(response~trt,data=cholesterol)

> summary(fit.lm)

Call:

lm(formula = response ~ trt, data =cholesterol)

Residuals:

Min 1Q Median 3Q Max

-6.5418 -1.9672 -0.0016 1.8901 6.6008

Coefficients:

Estimate Std. Error t valuePr(>|t|)

(Intercept) 5.782 1.021 5.665 9.78e-07 ***

trt2times 3.443 1.443 2.385 0.0213 *

trt4times 6.593 1.443 4.568 3.82e-05 ***

trtdrugD 9.579 1.443 6.637 3.53e-08 ***

trtdrugE 15.166 1.443 10.507 1.08e-13 ***

---

Signif. codes:

0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’0.1 ‘ ’ 1

Residual standard error: 3.227 on 45degrees of freedom

Multiple R-squared: 0.7425, AdjustedR-squared: 0.7196

F-statistic: 32.43 on 4 and 45DF, p-value: 9.819e-13

查看全文

相关阅读:
107. Binary Tree Level Order Traversal II
103. Binary Tree Zigzag Level Order Traversal
102. Binary Tree Level Order Traversal
690. Employee Importance
1723. Find Minimum Time to Finish All Jobs
LeetCode 329 矩阵中最长增长路径
 7.2 物理内存管理
 LeetCode 面试题特定深度节点链表
 LeetCode 100 相同的树
 npm安装包命令详解，dependencies与devDependencies实际区别

原文地址：https://www.cnblogs.com/jpld/p/4459158.html

R in action读书笔记（12）第九章 方差分析

R in action读书笔记（12）第九章方差分析