
Analyzing nonlinear relationships between covariates in R

Original link: http://tecdat.cn/?p=6366

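I was recently asked whether my R and Stata packages can accommodate nonlinear relationships between covariates. The answer is yes, and in this post I show how to do it with the R package smcfcs.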

To illustrate, we simulate a very large dataset with two covariates, x1 and x2, and a continuous outcome y.

 

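library(smcfcs)   # smcfcs()
library(mitools)  # imputationList(), MIcombine()
expit <- function(x) exp(x) / (1 + exp(x))  # inverse logit; not in base R
set.seed(123)
n <- 10000
x1 <- rnorm(n)
x2 <- x1^2 + rnorm(n)
y <- x1 + x2 + rnorm(n)
# x2 is set to missing with probability expit(y)
x2[(runif(n) < expit(y))] <- NA
mydata <- data.frame(x1, x2, y)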

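The true coefficients of the substantive model are therefore 0 (intercept), 1 (x1) and 1 (x2). Note that there is no nonlinearity in the substantive model, but the dependence of x2 on x1 is nonlinear.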
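As a quick sanity check on the simulation, the sketch below uses base R only and works on the complete data, before any values of x2 are deleted. It fits the substantive model and compares a linear with a quadratic model for x2 given x1. The names sim, full_fit, lin_fit, quad_fit and resid_sd are illustrative and are not part of the original analysis.

```r
# Re-create the complete data (before any x2 values are set to missing)
set.seed(123)
n <- 10000
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- sim$x1^2 + rnorm(n)
sim$y <- sim$x1 + sim$x2 + rnorm(n)

# Substantive model: the coefficients should be near 0, 1 and 1
full_fit <- lm(y ~ x1 + x2, data = sim)
print(round(coef(full_fit), 3))

# Model for x2 given x1: linear versus quadratic in x1
lin_fit <- lm(x2 ~ x1, data = sim)
quad_fit <- lm(x2 ~ x1 + I(x1^2), data = sim)
print(round(coef(quad_fit), 3))

# Residual standard deviations: the linear fit leaves the x1^2 signal unexplained
resid_sd <- c(linear = summary(lin_fit)$sigma,
              quadratic = summary(quad_fit)$sigma)
print(round(resid_sd, 3))
```

The residual standard deviation of the linear fit should be noticeably larger than that of the quadratic fit, which is the nonlinearity that an imputation model for x2 has to respect.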

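# Impute x2 with smcfcs, using a normal linear model ("norm") for x2
imps1 <- smcfcs(mydata, smtype="lm", smformula="y~x1+x2",
                numit=50, method=c("","norm",""))

impobj <- imputationList(imps1$impDatasets)
models <- with(impobj, lm(y~x1+x2))
summary(MIcombine(models))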


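We see that the estimates of the intercept and of the coefficient of x1 are clearly biased. The reason is that smcfcs imputes the missing values of x2 assuming that x2 follows a linear regression model given x1, with a conditional mean that is linear in x1. One way to fix this is to add x1 squared to the substantive model; doing so means that x1 squared is automatically adjusted for in the imputation model for x2: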

mydata$x1sq <- mydata$x1^2
imps2 <- smcfcs(mydata, smtype="lm", smformula="y~x1+x2+x1sq",
                numit=50, method=c("","norm","",""))
impobj <- imputationList(imps2$impDatasets)
models <- with(impobj, lm(y~x1+x2))
summary(MIcombine(models))


The estimates are now very close to the true values used in the data-generating mechanism.
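To see why the squared term matters, here is a rough base R sketch. It is not what smcfcs does internally: smcfcs uses rejection sampling and multiple imputation, whereas this sketch swaps in a single stochastic regression imputation, chosen only because it needs no extra packages. The helper impute_once and the data frame dat are hypothetical names introduced for illustration. In this simulation the conditional mean of x2 given x1 and y is linear in x1, x1^2 and y, so the second imputation model is correctly specified and the first is not.

```r
set.seed(123)
n <- 10000
expit <- function(x) 1 / (1 + exp(-x))  # inverse logit
dat <- data.frame(x1 = rnorm(n))
dat$x2 <- dat$x1^2 + rnorm(n)
dat$y <- dat$x1 + dat$x2 + rnorm(n)
dat$x2[runif(n) < expit(dat$y)] <- NA   # missingness depends on y only

# Hypothetical helper: fit an imputation model for x2 on the complete cases,
# fill in missing x2 with fitted mean plus normal noise, then fit y ~ x1 + x2
impute_once <- function(dat, imp_formula) {
  fit <- lm(imp_formula, data = dat)    # lm() drops rows with missing x2
  miss <- is.na(dat$x2)
  mu <- predict(fit, newdata = dat[miss, ])
  dat$x2[miss] <- mu + rnorm(sum(miss), sd = summary(fit)$sigma)
  coef(lm(y ~ x1 + x2, data = dat))
}

est_linear <- impute_once(dat, x2 ~ x1 + y)               # ignores x1^2
est_quadratic <- impute_once(dat, x2 ~ x1 + I(x1^2) + y)  # includes x1^2
print(round(rbind(est_linear, est_quadratic), 3))
```

One would expect the est_quadratic row to sit close to the true values 0, 1 and 1, while the est_linear row may drift away from them.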

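One thing to note is that we have not only modified the model assumed for x2 | x1, we have also changed the substantive model (at least the one used as part of the imputation process) to one that includes x1sq. To change only the model for x2 | x1, we can keep smformula as y~x1+x2 and use the predictorMatrix argument to request that both x1 and x1sq be used when imputing x2: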

predictorMatrix <- array(0, dim=c(4,4))
# row 2 (x2) is imputed using columns 1 (x1) and 4 (x1sq)
predictorMatrix[2, c(1,4)] <- 1
imps3 <- smcfcs(mydata, smtype="lm", smformula="y~x1+x2", numit=50,
                method=c("","norm","",""), predictorMatrix=predictorMatrix)
impobj <- imputationList(imps3$impDatasets)
models <- with(impobj, lm(y~x1+x2))
summary(MIcombine(models))

Output:

[1] "Outcome variable(s): y"
[1] "Passive variables: "
[1] "Partially obs. variables: x2"
[1] "Fully obs. substantive model variables: x1"
[1] "Imputation  1"
[1] "Imputing:  x2  using  x1,x1sq  plus outcome"
[1] "Imputation  2"
[1] "Imputation  3"
[1] "Imputation  4"
[1] "Imputation  5"
Warning message:
In smcfcs.core(originaldata, smtype, smformula, method, predictorMatrix,  :
  Rejection sampling failed 503 times (across all variables, iterations, and imputations). You may want to increase the rejection sampling limit.

Multiple imputation results:
      with(impobj, lm(y ~ x1 + x2))
      MIcombine.default(models)
               results          se      (lower      upper) missInfo
(Intercept) -0.0274234 0.015746687 -0.06054163 0.005694823     53 %
x1           1.0075646 0.018740270  0.96407720 1.051052088     77 %
x2           1.0026004 0.008043873  0.98549090 1.019709850     56 %
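The positional indices in the predictor matrix are easy to get wrong, so a labeled version can help when checking it by eye. The sketch below is base R only. It assumes the column order x1, x2, y, x1sq used in mydata, with rows standing for the variable being imputed and columns for its predictors. The names vars, pm_named, pm_positional and pm_plain are illustrative. Whether smcfcs accepts a matrix that carries dimnames is not shown in this post, so the labels are stripped with unname() before the matrix would be passed on.

```r
vars <- c("x1", "x2", "y", "x1sq")

# Labeled matrix: rows = variable being imputed, columns = predictors
pm_named <- array(0, dim = c(4, 4),
                  dimnames = list(imputed = vars, predictor = vars))
pm_named["x2", c("x1", "x1sq")] <- 1
print(pm_named)

# Positional version, built with row and column numbers
pm_positional <- array(0, dim = c(4, 4))
pm_positional[2, c(1, 4)] <- 1

# Strip the labels and confirm that both constructions agree
pm_plain <- unname(pm_named)
print(identical(pm_plain, pm_positional))
```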

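Here x1 is fully observed. What if x1 also had some missing values? We would then need to tell smcfcs how to impute x1, and to passively impute the x1sq variable afterwards. Given what we know about the true data-generating model, how should we impute x1? Nevertheless, we proceed and ask smcfcs to impute x1 using the "norm" method: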

mydata$x1[runif(n)<0.25] <- NA
mydata$x1sq <- mydata$x1^2
predictorMatrix[1,2] <- 1
imps4 <- smcfcs(mydata, smtype="lm", smformula = "y~x1+x2", numit=50,
                predictorMatrix=predictorMatrix, method=c("norm","norm","","x1^2"))
impobj <- imputationList(imps4$impDatasets)
models <- with(impobj, lm(y~x1+x2))
summary(MIcombine(models))

Output:

[1] "Outcome variable(s): y"
[1] "Passive variables: x1sq"
[1] "Partially obs. variables: x1,x2"
[1] "Fully obs. substantive model variables: "
[1] "Imputation  1"
[1] "Imputing:  x1  using  x2  plus outcome"
[1] "Imputing:  x2  using  x1,x1sq  plus outcome"
[1] "Imputation  2"
[1] "Imputation  3"
[1] "Imputation  4"
[1] "Imputation  5"
Warning message:
In smcfcs.core(originaldata, smtype, smformula, method, predictorMatrix,  :
  Rejection sampling failed 17260 times (across all variables, iterations, and imputations). You may want to increase the rejection sampling limit.

Multiple imputation results:
      with(impobj, lm(y ~ x1 + x2))
      MIcombine.default(models)
              results         se    (lower    upper) missInfo
(Intercept) 0.2687343 0.04002737 0.1694782 0.3679903     88 %
x1          1.0276229 0.03432337 0.9436348 1.1116109     86 %
x2          1.0742299 0.01635284 1.0385746 1.1098852     64 %
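The log lists x1sq as a passive variable: it is never drawn from a model of its own but recomputed from x1 whenever x1 is imputed. In the method vector this appears to be expressed by giving the defining expression, "x1^2", as a string. The sketch below shows the idea in base R on a toy data frame. It is a conceptual illustration rather than the smcfcs implementation; the names toy and passive_expr are made up for the example, and the random draw stands in for a real imputation step.

```r
set.seed(1)

# Toy data: x1 has two missing values, so x1sq is missing in the same rows
toy <- data.frame(x1 = c(0.5, NA, -1.2, NA))
toy$x1sq <- toy$x1^2

passive_expr <- "x1^2"  # the defining expression, stored as a string

# Stand-in for an imputation step: fill the missing x1 with random draws
miss <- is.na(toy$x1)
toy$x1[miss] <- rnorm(sum(miss))

# Passive update: re-evaluate the expression inside the data frame
toy$x1sq <- eval(parse(text = passive_expr), envir = toy)
print(toy)
```

After the update x1sq is complete and always equals the square of the current x1, which keeps the two columns consistent across iterations.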

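With x1 imputed from a normal linear model, the estimates are biased again: the intercept is estimated at about 0.27 and the coefficient of x2 at about 1.07, presumably because that model is a poor description of x1 given x2 and y. This example also illustrates a theoretical issue with smcfcs: although it imputes each covariate from an imputation model that is compatible with the specified substantive (outcome) model, this does not mean that these imputation models are compatible with one another. Specifically, the models used to impute the different covariates may be mutually incompatible.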
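One way to see why a normal model is questionable for x1 is to look at x1 among observations with a large x2. Because x2 is roughly x1^2 plus noise, a large x2 points to x1 being either clearly positive or clearly negative, so the distribution of x1 given x2 has two modes. The base R sketch below checks this on complete simulated data. The cutoff of 4 and the names big, x1_big, obs_near_zero and norm_near_zero are arbitrary choices made for illustration.

```r
set.seed(123)
n <- 10000
x1_full <- rnorm(n)
x2_full <- x1_full^2 + rnorm(n)

# Keep x1 for the observations with a large x2
big <- x2_full > 4
x1_big <- x1_full[big]

# Observed share of these x1 values that lie within 1 of zero
obs_near_zero <- mean(abs(x1_big) < 1)

# Share that a normal distribution with the same mean and sd would put there
m <- mean(x1_big)
s <- sd(x1_big)
norm_near_zero <- pnorm(1, m, s) - pnorm(-1, m, s)

print(round(c(observed = obs_near_zero, normal = norm_near_zero), 3))
hist(x1_big, breaks = 40, main = "x1 given x2 > 4", xlab = "x1")
```

The normal approximation places far more mass near zero than the data do, which suggests that the "norm" method imputes x1 values in a region where they hardly ever occur.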

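A better approach would be to specify a single joint model for the data and to impute under the conditional distributions it implies. This can be done using JAGS, for example.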

If you have any questions, please leave a comment below.


Source: https://www.cnblogs.com/tecdat/p/11468369.html