  • R and Probability & Statistics (6): Principal Component Analysis and Factor Analysis

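    Ultra-high-dimensional analysis: the data form an N×P matrix, where N is the number of samples and P is the number of variables (indicators), with N << P.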

    PCA: condense the predictors into a few components that capture the main influences on y.

    There are three main approaches: PCA, factor analysis, and regression with a penalty term (e.g. LASSO).

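    The goal is dimensionality reduction: solving the problem with fewer variables. In two dimensions this means finding a line such that the projections of the points onto it are as spread out as possible. "As spread out as possible" means that the variance of the projections is maximized.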
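The variance-maximizing idea above can be written out as a short derivation. This is a sketch under the usual assumptions: the symbols x_i (centered samples), S (their covariance matrix), w (a unit direction) and λ (an eigenvalue) are introduced here for illustration and do not appear in the original post.

```latex
% x_1, ..., x_N in R^P are the centered samples; w is a unit direction.
\[
S = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^{\top}, \qquad
\operatorname{Var}(w^{\top}x) = \frac{1}{N}\sum_{i=1}^{N} (w^{\top}x_i)^2 = w^{\top} S w .
\]
% Maximizing the projected variance over unit vectors leads to an eigenproblem.
\[
\max_{w}\; w^{\top} S w \quad \text{subject to} \quad w^{\top} w = 1
\;\Longrightarrow\; S w = \lambda w, \qquad w^{\top} S w = \lambda .
\]
```

So the best direction is the eigenvector of S with the largest eigenvalue, and that eigenvalue is the variance of the projections. When the variables are standardized first (as with `cor=TRUE` in `princomp`), S is the correlation matrix.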

    > conomy<-data.frame(
    +   x1=c(149.3, 161.2, 171.5, 175.5, 180.8, 190.7, 
    +        202.1, 212.4, 226.1, 231.9, 239.0),
    +   x2=c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7),
    +   x3=c(108.1, 114.8, 123.2, 126.9, 132.1, 137.7, 
    +        146.0, 154.1, 162.3, 164.3, 167.6),
    +   y=c(15.9, 16.4, 19.0, 19.1, 18.8, 20.4, 22.7, 
    +       26.5, 28.1, 27.6, 26.3)
    + )
    > #### Fit a linear regression
    > lm.sol<-lm(y~x1+x2+x3, data=conomy)
    > summary(lm.sol)
    
    Call:
    lm(formula = y ~ x1 + x2 + x3, data = conomy)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.52367 -0.38953  0.05424  0.22644  0.78313 
    
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
    (Intercept) -10.12799    1.21216  -8.355  6.9e-05 ***
    x1           -0.05140    0.07028  -0.731 0.488344    
    x2            0.58695    0.09462   6.203 0.000444 ***
    x3            0.28685    0.10221   2.807 0.026277 *  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.4889 on 7 degrees of freedom
    Multiple R-squared:  0.9919,	Adjusted R-squared:  0.9884 
    F-statistic: 285.6 on 3 and 7 DF,  p-value: 1.112e-07
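The regression above gives x1 a negative, non-significant coefficient even though the overall fit is very good, which is a typical symptom of collinear predictors. The post does not show this check, so the following is only a minimal sketch, using base R's `cor()` and `kappa()` on the same data, of how one might confirm it before turning to PCA. The names `R` and `cond.num` are chosen here for illustration.

```r
# Same predictors as in the article's data frame.
conomy <- data.frame(
  x1 = c(149.3, 161.2, 171.5, 175.5, 180.8, 190.7,
         202.1, 212.4, 226.1, 231.9, 239.0),
  x2 = c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7),
  x3 = c(108.1, 114.8, 123.2, 126.9, 132.1, 137.7,
         146.0, 154.1, 162.3, 164.3, 167.6)
)

# Pairwise correlations between the predictors.
R <- cor(conomy)
print(round(R, 3))

# Condition number of the correlation matrix (ratio of its largest to
# smallest singular value); a large value indicates near-collinearity.
cond.num <- kappa(R, exact = TRUE)
print(cond.num)
```

A correlation between x1 and x3 close to 1, together with a large condition number, would suggest that the two variables carry almost the same information, which is the situation principal component regression is meant to handle.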
    
    > #### Perform principal component analysis
    > conomy.pr<-princomp(~x1+x2+x3, data=conomy, cor=TRUE)
    > summary(conomy.pr, loadings=TRUE)
    Importance of components:
                             Comp.1    Comp.2       Comp.3
    Standard deviation     1.413915 0.9990767 0.0518737839
    Proportion of Variance 0.666385 0.3327181 0.0008969632
    Cumulative Proportion  0.666385 0.9991030 1.0000000000
    
    Loadings:
       Comp.1 Comp.2 Comp.3
    x1  0.706         0.707
    x2        -0.999       
    x3  0.707        -0.707
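To make the summary above less of a black box: with `cor=TRUE`, the standard deviations reported by `princomp` are the square roots of the eigenvalues of the correlation matrix, and the proportions of variance are those eigenvalues divided by their sum. The sketch below checks this with base R's `eigen()`. The names `eig`, `pr`, `prop` and `cum` are illustrative only.

```r
# Same predictors as in the article's data frame.
conomy <- data.frame(
  x1 = c(149.3, 161.2, 171.5, 175.5, 180.8, 190.7,
         202.1, 212.4, 226.1, 231.9, 239.0),
  x2 = c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7),
  x3 = c(108.1, 114.8, 123.2, 126.9, 132.1, 137.7,
         146.0, 154.1, 162.3, 164.3, 167.6)
)

# Eigen-decomposition of the correlation matrix of the predictors.
eig <- eigen(cor(conomy))

# PCA on the standardized variables, as in the article.
pr <- princomp(~ x1 + x2 + x3, data = conomy, cor = TRUE)

# Component standard deviations versus square roots of the eigenvalues.
print(rbind(princomp = unname(pr$sdev), eigen = sqrt(eig$values)))

# Proportion and cumulative proportion of variance explained.
prop <- eig$values / sum(eig$values)
cum  <- cumsum(prop)
print(round(rbind(prop, cum), 4))
```

The columns of `eig$vectors` correspond to the loadings, up to sign: an eigenvector and its negative describe the same direction, so the signs may differ from those printed by `princomp`.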
    > #### Compute the principal component scores of the samples, then fit a principal component regression
    > pre<-predict(conomy.pr)
    > conomy$z1<-pre[,1]
    > conomy$z2<-pre[,2]
    > lm.sol<-lm(y~z1+z2, data=conomy)
    > summary(lm.sol)
    
    Call:
    lm(formula = y ~ z1 + z2, data = conomy)
    
    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.89838 -0.26050  0.08435  0.35677  0.66863 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  21.8909     0.1658 132.006 1.21e-14 ***
    z1            2.9892     0.1173  25.486 6.02e-09 ***
    z2           -0.8288     0.1660  -4.993  0.00106 ** 
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.55 on 8 degrees of freedom
    Multiple R-squared:  0.9883,	Adjusted R-squared:  0.9853 
    F-statistic: 337.2 on 2 and 8 DF,  p-value: 1.888e-08
    
    > #### Transform back to obtain the regression equation in the original coordinates
    > beta<-coef(lm.sol); A<-loadings(conomy.pr)
    > x.bar<-conomy.pr$center; x.sd<-conomy.pr$scale
    > coef<-(beta[2]*A[,1]+ beta[3]*A[,2])/x.sd
    > beta0 <- beta[1]- sum(x.bar * coef)
    > c(beta0, coef)
    (Intercept)          x1          x2          x3 
    -9.13010782  0.07277981  0.60922012  0.10625939 
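The back-transformation works because each score is a linear combination of the standardized variables, z_k = Σ_j A[j,k] (x_j − x̄_j) / s_j, so substituting into y = β0 + β1 z1 + β2 z2 gives a coefficient (β1 A[j,1] + β2 A[j,2]) / s_j for each x_j. As a sanity check, the sketch below repeats the steps and confirms that the equation in the original variables reproduces the fitted values of the principal component regression. The names `pr`, `fit`, `slopes`, `intercept` and `y.hat` are illustrative and not from the post.

```r
conomy <- data.frame(
  x1 = c(149.3, 161.2, 171.5, 175.5, 180.8, 190.7,
         202.1, 212.4, 226.1, 231.9, 239.0),
  x2 = c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7),
  x3 = c(108.1, 114.8, 123.2, 126.9, 132.1, 137.7,
         146.0, 154.1, 162.3, 164.3, 167.6),
  y  = c(15.9, 16.4, 19.0, 19.1, 18.8, 20.4, 22.7,
         26.5, 28.1, 27.6, 26.3)
)

# PCA on the standardized predictors and regression on the first two scores.
pr <- princomp(~ x1 + x2 + x3, data = conomy, cor = TRUE)
scores <- predict(pr)
conomy$z1 <- scores[, 1]
conomy$z2 <- scores[, 2]
fit <- lm(y ~ z1 + z2, data = conomy)

# Map the coefficients on z1, z2 back to coefficients on x1, x2, x3.
beta <- coef(fit)
A <- loadings(pr)
slopes <- (beta[2] * A[, 1] + beta[3] * A[, 2]) / pr$scale
intercept <- unname(beta[1] - sum(pr$center * slopes))

# Predictions from the equation in the original variables.
X <- as.matrix(conomy[, c("x1", "x2", "x3")])
y.hat <- drop(X %*% slopes) + intercept

# Should agree with the fitted values of the regression on z1 and z2.
print(all.equal(unname(y.hat), unname(fitted(fit))))
print(c(intercept = intercept, slopes))
```

The signs of the loadings are arbitrary, but the back-transformed coefficients do not depend on them, because flipping the sign of a component flips the sign of its regression coefficient as well.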
    

     

  • Original article: https://www.cnblogs.com/caiyishuai/p/13270721.html