  • R: Ridge Regression and the Lasso

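    The previous post showed that linear regression models can suffer from collinearity. Ridge regression and the lasso can both mitigate that problem to some extent.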
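
    For reference, both methods are special cases of the penalized least-squares problem that glmnet solves for family = "gaussian". The formula below follows the glmnet documentation; n is the number of observations, and the symbols are the usual ones rather than anything defined in this post.

    ```latex
    \min_{\beta_0,\,\beta}\;
      \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
      + \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
      + \alpha\,\lVert\beta\rVert_1\right]
    ```

    Setting alpha = 0 leaves only the squared (ridge) penalty, and alpha = 1 leaves only the absolute-value (lasso) penalty. In both cases λ controls how strongly the coefficients are shrunk.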

    Ridge regression

    > ######### Regularization to mitigate collinearity
    > ### Ridge regression
    > ### glmnet only accepts matrices as input
    > library(glmnet)
    > library(mice)
    > creditcard_exp<-creditcard_exp[complete.cases(creditcard_exp),]
    > x<-as.matrix(creditcard_exp[,c(6,7,10,11)])
    > y<-as.matrix(creditcard_exp[,3])
    > # Look at the ridge trace plot
    > r1<-glmnet(x=x,y=y,family = "gaussian",alpha = 0)# alpha = 0 selects ridge regression; x and y must not contain missing values
    > plot(r1,xvar="lambda")
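
    What the ridge trace shows can be reproduced by hand. The sketch below uses simulated, deliberately collinear data (not creditcard_exp) and the textbook closed-form ridge estimate. The names ridge_coef, lambdas and path are made up for this illustration, and the λ scale here differs from glmnet's, which divides the loss by 2n and standardizes internally.

    ```r
    # Simulated predictors; x2 is almost a copy of x1 (hypothetical data)
    set.seed(42)
    n  <- 60
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.05)
    x3 <- rnorm(n)
    X  <- scale(cbind(x1, x2, x3))        # centre and scale the columns
    y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)
    yc <- y - mean(y)                     # centring y removes the intercept

    # Closed-form ridge estimate: (X'X + lambda * I)^(-1) X'y
    ridge_coef <- function(X, y, lambda) {
      p <- ncol(X)
      drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
    }

    # One column of coefficients per lambda; lambda = 0 is ordinary least squares
    lambdas <- c(0, 0.1, 1, 10, 100, 1000)
    path <- sapply(lambdas, function(l) ridge_coef(X, yc, l))
    colnames(path) <- paste0("lambda=", lambdas)
    print(round(path, 3))

    # Euclidean length of the coefficient vector at each lambda
    coef_norm <- sqrt(colSums(path^2))
    print(round(coef_norm, 3))
    ```

    The length of the coefficient vector can only shrink as λ grows, which is the pattern the trace plot draws one coefficient at a time.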

    > r1cv<-cv.glmnet(x=x,y=y,family="gaussian",alpha=0,nfolds = 10)# choose lambda by 10-fold cross-validation
    > plot(r1cv)
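
    The cross-validation plot marks two candidate values of λ with vertical lines: lambda.min and lambda.1se. As a minimal sketch of how these two values are defined, the code below applies the rule to a small hand-made table. The numbers and the helper pick_lambda are hypothetical. In a real cv.glmnet object the corresponding columns are lambda, cvm and cvsd.

    ```r
    # Hypothetical cross-validation summary, one row per candidate lambda
    cv <- data.frame(
      lambda = c(100, 50, 20, 10, 5, 1),
      cvm    = c(9.0, 6.0, 4.4, 4.0, 4.1, 4.3),  # mean cross-validated error
      cvsd   = c(0.8, 0.6, 0.5, 0.5, 0.5, 0.5)   # standard error of cvm
    )

    pick_lambda <- function(cv) {
      i_min      <- which.min(cv$cvm)                # lowest CV error
      threshold  <- cv$cvm[i_min] + cv$cvsd[i_min]   # minimum plus one standard error
      within_1se <- cv$cvm <= threshold
      list(
        lambda_min = cv$lambda[i_min],
        lambda_1se = max(cv$lambda[within_1se])      # largest lambda still within 1 SE
      )
    }

    chosen <- pick_lambda(cv)
    print(chosen)
    ```

    For this table the rule gives lambda_min = 10 and lambda_1se = 20: the error at λ = 20 (4.4) is still below 4.0 + 0.5, while the error at λ = 50 is not.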

    > rimin<-glmnet(x=x,y=y,family = "gaussian",alpha = 0,lambda = r1cv$lambda.min)# refit at lambda.min, the λ with the smallest cross-validated error
    > coef(rimin)
    5 x 1 sparse Matrix of class "dgCMatrix"
                             s0
    (Intercept)     106.5467017
    Age               0.9156047
    Income           19.6903291
    dist_home_val     1.7357213
    dist_avg_income  71.5765458

    In this model the coefficient on Income is positive, so income and expenditure are now positively related.

    The lasso

    > ##### Lasso: also performs variable selection
    > r1l<-cv.glmnet(x=x,y=y,family="gaussian",alpha=1,nfolds = 10)
    > plot(r1l)
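
    Why the lasso can drop variables while ridge cannot is easiest to see in the special case of orthonormal predictors (X'X = I). That case is assumed here only for illustration and does not hold for the credit card data. There, minimizing ½‖y − Xβ‖² + λ‖β‖₁ soft-thresholds each least-squares coefficient, while minimizing ½‖y − Xβ‖² + (λ/2)‖β‖² only rescales it. The coefficient values and helper names below are made up.

    ```r
    # Hypothetical least-squares coefficients under an orthonormal design
    b_ols  <- c(3, -0.5, 0.2, 1.5)
    lambda <- 1

    # Lasso solution: soft-thresholding, sign(b) * max(|b| - lambda, 0)
    soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

    # Ridge solution: proportional shrinkage, b / (1 + lambda)
    ridge_shrink <- function(b, lambda) b / (1 + lambda)

    b_lasso <- soft_threshold(b_ols, lambda)
    b_ridge <- ridge_shrink(b_ols, lambda)
    print(rbind(ols = b_ols, lasso = b_lasso, ridge = b_ridge))
    ```

    With λ = 1 the lasso row is 2, 0, 0, 0.5: the two small coefficients become exactly zero. The ridge row is 1.5, -0.25, 0.1, 0.75: every coefficient is halved but none vanishes.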

    > r1l1<-glmnet(x=x,y=y,family = "gaussian",alpha = 1,lambda = r1l$lambda.min)# refit at lambda.min (the λ with the lowest CV error, not the smallest λ)
    > coef(r1l1)
    5 x 1 sparse Matrix of class "dgCMatrix"
                             s0
    (Intercept)      -27.169039
    Age                1.314711
    Income          -160.195837
    dist_home_val      1.538823
    dist_avg_income  255.395751

    The coefficients show that Income still has a negative sign, and no variables have been dropped either. So let's try lambda.1se*0.5 instead.

    > r1l2<-glmnet(x=x,y=y,family = "gaussian",alpha = 1,lambda = r1l$lambda.1se*0.5)# λ set to half of lambda.1se
    > coef(r1l2)
    5 x 1 sparse Matrix of class "dgCMatrix"
                             s0
    (Intercept)     267.0510318
    Age               .        
    Income            .        
    dist_home_val     0.6249539
    dist_avg_income  83.6952253

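    The output shows that some variables have been dropped (Age and Income), which removes the collinearity among the remaining predictors. Next, let's look at the fit at lambda.1se itself.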

    > r1l3<-glmnet(x=x,y=y,alpha = 1,family = "gaussian",lambda = r1l$lambda.1se)
    > coef(r1l3)
    5 x 1 sparse Matrix of class "dgCMatrix"
                           s0
    (Intercept)     432.00684
    Age               .
    Income            .
    dist_home_val     .
    dist_avg_income  68.90894

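    This time only one variable is kept. The larger λ is, the fewer variables survive. In practice we usually choose a suitable λ between lambda.min (the λ with the lowest cross-validation error) and lambda.1se (the largest λ whose error is within one standard error of that minimum).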
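
    Coefficients at several values of λ can also be read directly from the cross-validated object, without refitting glmnet once per value. A minimal sketch follows, assuming the glmnet package is installed. It uses simulated data in place of the x and y built from creditcard_exp, and the names cvfit, b_min, b_half and b_1se are made up for the example.

    ```r
    library(glmnet)

    # Simulated data (hypothetical): only the first predictor matters
    set.seed(1)
    n <- 100
    x <- matrix(rnorm(n * 4), nrow = n, dimnames = list(NULL, paste0("x", 1:4)))
    y <- 2 * x[, 1] + rnorm(n)

    cvfit <- cv.glmnet(x = x, y = y, family = "gaussian", alpha = 1, nfolds = 10)

    # s accepts "lambda.min", "lambda.1se" or a numeric value of lambda
    b_min  <- coef(cvfit, s = "lambda.min")
    b_half <- coef(cvfit, s = 0.5 * cvfit$lambda.1se)  # hand-picked value, as in the text
    b_1se  <- coef(cvfit, s = "lambda.1se")

    # Side-by-side comparison; exact zeros mark dropped variables
    print(round(cbind(as.matrix(b_min), as.matrix(b_half), as.matrix(b_1se)), 3))
    ```

    Because the folds are assigned at random, the selected λ values change from run to run unless the seed is fixed.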

  • Original post: https://www.cnblogs.com/ye20190812/p/13925000.html