  • R: Logistic Regression

    > ############### Logistic regression
    > setwd("/Users/yaozhilin/Downloads/R_edu/data")
    > accepts<-read.csv("accepts.csv")
    > names(accepts)
     [1] "application_id" "account_number" "bad_ind"        "vehicle_year"   "vehicle_make"  
     [6] "bankruptcy_ind" "tot_derog"      "tot_tr"         "age_oldest_tr"  "tot_open_tr"   
    [11] "tot_rev_tr"     "tot_rev_debt"   "tot_rev_line"   "rev_util"       "fico_score"    
    [16] "purch_price"    "msrp"           "down_pyt"       "loan_term"      "loan_amt"      
    [21] "ltv"            "tot_income"     "veh_mileage"    "used_ind"      
    > accepts<-accepts[complete.cases(accepts),]
    > select<-sample(1:nrow(accepts),length(accepts$application_id)*0.7)
    > train<-accepts[select,]### 70% used for model fitting
    > test<-accepts[-select,]### 30% held out for testing
    > attach(train)
    > ### Use glm(y~x,family=binomial(link="logit"))
    > gl<-glm(bad_ind~fico_score,family=binomial(link = "logit"))
    > summary(gl)
    
    Call:
    glm(formula = bad_ind ~ fico_score, family = binomial(link = "logit"))
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -2.0794  -0.6790  -0.4937  -0.3073   2.6028  
    
    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  9.049667   0.629120   14.38   <2e-16 ***
    fico_score  -0.015407   0.000938  -16.43   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 2989.2  on 3046  degrees of freedom
    Residual deviance: 2665.9  on 3045  degrees of freedom
    AIC: 2669.9
    
    Number of Fisher Scoring iterations: 5
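
The fico_score coefficient is on the log-odds scale, so exponentiating it gives an odds ratio. A minimal sketch of that reading, using the two estimates copied from the summary(gl) output above; the FICO value of 700 is a hypothetical input chosen only for illustration:

```r
# Estimates copied from the summary(gl) output above
b0 <- 9.049667   # (Intercept)
b1 <- -0.015407  # fico_score

# Odds ratio for a 1-point and a 10-point increase in fico_score
or_per_point <- exp(b1)
or_per_10 <- exp(10 * b1)

# Predicted probability of bad_ind = 1 at a hypothetical score of 700;
# plogis() is the inverse logit, 1 / (1 + exp(-x))
fico_example <- 700
p_bad <- plogis(b0 + b1 * fico_example)

print(c(or_per_point = or_per_point, or_per_10 = or_per_10, p_bad = p_bad))
```

Under this fit, each additional FICO point multiplies the odds of a bad loan by roughly 0.985, and a 10-point increase by roughly 0.86.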

    Multiple logistic regression

    > ### Multiple logistic regression
    > gls<-glm(bad_ind~fico_score+bankruptcy_ind+age_oldest_tr+
    +            tot_derog+rev_util+veh_mileage,family = binomial(link = "logit"))
    > summary(gls)
    
    Call:
    glm(formula = bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + 
        tot_derog + rev_util + veh_mileage, family = binomial(link = "logit"))
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -2.2646  -0.6743  -0.4647  -0.2630   2.8177  
    
    Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
    (Intercept)      8.205e+00  7.433e-01  11.039  < 2e-16 ***
    fico_score      -1.338e-02  1.092e-03 -12.260  < 2e-16 ***
    bankruptcy_indY -3.771e-01  1.855e-01  -2.033   0.0421 *  
    age_oldest_tr   -4.458e-03  6.375e-04  -6.994 2.68e-12 ***
    tot_derog        3.012e-02  1.552e-02   1.941   0.0523 .  
    rev_util         3.763e-04  5.252e-04   0.717   0.4737    
    veh_mileage      2.466e-06  1.381e-06   1.786   0.0741 .  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 2989.2  on 3046  degrees of freedom
    Residual deviance: 2601.4  on 3040  degrees of freedom
    AIC: 2615.4
    
    Number of Fisher Scoring iterations: 5
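
One way to judge whether the five extra predictors improve on the fico_score-only model is a likelihood ratio test on the drop in residual deviance. The sketch below does the arithmetic by hand with the deviances and degrees of freedom printed in the two summaries; with the fitted objects available, anova(gl, gls, test = "Chisq") would be the usual call.

```r
# Residual deviance and degrees of freedom copied from the two summaries
dev_single <- 2665.9; df_single <- 3045  # bad_ind ~ fico_score
dev_multi  <- 2601.4; df_multi  <- 3040  # six-predictor model

# Likelihood ratio statistic: drop in deviance between the nested models
lr_stat <- dev_single - dev_multi
lr_df <- df_single - df_multi

# Upper-tail chi-squared probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value))
```

A drop of about 64.5 in deviance on 5 degrees of freedom gives a very small p-value, which suggests that the added predictors jointly carry information beyond fico_score.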
    
    > glss<-step(gls,direction = "both")
    Start:  AIC=2615.35
    bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + tot_derog + 
        rev_util + veh_mileage
    
                     Df Deviance    AIC
    - rev_util        1   2601.9 2613.9
    <none>                2601.3 2615.3
    - veh_mileage     1   2604.4 2616.4
    - tot_derog       1   2605.1 2617.1
    - bankruptcy_ind  1   2605.7 2617.7
    - age_oldest_tr   1   2655.9 2667.9
    - fico_score      1   2763.8 2775.8
    
    Step:  AIC=2613.88
    bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + tot_derog + 
        veh_mileage
    
                     Df Deviance    AIC
    <none>                2601.9 2613.9
    - veh_mileage     1   2604.9 2614.9
    + rev_util        1   2601.3 2615.3
    - tot_derog       1   2605.7 2615.7
    - bankruptcy_ind  1   2606.1 2616.1
    - age_oldest_tr   1   2656.9 2666.9
    - fico_score      1   2773.2 2783.2
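
The AIC column in the step() tables can be checked by hand. For a 0/1 response such as bad_ind, the residual deviance equals minus twice the log-likelihood, so AIC = deviance + 2 * (number of estimated coefficients). A small sketch using values copied from the first step table; the coefficient counts (7 and 6) are read off the model formulas, intercept included:

```r
# Deviances copied from the first step() table above
deviance_step <- c(full = 2601.3, drop_rev_util = 2601.9)

# Estimated coefficients, including the intercept
n_params <- c(full = 7, drop_rev_util = 6)

# AIC for a binary-response binomial GLM: deviance + 2k
aic_step <- deviance_step + 2 * n_params
print(aic_step)
```

Dropping rev_util raises the deviance only slightly while saving one parameter, so its AIC is lower and step() removes it.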
    > # predict() returns values on the logit scale by default, so we need to convert them to probabilities
    > train$pre<-predict(glss,train)
    > summary(train$pre)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     -4.868  -2.421  -1.671  -1.713  -1.011   2.497 
    > train$pre_p<-1/(1+exp(-1*train$pre))
    > summary(train$pre_p)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.00763 0.08157 0.15823 0.19298 0.26677 0.92395 
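
The manual conversion 1/(1+exp(-x)) is the inverse logit. The sketch below, on simulated data, suggests that predict(..., type = "response") returns the same probabilities directly, and that the same call can score held-out rows. The names sim, score, and bad are made up for illustration and are not columns of accepts.csv.

```r
set.seed(42)  # make the simulated data and the split reproducible

# Simulated stand-in data: a score and a 0/1 outcome (hypothetical names)
n <- 500
sim <- data.frame(score = rnorm(n, mean = 680, sd = 50))
sim$bad <- rbinom(n, size = 1, prob = plogis(9 - 0.015 * sim$score))

# 70/30 split, mirroring the split used for accepts
idx <- sample(seq_len(n), size = floor(0.7 * n))
train_sim <- sim[idx, ]
test_sim <- sim[-idx, ]

# Passing data= avoids the need for attach()
fit <- glm(bad ~ score, data = train_sim, family = binomial(link = "logit"))

# Default predictions are on the link (logit) scale
link_hat <- predict(fit, newdata = test_sim)
p_manual <- 1 / (1 + exp(-link_hat))

# type = "response" applies the inverse link for us
p_direct <- predict(fit, newdata = test_sim, type = "response")

print(isTRUE(all.equal(p_manual, p_direct)))
```

Scoring test_sim rather than train_sim is what gives an out-of-sample view of the model; the same pattern would apply to the test data frame created earlier.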
    > # Logistic regression does not require checking the error term, but multicollinearity still needs to be checked
    > library(car)
    > vif(glss)
        fico_score bankruptcy_ind  age_oldest_tr      tot_derog    veh_mileage 
          1.271283       1.144846       1.075603       1.423850       1.003616 
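
All of the VIF values above are close to 1, which suggests little collinearity among the retained predictors. To show what the statistic measures, here is a base-R sketch of the classic definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others. It uses simulated predictors (x1, x2, x3 are hypothetical names). For a glm, car::vif() works from the covariance matrix of the coefficient estimates, so its numbers may not match this predictor-only calculation exactly.

```r
set.seed(1)

# Simulated predictors: x2 is built to correlate with x1, x3 is independent
n <- 300
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n, sd = 0.8)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# Classic VIF: regress each predictor on the others, then 1 / (1 - R^2)
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_from_cor <- diag(solve(cor(X)))

print(round(vif_manual, 3))
```

The correlated pair x1 and x2 gets a larger VIF than the independent x3, which stays near 1.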
  • Original article: https://www.cnblogs.com/ye20190812/p/13925635.html