  • Article translation: Chapter 7, recipes 7-9

    7. Selecting features using the caret package


     Feature selection searches for the subset of features that minimizes predictive error. We can apply it to identify which attributes are required to build an accurate model. The caret package provides a recursive feature elimination function, rfe, which can help select the required features automatically. In the following recipe, we will demonstrate how to use the caret package to perform feature selection.


    How to do it...

    Perform the following steps to select features:

    1. Transform the international_plan feature of the training dataset, trainset, to intl_yes and intl_no:

    > intl_plan = model.matrix(~ trainset.international_plan - 1, data=data.frame(trainset$international_plan))
    > colnames(intl_plan) = c("trainset.international_planno"="intl_no", "trainset.international_planyes"= "intl_yes")


    2. Transform the feature named as voice_mail_plan of the training dataset, trainset, to voice_yes and voice_no:

    > voice_plan = model.matrix(~ trainset.voice_mail_plan - 1, data=data.frame(trainset$voice_mail_plan))
    > colnames(voice_plan) = c("trainset.voice_mail_planno" ="voice_no", "trainset.voice_mail_planyes"="voice_yes")


    3. Remove the international_plan and voice_mail_plan attributes and combine the training dataset, trainset, with the matrices intl_plan and voice_plan:

    > trainset$international_plan = NULL
    > trainset$voice_mail_plan = NULL
    > trainset = cbind(intl_plan,voice_plan, trainset)


    4. Transform the feature named as international_plan of the testing dataset, testset, to intl_yes and intl_no:

    > intl_plan = model.matrix(~ testset.international_plan - 1, data=data.frame(testset$international_plan))
    > colnames(intl_plan) = c("testset.international_planno"="intl_no", "testset.international_planyes"= "intl_yes")


    5. Transform the feature named as voice_mail_plan of the testing dataset, testset, to voice_yes and voice_no:

    > voice_plan = model.matrix(~ testset.voice_mail_plan - 1, data=data.frame(testset$voice_mail_plan))
    > colnames(voice_plan) = c("testset.voice_mail_planno" ="voice_no", "testset.voice_mail_planyes"="voice_yes")


    6. Remove the international_plan and voice_mail_plan attributes and combine the testing dataset, testset with the data frames, intl_plan and voice_plan:

    > testset$international_plan = NULL
    > testset$voice_mail_plan = NULL
    > testset = cbind(intl_plan,voice_plan, testset)


    7. We then create a feature selection algorithm using linear discriminant analysis:

    > ldaControl = rfeControl(functions = ldaFuncs, method = "cv")


     In this recipe, we perform feature selection using the caret package. Because the dataset contains factor-coded attributes, we first use the model.matrix function to transform each of them into multiple binary attributes. The international_plan attribute becomes intl_yes and intl_no, and the voice_mail_plan attribute becomes voice_yes and voice_no.
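    The factor-to-indicator transformation described above can be sketched on a small made-up data frame. This is only an illustration in base R: the data frame toy and its values are assumptions, not part of the churn dataset, and the column names are assigned directly rather than through a named vector.

```r
# Toy stand-in for trainset; the values are made up for illustration.
toy <- data.frame(
  international_plan = factor(c("no", "yes", "no", "yes")),
  total_day_minutes  = c(120.5, 98.1, 240.0, 75.3)
)

# "- 1" drops the intercept, so every factor level gets its own 0/1 column.
intl_plan <- model.matrix(~ international_plan - 1, data = toy)
colnames(intl_plan) <- c("intl_no", "intl_yes")

# Replace the factor column with the two indicator columns.
toy$international_plan <- NULL
toy <- cbind(intl_plan, toy)
print(toy)
```

    Each row has exactly one of intl_no and intl_yes set to 1, which is why the recipe can drop the original factor column afterwards.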


    8. Measuring the performance of the regression model


      To measure the performance of a regression model, we can calculate the distance between the predicted output and the actual output as a quantifier of the model's performance. The root mean square error (RMSE), relative square error (RSE), and R-Square are common measurements. In the following recipe, we will illustrate how to compute these measurements from a built regression model.


      The performance of a regression model is measured by the distance between the predicted values and the actual values. Three measurements are commonly used as quantifiers: root mean square error, relative square error, and R-Square. In this recipe, we first load the Quartet data from the car package. We then use the lm function to fit a linear regression model of y3 on x.
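      As a self-contained sketch of the three measurements, the following uses R's built-in anscombe data instead of the car package. It assumes that the anscombe columns x3 and y3 correspond to Quartet$x and Quartet$y3; the variable names are chosen here for illustration.

```r
# Built-in Anscombe data; x3/y3 are assumed to match Quartet$x / Quartet$y3.
fit <- lm(y3 ~ x3, data = anscombe)
predicted <- predict(fit)   # fitted values for the training rows
actual <- anscombe$y3

# Root mean square error: square root of the mean squared residual.
rmse <- sqrt(mean((predicted - actual)^2))

# Relative square error: model error relative to always predicting the mean.
rse <- sum((predicted - actual)^2) / sum((mean(actual) - actual)^2)

# R-Square is the share of variance that the model explains.
rsquare <- 1 - rse

print(c(rmse = rmse, rse = rse, rsquare = rsquare))
print(summary(fit)$r.squared)
```

      For a least-squares fit with an intercept, 1 - rse agrees with the R-squared value reported by summary, which gives a quick sanity check on a hand-written calculation.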

    How to do it...

    Perform the following steps to measure the performance of the regression model:

    1. Load the Quartet dataset from the car package:

    > library(car)
    > data(Quartet)


    2. Plot the attribute, y3, against x using the lm function:

    > plot(Quartet$x, Quartet$y3)
    > lmfit = lm(Quartet$y3~Quartet$x)
    > abline(lmfit, col="red")


    Figure 4: The linear regression plot


    3. You can retrieve predicted values by using the predict function:

    > predicted= predict(lmfit, newdata=Quartet[c("x")])


    4. Now, you can calculate the root mean square error:

    > actual = Quartet$y3
    > rmse = (mean((predicted - actual)^2))^0.5
    > rmse
    [1] 1.118286


    5. You can calculate the relative square error:

    > mu = mean(actual)
    > rse = mean((predicted - actual)^2) / mean((mu - actual)^2)
    > rse
    [1] 0.333676


    6. Also, you can use R-Square as a measurement:

    > rsquare = 1 - rse
    > rsquare
    [1] 0.666324


    7. Then, you can plot attribute, y3, against x using the rlm function from the MASS package:

    > library(MASS)
    > plot(Quartet$x, Quartet$y3)
    > rlmfit = rlm(Quartet$y3~Quartet$x)
    > abline(rlmfit, col="red")


    Figure 5: The robust linear regression plot on the Quartet dataset


    9. Measuring prediction performance with a confusion matrix

    To measure the performance of a classification model, we can first generate a classification table based on the predicted labels and the actual labels. Then, we can use a confusion matrix to obtain performance measures such as precision, recall, specificity, and accuracy. In this recipe, we will demonstrate how to retrieve a confusion matrix.


    How to do it...

    Perform the following steps to generate a classification measurement:

    1. Train an svm model using the training dataset:

    > svm.model = train(churn ~ .,
    + data = trainset,
    + method = "svmRadial")


    2. You can then predict labels using the fitted model:


    > svm.pred = predict(svm.model, testset[,! names(testset) %in% c("churn")])


    3. Next, you can generate a classification table:

    > table(svm.pred, testset[,c("churn")])

    svm.pred yes  no
         yes  73  16
         no   68 861


    4. Lastly, you can generate a confusion matrix using the prediction results and the actual labels from the testing dataset:

    > confusionMatrix(svm.pred, testset[,c("churn")])
    Confusion Matrix and Statistics

              Reference
    Prediction yes  no
           yes  73  16
           no   68 861

                   Accuracy : 0.9175
                     95% CI : (0.8989, 0.9337)
        No Information Rate : 0.8615
        P-Value [Acc > NIR] : 2.273e-08

                      Kappa : 0.5909
     Mcnemar's Test P-Value : 2.628e-08

                Sensitivity : 0.51773
                Specificity : 0.98176
             Pos Pred Value : 0.82022
             Neg Pred Value : 0.92680
                 Prevalence : 0.13851
             Detection Rate : 0.07171
       Detection Prevalence : 0.08743
          Balanced Accuracy : 0.74974

           'Positive' Class : yes

    In this recipe, we demonstrate how to obtain a confusion matrix to measure the performance of a classification model. First, we use the train function from the caret package to train an svm model. Next, we use the predict function to obtain the predicted labels of the svm model for the testing dataset. Then, we call the table function to build the classification table from the predicted labels and the actual labels. Lastly, we pass the same two sets of labels to confusionMatrix to obtain the performance measures.
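    To make the link between the classification table and the reported measures concrete, here is a minimal base R sketch that recomputes a few of them by hand. The counts are copied from the table in this recipe; the variable names are chosen for illustration, and "yes" is treated as the positive class, as in the output.

```r
# Counts copied from the recipe's classification table
# (rows = predicted label, columns = actual label).
tab <- matrix(c(73, 68, 16, 861), nrow = 2,
              dimnames = list(predicted = c("yes", "no"),
                              actual    = c("yes", "no")))

tp <- tab["yes", "yes"]  # predicted yes, actually yes
fp <- tab["yes", "no"]   # predicted yes, actually no
fn <- tab["no", "yes"]   # predicted no, actually yes
tn <- tab["no", "no"]    # predicted no, actually no

accuracy    <- (tp + tn) / sum(tab)
recall      <- tp / (tp + fn)   # reported as Sensitivity
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)   # reported as Pos Pred Value

print(round(c(accuracy = accuracy, recall = recall,
              specificity = specificity, precision = precision), 5))
```

    The high accuracy together with a recall of about 0.52 shows why accuracy alone can mislead on an imbalanced dataset: the model misses nearly half of the actual churners.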



                                                                         -------- From Baidu Translate                              Lang Xiaomin (郎小敏)



  • Original article: https://www.cnblogs.com/D2016/p/6921022.html