  • An Introduction to Statistical Learning with Applications in R — Chapter 4 Exercises

    Chapter 4 exercises; answers are not given for some problems.

    1.

    This one is straightforward; anyone comfortable with high-school algebra should be able to derive it.

    Problems 2–3 are proofs; omitted.

    4.

    (a)

    On average we use 10% of the available observations (the fraction is slightly smaller near the boundary of [0, 1]).

    (b)

    The fraction here is (0.1 × 0.1)/(1 × 1), so the answer is 1%.

    (c)

    It is just the fraction of the volume occupied, so the answer is (0.1**100) * 100 = 0.1**98 %.

    (d)

    The answer follows directly from (a)–(c): the fraction of nearby training observations decreases exponentially as the number of features grows.

    (e)

    The side length of the hypercube in p dimensions is 0.1**(1/p): 0.1**(1), 0.1**(1/2), 0.1**(1/3), ..., 0.1**(1/100).
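    These fractions can be sanity-checked numerically; a quick Python sketch (the exponents follow directly from the volume argument above):

    ```python
    # Curse of dimensionality, problem 4: fraction of training observations
    # falling in the local 10%-per-feature neighborhood, as a function of p.
    def local_fraction(p):
        return 0.1 ** p

    # Side length of a hypercube containing 10% of observations in p dimensions.
    def cube_side(p):
        return 0.1 ** (1 / p)

    print(local_fraction(1))    # 0.1    -> 10%  (part a)
    print(local_fraction(2))    # 0.01   -> 1%   (part b)
    print(local_fraction(100))  # 1e-100         (part c)
    print(cube_side(2))         # ~0.316         (part e)
    print(cube_side(100))       # ~0.977         (part e)
    ```

    Note how for p = 100 the cube containing just 10% of the data must span nearly the whole range of every feature.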

    5.

    The bias–variance trade-off discussion on p. 104 of the Chinese edition covers this clearly.

    (a)

    When the Bayes decision boundary is linear, QDA naturally does better on the training set because it fits more flexibly, but LDA does better on the test set because it is closer to the true boundary.

    (b)

    When the Bayes decision boundary is non-linear, QDA outperforms LDA on both the training and test sets.

    (c)

    Relative to LDA, QDA's test prediction accuracy improves: as the sample size n grows, a more flexible model performs better, because its higher variance is offset by the larger sample.

    (d)

    False: with few training observations, QDA overfits.

    6.

    (a)

    Plugging directly into the logistic formula gives p(X) = 37.75%.

    (b)

    Plug into the same formula and solve for X1: 50 hours.
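    Using the problem's coefficients (β0 = −6, β1 = 0.05 per hour studied, β2 = 1 per GPA point, with X1 = 40 hours and X2 = 3.5 GPA), the plug-in can be checked quickly in Python:

    ```python
    import math

    def logistic_p(x1, x2, b0=-6, b1=0.05, b2=1):
        """Logistic probability p(X) = e^t / (1 + e^t), with t = b0 + b1*x1 + b2*x2."""
        t = b0 + b1 * x1 + b2 * x2
        return math.exp(t) / (1 + math.exp(t))

    # Part (a): t = -6 + 0.05*40 + 3.5 = -0.5
    print(round(logistic_p(40, 3.5), 4))  # 0.3775 -> 37.75%

    # Part (b): p = 0.5 exactly when the linear term t = 0, so
    # x1 = (0 - b0 - b2*x2) / b1
    x1_needed = (0 - (-6) - 1 * 3.5) / 0.05
    print(x1_needed)  # 50.0 hours
    ```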

    7.

    This is just Bayes' theorem plus equation 4-12 on p. 97 of the Chinese edition. A bit tedious, but the final answer is 75.2%.
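    The Bayes-theorem plug-in can be verified numerically using the problem's parameters (normal densities with shared variance 36, mean 10 for companies that issued a dividend and 0 for those that did not, prior P(yes) = 0.8, observed X = 4):

    ```python
    import math

    def normal_density(x, mu, var):
        """Univariate normal density f(x) with mean mu and variance var."""
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    # P(yes | X=4) = pi_yes * f_yes(4) / (pi_yes * f_yes(4) + pi_no * f_no(4))
    num = 0.8 * normal_density(4, 10, 36)
    den = num + 0.2 * normal_density(4, 0, 36)
    print(round(num / den, 3))  # 0.752 -> 75.2%
    ```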

    8.

    A reasoning question. With K = 1, KNN's training error is 0%, so its reported 18% average of training and test errors implies a test error of 36%, which is worse than logistic regression's 30% test error. So of course we choose logistic regression.
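    The problem states that logistic regression has a 20% training / 30% test error rate, while KNN with K = 1 has an 18% average of training and test errors; the arithmetic is simple:

    ```python
    # KNN with K=1 memorizes the training set, so its training error is 0%.
    # From the 18% average of training and test errors, the test error must be:
    knn_test_error = 2 * 0.18 - 0.0
    print(knn_test_error)  # 0.36, i.e. 36% -- worse than logistic regression's 30%
    ```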

    9.

    See equation 4-3 on p. 92; just plug in. (a) p = 0.37/1.37 ≈ 27%; (b) odds = 0.16/0.84 ≈ 0.19.
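    The odds conversions (equation 4-3: odds = p/(1 − p)) check out:

    ```python
    # (a) odds = 0.37 -> probability p = odds / (1 + odds)
    p = 0.37 / (1 + 0.37)
    print(round(p, 2))  # 0.27 -> 27%

    # (b) p = 0.16 -> odds = p / (1 - p)
    odds = 0.16 / (1 - 0.16)
    print(round(odds, 2))  # 0.19
    ```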

    10.

    (a)

    When a question asks for numerical and graphical summaries, three commands usually suffice: summary(), pairs(), and cor(). Note that pairs() is genuinely slow when there are many features, and the qualitative variables must be dropped before calling cor().

    library(ISLR)
    summary(Weekly)
    pairs(Weekly)
    cor(Weekly[, -9])
    

    (b)

    attach(Weekly)
    glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = Weekly,  family = binomial)
    summary(glm.fit)
    

    (c)

    glm.probs = predict(glm.fit, type = "response")
    glm.pred = rep("Down", length(glm.probs))
    glm.pred[glm.probs > 0.5] = "Up"
    table(glm.pred, Direction)

    (d)

    train = (Year < 2009)
    Weekly.0910 = Weekly[!train, ]
    glm.fit = glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
    glm.probs = predict(glm.fit, Weekly.0910, type = "response")
    glm.pred = rep("Down", length(glm.probs))
    glm.pred[glm.probs > 0.5] = "Up"
    Direction.0910 = Direction[!train]
    table(glm.pred, Direction.0910)
    mean(glm.pred == Direction.0910)

    (e)

    library(MASS)
    lda.fit = lda(Direction ~ Lag2, data = Weekly, subset = train)
    lda.pred = predict(lda.fit, Weekly.0910)
    table(lda.pred$class, Direction.0910)
    mean(lda.pred$class == Direction.0910)
    

    (f)

    qda.fit = qda(Direction ~ Lag2, data = Weekly, subset = train)
    qda.class = predict(qda.fit, Weekly.0910)$class
    table(qda.class, Direction.0910)
    mean(qda.class == Direction.0910)
    

    (g)

    library(class)
    train.X = as.matrix(Lag2[train])
    test.X = as.matrix(Lag2[!train])
    train.Direction = Direction[train]
    set.seed(1)
    knn.pred = knn(train.X, test.X, train.Direction, k = 1)
    table(knn.pred, Direction.0910)
    mean(knn.pred == Direction.0910)
    

    (h)

    Logistic regression and LDA achieve the same (best) accuracy.

    (i)

    # Logistic regression with Lag2:Lag1
    glm.fit = glm(Direction ~ Lag2:Lag1, data = Weekly, family = binomial, subset = train)
    glm.probs = predict(glm.fit, Weekly.0910, type = "response")
    glm.pred = rep("Down", length(glm.probs))
    glm.pred[glm.probs > 0.5] = "Up"
    Direction.0910 = Direction[!train]
    table(glm.pred, Direction.0910)
    mean(glm.pred == Direction.0910)
    ## [1] 0.5865
    
    # LDA with Lag2 interaction with Lag1
    lda.fit = lda(Direction ~ Lag2:Lag1, data = Weekly, subset = train)
    lda.pred = predict(lda.fit, Weekly.0910)
    mean(lda.pred$class == Direction.0910)
    ## [1] 0.5769
    
    # QDA with sqrt(abs(Lag2))
    qda.fit = qda(Direction ~ Lag2 + sqrt(abs(Lag2)), data = Weekly, subset = train)
    qda.class = predict(qda.fit, Weekly.0910)$class
    table(qda.class, Direction.0910)
    mean(qda.class == Direction.0910)
    ## [1] 0.5769
    
    # KNN k =10
    knn.pred = knn(train.X, test.X, train.Direction, k = 10)
    table(knn.pred, Direction.0910)
    mean(knn.pred == Direction.0910)
    ## [1] 0.5769
    
    # KNN k = 100
    knn.pred = knn(train.X, test.X, train.Direction, k = 100)
    table(knn.pred, Direction.0910)
    mean(knn.pred == Direction.0910)
    ## [1] 0.5577
    

    The results are shown in the code comments above; logistic regression performs best.

    11.

    (a)

    library(ISLR)
    summary(Auto)
    
    attach(Auto)
    mpg01 = rep(0, length(mpg))
    mpg01[mpg > median(mpg)] = 1
    Auto = data.frame(Auto, mpg01)

    (b)

    cor(Auto[, -9])
    pairs(Auto)

    (c)

    train = (year%%2 == 0)  # if the year is even
    test = !train
    Auto.train = Auto[train, ]
    Auto.test = Auto[test, ]
    mpg01.test = mpg01[test]

    (d)

    library(MASS)
    lda.fit = lda(mpg01 ~ cylinders + weight + displacement + horsepower, data = Auto, subset = train)
    lda.pred = predict(lda.fit, Auto.test)
    mean(lda.pred$class != mpg01.test)

    (e)

    qda.fit = qda(mpg01 ~ cylinders + weight + displacement + horsepower, data = Auto, subset = train)
    qda.pred = predict(qda.fit, Auto.test)
    mean(qda.pred$class != mpg01.test)

    (f)

    glm.fit = glm(mpg01 ~ cylinders + weight + displacement + horsepower, data = Auto, family = binomial, subset = train)
    glm.probs = predict(glm.fit, Auto.test, type = "response")
    glm.pred = rep(0, length(glm.probs))
    glm.pred[glm.probs > 0.5] = 1
    mean(glm.pred != mpg01.test)

    (g)

    library(class)
    train.X = cbind(cylinders, weight, displacement, horsepower)[train, ]
    test.X = cbind(cylinders, weight, displacement, horsepower)[test, ]
    train.mpg01 = mpg01[train]
    set.seed(1)
    # KNN(k=1)
    knn.pred = knn(train.X, test.X, train.mpg01, k = 1)
    mean(knn.pred != mpg01.test)
    
    # KNN(k=10)
    knn.pred = knn(train.X, test.X, train.mpg01, k = 10)
    mean(knn.pred != mpg01.test)
    
    # KNN(k=100)
    knn.pred = knn(train.X, test.X, train.mpg01, k = 100)
    mean(knn.pred != mpg01.test)
    

    Question 13 is similar to question 11 and uses the same functions, so it is omitted.

    12.

    (a)~(b)

    Power = function() {
      2^3
    }
    print(Power())
    
    Power2 = function(x, a) {
      x^a
    }
    Power2(3, 8)

    (c)

    Power2(10, 3)
    Power2(8, 17)
    Power2(131, 3)

    (d)~(f)

    Power3 = function(x, a) {
      result = x^a
      return(result)
    }
    
    x = 1:10
    plot(x, Power3(x, 2), log = "xy", ylab = "Log of y = x^2", xlab = "Log of x", 
         main = "Log of x^2 versus Log of x")
    
    PlotPower = function(x, a) {
      plot(x, Power3(x, a))
    }
    PlotPower(1:10, 3)
    
  • Original post: https://www.cnblogs.com/-Sai-/p/5464816.html