  • R Language Study Notes (Part 1)

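    Today I worked through the R exercises from Head First Data Analysis again, and R turns out to be quite enjoyable. It supports a number of specialized statistical libraries, so tasks such as computing a variance or a regression slope take very little code, and the grouped scatter plot it produces looks especially good. However, the generated images do not support data drill-down the way Excel charts do. I hope further study will resolve the doubts I have about this.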


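    Loading the data file
    The R source file contains a single statement: employees <- read.csv("c:/hfda_ch09_employees.csv",header=TRUE)
    (R treats a backslash inside a string as an escape character, so the Windows path is written with a forward slash.)
    source("R source file path")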
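
    A minimal sketch of the same load step, using a temporary file in place of the book's CSV. The data frame name employees_demo and all values are invented for illustration; only the column names follow the ones used later in these notes.

```r
# Write a tiny stand-in CSV to a temporary file (values are invented).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("requested,received,negotiated",
             "5,4.5,TRUE",
             "12,10.1,TRUE",
             "0,3.2,FALSE"), csv_path)

# header = TRUE tells read.csv that the first line holds the column names.
employees_demo <- read.csv(csv_path, header = TRUE)

# Inspect the column types that read.csv inferred.
str(employees_demo)
```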

    Help function
    help(command), e.g. help(sd)

    Standard deviation function
    sd(x)
    [1] 2.432138
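
    Note that sd() returns the standard deviation, while var() returns the variance; the two are related by a square root. A short check on a made-up sample (the vector x below is not the book's data):

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # invented sample

v <- var(x)   # sample variance (divides by n - 1)
s <- sd(x)    # sample standard deviation

# The standard deviation is the square root of the variance.
all.equal(s, sqrt(v))
```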

    Summary function
    summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     -1.800   4.600   5.500   6.028   6.700  25.900
    1st Qu.: about 25% of the observations fall below this value
    3rd Qu.: about 75% of the observations fall below this value
    Median: the middle value
    Mean: the arithmetic average
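
    The quartiles that summary() reports can also be requested directly with quantile(). A small sketch on an invented vector:

```r
x <- 1:9   # invented sample: the integers 1 through 9

s <- summary(x)
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# The 1st and 3rd quartiles from summary() match quantile()'s defaults.
s[["1st Qu."]] == q[["25%"]]
s[["3rd Qu."]] == q[["75%"]]
```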

    Histogram
    hist(employees$received[employees$negotiated==TRUE],50)   # restricted by a filter condition; 50 is the suggested number of bins
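
    A sketch of what the filtered call does, on simulated data (the vectors received and negotiated are invented stand-ins for the employees columns). Passing plot = FALSE returns the bin counts without drawing, which makes it easy to confirm that only the filtered rows were binned. The breaks value is only a suggestion; R rounds the breakpoints to "pretty" values.

```r
set.seed(1)
received   <- rnorm(200, mean = 5, sd = 2)        # invented raise amounts
negotiated <- rep(c(TRUE, FALSE), times = 100)    # invented flag

# Bin only the rows where negotiated is TRUE; do not draw the plot.
h <- hist(received[negotiated], breaks = 50, plot = FALSE)

# Every filtered observation lands in exactly one bin.
sum(h$counts) == sum(negotiated)
```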

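    Scatter plot
    plot(employees$requested[employees$negotiated==TRUE],employees$received[employees$negotiated==TRUE])
    The filter must be identical on both axes (here employees$negotiated==TRUE); otherwise the x and y vectors no longer refer to the same rows.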

    Correlation coefficient
    cor(employees$requested[employees$negotiated==TRUE],employees$received[employees$negotiated==TRUE])
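
    cor() returns the correlation coefficient, which is a different quantity from the regression slope. A sketch on invented data where the two visibly differ; filtering once with subset() also guarantees that both columns use the same condition:

```r
# Invented data: for negotiated rows, received = 1 + 0.5 * requested exactly.
employees_demo <- data.frame(
  requested  = c(2, 4, 6, 8, 10, 3),
  received   = c(2, 3, 4, 5, 6, 0),
  negotiated = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Filter once, then use the same rows for both variables.
neg <- subset(employees_demo, negotiated)

r     <- cor(neg$requested, neg$received)                      # correlation
slope <- unname(coef(lm(received ~ requested, data = neg))[2]) # slope

c(correlation = r, slope = slope)
```

    Here the points lie exactly on a line, so the correlation is 1 even though the slope is 0.5.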

    Computing the intercept and slope
    > mylm <- lm(received[negotiated==TRUE]~requested[negotiated==TRUE],data=employees)
    > mylm$coefficients
                      (Intercept) requested[negotiated == TRUE]
                        2.3121277                     0.7250664
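
    The same kind of fit can also be written with lm's subset argument, which keeps the coefficient names short. A sketch on simulated data (employees_demo and its values are invented; the true intercept 2 and slope 0.7 are arbitrary choices for the simulation):

```r
set.seed(42)
employees_demo <- data.frame(
  requested  = runif(100, min = 0, max = 20),
  negotiated = rep(c(TRUE, FALSE), times = 50)
)
employees_demo$received <- 2 + 0.7 * employees_demo$requested +
  rnorm(100, sd = 0.5)

# Filter via the subset argument instead of indexing inside the formula.
fit_subset  <- lm(received ~ requested, data = employees_demo,
                  subset = negotiated)

# The bracket style used in these notes, for comparison.
fit_bracket <- lm(received[negotiated] ~ requested[negotiated],
                  data = employees_demo)

coef(fit_subset)   # named "(Intercept)" and "requested"
all.equal(unname(coef(fit_subset)), unname(coef(fit_bracket)))
```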

    Computing the intercept and slope (multiple filter conditions)
    Define the fitted lines
    > myLMBig <- lm(received[negotiated==TRUE & requested >10]~requested[negotiated==TRUE & requested >10],data=employees)
    > myLMSmall <- lm(received[negotiated==TRUE & requested <=10]~requested[negotiated==TRUE & requested<=10],data=employees)

    Reading off the slope and intercept
    > summary(myLMBig)$coefficients
                                                   Estimate Std. Error  t value     Pr(>|t|)
    (Intercept)                                    7.813403  1.8760371 4.164845 4.997597e-05
    requested[negotiated == TRUE & requested > 10] 0.302609  0.1420151 2.130824 3.457618e-02
    (In the Estimate column, the first row is the intercept and the second row is the slope.)


    > summary(myLMBig)$sigma
    [1] 4.544424   (residual standard error)

    > summary(myLMSmall)$coefficients
                                                     Estimate Std. Error   t value      Pr(>|t|)
    (Intercept)                                     0.7933468 0.22472009  3.530378  4.378156e-04
    requested[negotiated == TRUE & requested <= 10] 0.9424946 0.03151835 29.903041 6.588020e-134

    > summary(myLMSmall)$sigma
    [1] 1.374526
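
    The sigma component of summary() is the residual standard error of the fit, not a variance. A sketch that recomputes it by hand, using R's built-in cars dataset as a stand-in for the employees data:

```r
# cars ships with R; it is used here only as convenient example data.
fit <- lm(dist ~ speed, data = cars)

s <- summary(fit)$sigma

# Residual standard error: sqrt(residual sum of squares / residual df).
manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

all.equal(s, manual)
```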

    Scatter plot
    dispath2 <- read.csv("dispatch analysis.csv",header=TRUE)

    plot(Sales~jitter(Article.count),data=dispath2)   # jitter adds a little random noise so overlapping points spread out and the plot is easier to read
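
    A sketch of what jitter() does to the values themselves, on an invented vector of counts. With the default arguments the noise is drawn uniformly from a range equal to one fifth of the smallest gap between distinct values, so the points move apart without changing the overall picture.

```r
set.seed(1)
article_count <- c(5, 5, 5, 6, 6, 7)   # invented counts with many ties

jittered <- jitter(article_count)

# Each value moves only slightly from its original position.
max(abs(jittered - article_count))
```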


    Grouped scatter plot
    > articleHitsComments <- read.csv("hfda_ch12_articleHitsComments.csv",header=TRUE)
    > library(lattice)   # load the lattice package
    > head(articleHitsComments,10)
       articleID      authorName webHits commentCount
    1          1   Destiny Adams    2019           14
    2          2 Jon Radermacher    1421            6
    3          3     Matt Janney    1174            8
    4          4     Matt Janney    1613           26
    5          5    Paul Semenec    1099           10
    6          6   Destiny Adams    1903           26
    7          7      Nicole Fry    1718           21
    8          8  Jason Wightman     642            8
    9          9 Jon Radermacher    1616            7
    10        10     Matt Janney    1233           12
    > xyplot(webHits~commentCount | authorName, data=articleHitsComments)   # "|" is the conditioning operator; here the panels are grouped by authorName

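    A self-contained sketch of the same conditioning syntax on an invented data frame (the author names and numbers below are hypothetical). xyplot() returns a "trellis" object; at the interactive prompt it is drawn automatically, while in a script it has to be printed explicitly.

```r
library(lattice)

# Invented data: two hypothetical authors with four articles each.
articles_demo <- data.frame(
  authorName   = rep(c("Author A", "Author B"), each = 4),
  webHits      = c(1200, 1500, 1100, 1900, 800, 950, 700, 1000),
  commentCount = c(10, 14, 8, 25, 5, 9, 4, 11)
)

# One panel per level of authorName.
p <- xyplot(webHits ~ commentCount | authorName, data = articles_demo)

# print(p)   # uncomment to draw the plot when running as a script
class(p)
```
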
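    Data cleaning
    Using a regular expression
    hfhh <- read.csv("hfda_ch13_data_for_R.csv",header=TRUE)
    NewLastName <- sub("\\(.*\\)","",hfhh$LastName)   # removes parenthesized text; backslashes must be doubled inside an R string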
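
    A sketch of the substitution on invented names. In an R string literal the regular-expression escape for a literal parenthesis is written with two backslashes. The trimws() call is an addition not in the original notes; it removes the space left behind in front of the deleted parentheses.

```r
last_names <- c("Smith (deceased)", "Jones", "Lee (see notes)")   # invented

# Remove the first parenthesized chunk from each name.
cleaned <- sub("\\(.*\\)", "", last_names)

# Drop the leftover trailing whitespace.
cleaned <- trimws(cleaned)
cleaned
```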

    Sorting
    hfhhSorted <- hfhh[order(hfhh$PersonID,decreasing=FALSE),]
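
    order() returns the row positions that would sort the column, and indexing the data frame with those positions reorders whole rows. A sketch on an invented data frame (column names follow the notes, values are made up):

```r
calls <- data.frame(
  PersonID = c(3, 1, 2, 1),
  CallID   = c(10, 11, 12, 13)
)

idx <- order(calls$PersonID, decreasing = FALSE)   # row positions, ascending
calls_sorted <- calls[idx, ]                       # keep all columns

calls_sorted
```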

    Removing duplicates
    hfhhNameOnly <- unique(hfhhNameOnly)

    Deleting unneeded columns
    > hfhhNameOnly$CallID <-NULL
    > hfhhNameOnly$Time <-NULL

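    unique() only collapses rows that are identical in every column, so the order of these two steps matters: columns that differ from call to call need to be dropped before duplicates can disappear. A sketch on invented data:

```r
calls <- data.frame(
  PersonID = c(1, 1, 2),
  LastName = c("Smith", "Smith", "Jones"),
  CallID   = c(10, 11, 12)
)

# With CallID present, every row is distinct, so nothing is removed.
rows_before <- nrow(unique(calls))

# Drop the per-call column, then deduplicate.
names_only <- calls
names_only$CallID <- NULL          # assigning NULL deletes the column
names_only <- unique(names_only)

c(before = rows_before, after = nrow(names_only))
```
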
    Writing a CSV file
    write.csv(hfhhNameOnly,file="output from R.csv")
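
    By default write.csv() also writes the row names as an extra first column. A sketch, on invented data and a temporary file, of suppressing that with row.names = FALSE so the file reads back with the same columns:

```r
people <- data.frame(
  PersonID = c(1, 2),
  LastName = c("Smith", "Jones")
)

out_path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row numbers out of the file.
write.csv(people, file = out_path, row.names = FALSE)

back <- read.csv(out_path, header = TRUE)
names(back)
```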

    Assignment
    hfhhNameOnly <- hfhhSorted   # "<-" is the assignment operator; this copies the sorted data frame


  • Original article: https://www.cnblogs.com/GhostBear/p/7592272.html