  • R Language Digest: Subsetting Data

    Original article: https://www.statmethods.net/management/subset.html

    R has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations. The following code snippets demonstrate ways to keep or delete variables and observations and to take random samples from a dataset.

    Selecting (Keeping) Variables

    # select variables v1, v2, v3
    myvars <- c("v1", "v2", "v3")
    newdata <- mydata[myvars]

    # another method
    myvars <- paste("v", 1:3, sep="")
    newdata <- mydata[myvars]

    # select 1st and 5th thru 10th variables
    newdata <- mydata[c(1,5:10)]
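
    To try these selections, here is a minimal sketch. It assumes a small made-up data frame with columns v1 through v10; the data and the names by_name and by_pos are illustrative and are not from the original article.

```r
# Hypothetical toy data: 4 rows, columns v1..v10 (assumed for illustration)
mydata <- as.data.frame(matrix(1:40, nrow = 4))
names(mydata) <- paste0("v", 1:10)

# Select by name
by_name <- mydata[c("v1", "v2", "v3")]

# Select by position: 1st and 5th through 10th columns
by_pos <- mydata[c(1, 5:10)]

names(by_name)  # "v1" "v2" "v3"
ncol(by_pos)    # 7
```

    Indexing with a single bracket and one index, as above, always returns a data frame, even when only one column is selected.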

    To practice this interactively, try the selection of data frame elements exercises in the Data frames chapter of this introduction to R course.

    Excluding (Dropping) Variables

    # exclude variables v1, v2, v3
    myvars <- names(mydata) %in% c("v1", "v2", "v3") 
    newdata <- mydata[!myvars]

    # exclude 3rd and 5th variable 
    newdata <- mydata[c(-3,-5)]

    # delete variables v3 and v5
    mydata$v3 <- mydata$v5 <- NULL
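
    Negative indices only work with positions, not names, which is why the name-based exclusion above goes through a logical mask. The sketch below shows that approach next to an equivalent one using setdiff(). The toy data and the names drop, kept_a and kept_b are assumptions for illustration.

```r
# Hypothetical toy data (assumed for illustration)
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12, v5 = 13:15)

# Logical mask: TRUE for the columns to drop
drop <- names(mydata) %in% c("v1", "v2", "v3")
kept_a <- mydata[!drop]

# Alternative: keep the names that are not in the drop list
kept_b <- mydata[setdiff(names(mydata), c("v1", "v2", "v3"))]

names(kept_a)  # "v4" "v5"
names(kept_b)  # "v4" "v5"
```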

    Selecting Observations

    # first 5 observations
    newdata <- mydata[1:5,]

    # based on variable values
    newdata <- mydata[which(mydata$gender == 'F' &
                            mydata$age > 65), ]

    # or
    attach(mydata)
    newdata <- mydata[ which(gender=='F' & age > 65),]
    detach(mydata)
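
    The which() wrapper matters when the condition can be NA. The sketch below, on made-up data, compares plain logical indexing with which(), and uses with() as one possible way to avoid the attach()/detach() pair. The data values and the names with_na, no_na and same are illustrative assumptions.

```r
# Hypothetical toy data with a missing age (assumed for illustration)
mydata <- data.frame(
  gender = c("F", "M", "F", "F"),
  age    = c(70, 80, NA, 40)
)

# Plain logical indexing: the NA in the condition yields an all-NA row
with_na <- mydata[mydata$gender == "F" & mydata$age > 65, ]

# which() returns only the positions where the condition is TRUE
no_na <- mydata[which(mydata$gender == "F" & mydata$age > 65), ]

# with() evaluates the condition inside mydata without attach()/detach()
same <- mydata[with(mydata, which(gender == "F" & age > 65)), ]

nrow(with_na)  # 2
nrow(no_na)    # 1
```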

    Selection using the Subset Function

    The subset() function is the easiest way to select variables and observations. In the following example, we select all rows that have a value of age greater than or equal to 20 or less than 10. We keep the ID and Weight columns.

    # using subset function 
    newdata <- subset(mydata, age >= 20 | age < 10, 
    select=c(ID, Weight))

    In the next example, we select all men over the age of 25 and we keep variables weight through income (weight, income and all columns between them).

    # using subset function (part 2)
    newdata <- subset(mydata, sex=="m" & age > 25,
    select=weight:income)
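
    As a rough check of what subset() returns, the sketch below runs the first example on made-up data and compares it with a bracket-based form. The column names follow the article; the values and the name same are assumptions for illustration.

```r
# Hypothetical toy data (column names follow the article's first example)
mydata <- data.frame(
  ID     = 1:5,
  age    = c(5, 15, 20, 42, NA),
  Weight = c(18, 50, 70, 82, 65)
)

# Rows with age >= 20 or age < 10; keep only ID and Weight
newdata <- subset(mydata, age >= 20 | age < 10, select = c(ID, Weight))

# Bracket form selecting the same rows and columns
same <- mydata[which(mydata$age >= 20 | mydata$age < 10), c("ID", "Weight")]

newdata$ID  # 1 3 4
```

    Rows where the condition evaluates to NA (here the row with a missing age) are dropped by subset(), which matches the which() behavior in the bracket form.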

    To practice the subset() function, try the interactive exercise on subsetting data.tables.

    Random Samples

    Use the sample( ) function to take a random sample of size n from a dataset.

    # take a random sample of size 50 from a dataset mydata 
    # sample without replacement
    mysample <- mydata[sample(1:nrow(mydata), 50,
       replace=FALSE),]
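
    Random samples differ from run to run unless the random number generator is seeded. The sketch below, on made-up data, fixes the seed with set.seed() and also draws a sample with replacement. The data, the seed value and the names n_frac and resampled are assumptions for illustration.

```r
set.seed(42)  # fix the random number generator so the results are repeatable

# Hypothetical toy data: 200 rows (assumed for illustration)
mydata <- data.frame(id = 1:200, x = rnorm(200))

# 50 rows without replacement; seq_len() also behaves for a 0-row data frame
mysample <- mydata[sample(seq_len(nrow(mydata)), 50), ]

# A sample of 10% of the row count, drawn with replacement
n_frac <- round(0.10 * nrow(mydata))
resampled <- mydata[sample(seq_len(nrow(mydata)), n_frac, replace = TRUE), ]

nrow(mysample)              # 50
anyDuplicated(mysample$id)  # 0, since rows were drawn without replacement
```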

  • Source of this post: https://www.cnblogs.com/chickenwrap/p/10166562.html