  • Some Data Structures in R

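    When the input data contains both strings and numbers, a data frame is usually the best way to hold it. Data frames also work with functions such as ncol() and nrow(), and individual values can be picked out by indexing.

    Note, however, that in R versions before 4.0.0 the strings in a data frame are converted to factors by default (since R 4.0.0, stringsAsFactors defaults to FALSE and they stay character).

    For example: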

    Died.At <- c(22,40,72,41)
    Writer.At <- c(16, 18, 36, 36)
    First.Name <- c("John", "Edgar", "Walt", "Jane")
    Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")
    Date.Of.Death <- c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18")
    

    Next, you just combine the vectors that you made with the data.frame() function:

     writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)

    Remember that data frames must have variables of the same length. Check that you have put an equal number of arguments in all the c() functions that you assign to the vectors, and that you have enclosed strings in "".

    Note that when you use the data.frame() function in R versions before 4.0.0, character variables are imported as factors (categorical variables) by default. Use the str() function to get to know more about your data frame.

     str(writers_df)

    For the variables First.Name and Second.Name, you don't want this. You can wrap them in the I() function so that they are kept as they are.

    You can keep the Sex vector as a factor, because there are only a limited number of possible values that this variable can have.

    For the variable Date.Of.Death you don't want a factor either. It would be better if the values were stored as dates. You can apply the as.Date() function to this variable to make sure this happens.

    str(writers_df)

    ## 'data.frame':    4 obs. of  6 variables:
    ##  $ Died.At      : num  22 40 72 41
    ##  $ Writer.At    : num  16 18 36 36
    ##  $ First.Name   : Factor w/ 4 levels "Edgar","Jane",..: 3 1 4 2
    ##  $ Second.Name  : Factor w/ 4 levels "Austen","Doe",..: 2 3 4 1
    ##  $ Sex          : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
    ##  $ Date.Of.Death: Factor w/ 4 levels "1817-07-18","1849-10-07",..: 4 2 3 1


    Died.At <- c(22,40,72,41)
    Writer.At <- c(16, 18, 36, 36)
    First.Name <- I(c("John", "Edgar", "Walt", "Jane"))
    Second.Name <- I(c("Doe", "Poe", "Whitman", "Austen"))
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")
    Date.Of.Death <- as.Date(c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18"))
    writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)
    str(writers_df)

    ## 'data.frame':    4 obs. of  6 variables:
    ##  $ Died.At      : num  22 40 72 41
    ##  $ Writer.At    : num  16 18 36 36
    ##  $ First.Name   :Class 'AsIs'  chr [1:4] "John" "Edgar" "Walt" "Jane"
    ##  $ Second.Name  :Class 'AsIs'  chr [1:4] "Doe" "Poe" "Whitman" "Austen"
    ##  $ Sex          : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
    ##  $ Date.Of.Death: Date, format: "2015-05-10" "1849-10-07" ...

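    The I() function protects the strings: they are stored as they are (class 'AsIs') as ordinary character data instead of being converted to factors.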
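
    As an alternative to wrapping individual columns in I(), the stringsAsFactors argument of data.frame() controls the conversion for the whole data frame. The following is a minimal sketch under that assumption; the name chr_df is only illustrative, and only two of the columns are used to keep it short.

    ```r
    # Two of the columns from the example above, as plain character vectors.
    First.Name <- c("John", "Edgar", "Walt", "Jane")
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")

    # stringsAsFactors = FALSE keeps every string column as character,
    # whatever the default of the installed R version is.
    chr_df <- data.frame(First.Name, Sex, stringsAsFactors = FALSE)

    # Convert only the column that really is categorical.
    chr_df$Sex <- factor(chr_df$Sex)

    str(chr_df)
    ```

    Setting the argument explicitly makes a script behave the same before and after R 4.0.0, which changed the default.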


    You can also retrieve the names with the names() function:

    names(writers_df)
    ## [1] "Died.At"       "Writer.At"     "First.Name"    "Second.Name"
    ## [5] "Sex"           "Date.Of.Death"
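
    The functions mentioned at the start, ncol() and nrow(), and the indexing of a single value work in the same spirit as names(). A small sketch, using a cut-down data frame (small_df is a hypothetical name with only three of the columns) so that it runs on its own:

    ```r
    # A cut-down version of the writers data, for illustration only.
    small_df <- data.frame(
      Died.At = c(22, 40, 72, 41),
      Writer.At = c(16, 18, 36, 36),
      Second.Name = c("Doe", "Poe", "Whitman", "Austen"),
      stringsAsFactors = FALSE
    )

    nrow(small_df)                 # number of rows: 4
    ncol(small_df)                 # number of columns: 3
    dim(small_df)                  # rows and columns together

    small_df[2, 3]                 # value in row 2, column 3: "Poe"
    small_df[2, "Second.Name"]     # the same cell, addressed by column name
    ```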

    How To Remove Columns And Rows From A Data Frame

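    Assigning NULL to a single cell, as in writers_df[1,3] <- NULL, raises an error. To remove a whole column, assign NULL to the column itself, shown here on a copy (the name no_name_df is only illustrative) so that writers_df stays intact:

    no_name_df <- writers_df
    no_name_df$First.Name <- NULL

    To remove rows, index with a logical vector that marks the rows to keep:

    > rows_to_keep <- c(TRUE, FALSE, TRUE, FALSE)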
    > limited_writers_df <- writers_df[rows_to_keep,]
    > limited_writers_df
      Died.At Writer.At First.Name Second.Name  Sex Date.Of.Death
    1      22        16       John         Doe MALE    2015-05-10
    3      72        36       Walt     Whitman MALE    1892-03-26
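
    A logical vector is one way to choose rows; negative indices and conditions on a column are two others. The sketch below assumes a shortened stand-in for the writers data (demo_df and the result names are hypothetical):

    ```r
    # Shortened stand-in for the writers data, for illustration only.
    demo_df <- data.frame(
      First.Name = c("John", "Edgar", "Walt", "Jane"),
      Died.At = c(22, 40, 72, 41),
      stringsAsFactors = FALSE
    )

    # Negative indices drop rows by position (here rows 2 and 4).
    without_2_and_4 <- demo_df[-c(2, 4), ]

    # A condition on a column keeps only the rows for which it is TRUE.
    died_after_40 <- demo_df[demo_df$Died.At > 40, ]

    without_2_and_4
    died_after_40
    ```

    In both cases the comma after the row selector matters: it tells R that the selector applies to rows and that all columns are kept.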

     

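    Calling data.frame() with no arguments creates an empty data frame:

    > X <- data.frame()

    A data frame can also be built directly from named columns:

    > temps = data.frame(day=1:10,
    + min = c(50.7,52.8,48.6,53.0,49.9,47.9,54.1,47.6,43.6,45.5),
    + max = c(59.5,55.7,57.3,71.5,69.8,68.8,67.5,66.0,66.1,61.7))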
    > head(temps)
      day  min  max
    1   1 50.7 59.5
    2   2 52.8 55.7
    3   3 48.6 57.3
    4   4 53.0 71.5
    5   5 49.9 69.8
    6   6 47.9 68.8

    > sapply(temps,mode)
          day       min       max
    "numeric" "numeric" "numeric"
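
    One caveat about sapply(temps, mode): mode() reports how a column is stored, so a factor column also shows up as "numeric". Using class() instead tells the two apart. A small sketch with a hypothetical data frame mixed_df, where stringsAsFactors = TRUE is set on purpose to force a factor column:

    ```r
    # Hypothetical data frame with one numeric and one factor column.
    mixed_df <- data.frame(
      n = c(1.5, 2.5),
      s = c("a", "b"),
      stringsAsFactors = TRUE
    )

    # mode() looks at the underlying storage: factors are integer codes.
    sapply(mixed_df, mode)

    # class() reports the column type as R treats it.
    sapply(mixed_df, class)
    ```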

    Accessing the variables of a data frame

    > temps[,3]
    [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7

    When you use a single subscript with a data frame, it refers to a data frame consisting
    of just that column. R also provides a special subscripting method (double brackets)
    to extract the actual data (in this case a vector) from the data frame:

    > temps['max']
        max
    1  59.5
    2  55.7
    3  57.3
    4  71.5
    5  69.8
    6  68.8
    7  67.5
    8  66.0
    9  66.1
    10 61.7
    > temps[['max']]
    [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7
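
    The difference between single and double brackets can be checked directly by looking at what each form returns. A minimal sketch, using a three-row copy of the temperature data (short_temps and the result names are hypothetical):

    ```r
    # Three-row copy of the temperature data, for illustration only.
    short_temps <- data.frame(day = 1:3, max = c(59.5, 55.7, 57.3))

    one_col_df <- short_temps["max"]    # single bracket: still a data frame
    max_vec <- short_temps[["max"]]     # double bracket: the numeric vector

    class(one_col_df)
    class(max_vec)

    # The $ operator and the [row, column] form return the same vector.
    identical(max_vec, short_temps$max)
    identical(max_vec, short_temps[, "max"])
    ```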

    The with() function evaluates an expression using the columns of a data frame as variables. Suppose we want to convert our minimum
    and maximum temperatures to centigrade, and then calculate the difference between
    them. Using with(), we can write:
    > with(temps,5/9*(max-32) - 5/9*(min-32))
    [1] 4.888889 1.611111 4.833333 10.277778 11.055556 11.611111 7.444444
    [8] 10.222222 12.500000 9.000000
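
    For comparison, the same calculation can be written without with() by spelling out the data frame name each time. The sketch below assumes a three-row copy of the data (short_temps) and a hypothetical helper to_celsius(); it also shows that the offset of 32 cancels out in the difference.

    ```r
    # Three-row copy of the temperature data, for illustration only.
    short_temps <- data.frame(
      day = 1:3,
      min = c(50.7, 52.8, 48.6),
      max = c(59.5, 55.7, 57.3)
    )

    # Hypothetical helper: Fahrenheit to centigrade.
    to_celsius <- function(f) 5 / 9 * (f - 32)

    # Inside with(), min and max refer to the columns of short_temps.
    diff_with <- with(short_temps, to_celsius(max) - to_celsius(min))

    # The explicit form repeats the data frame name for every column.
    diff_explicit <- to_celsius(short_temps$max) - to_celsius(short_temps$min)

    # Both forms agree, and the -32 terms cancel in the difference.
    all.equal(diff_with, diff_explicit)
    all.equal(diff_with, 5 / 9 * (short_temps$max - short_temps$min))
    ```

    The with() form mainly saves typing and keeps longer expressions readable when many columns are involved.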

  • Original article: https://www.cnblogs.com/yupeter007/p/5329250.html