  • Some data forms in R

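    When the input data contains both strings and numbers, a data frame is the better way to hold it. Data frames also support functions such as ncol() and nrow(), as well as indexing to retrieve a specific value.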

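    However, when data is entered as a data frame, character strings are converted to factors by default. (This describes R before version 4.0.0; from 4.0.0 onward, data.frame() defaults to stringsAsFactors = FALSE and keeps strings as character.)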

    For example:

    Died.At <- c(22,40,72,41)
    Writer.At <- c(16, 18, 36, 36)
    First.Name <- c("John", "Edgar", "Walt", "Jane")
    Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")
    Date.Of.Death <- c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18")
    

    Next, you just combine the vectors that you made with the data.frame() function:

     writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)

    Remember that the variables in a data frame must have the same length. Check that you passed the same number of elements to every c() call that builds a vector, and that you wrapped each string in "".
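The length requirement can be checked with a small experiment. The sketch below is illustrative only: the vector names `ages` and `names_short` are hypothetical, and `tryCatch()` is used just to capture the error instead of stopping the script.

```r
# Two vectors of different lengths (4 and 3); names are illustrative only.
ages        <- c(22, 40, 72, 41)
names_short <- c("John", "Edgar", "Walt")

# data.frame() signals an error when the lengths cannot be reconciled;
# tryCatch() returns the error message instead of halting.
msg <- tryCatch(
  {
    data.frame(ages, names_short)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A length-1 value is the exception: it is recycled to match the other columns.
ok <- data.frame(ages, source = "example")
print(nrow(ok))
```

Note that recycling means "same length" is slightly stricter than what R enforces, but supplying equal-length vectors is the safest habit.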

    Note that when you use the data.frame() function, character variables are imported as factors or categorical variables. Use the str() function to get to know more about your data frame.

     str(writers_df)

    For the variables First.Name and Second.Name, you don't want this. You can wrap them in the I() function, which tells data.frame() to keep them as they are.

    You can keep the Sex vector as a factor, because this variable can take only a limited number of possible values.

    You don't want the variable Date.Of.Death to be a factor either. It is better to store its values as dates. You can apply the as.Date() function to this variable to make sure this happens.

    str(writers_df)

    ## 'data.frame':    4 obs. of  6 variables:
    ##  $ Died.At      : num  22 40 72 41
    ##  $ Writer.At    : num  16 18 36 36
    ##  $ First.Name   : Factor w/ 4 levels "Edgar","Jane",..: 3 1 4 2
    ##  $ Second.Name  : Factor w/ 4 levels "Austen","Doe",..: 2 3 4 1
    ##  $ Sex          : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
    ##  $ Date.Of.Death: Factor w/ 4 levels "1817-07-18","1849-10-07",..: 4 2 3 1
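If the data frame already exists with a factor column like Date.Of.Death above, the column can also be converted afterwards rather than rebuilt. This is a minimal sketch on a standalone vector; `death_factor` and `death_dates` are hypothetical names, not part of the original example.

```r
# Recreate a factor like the Date.Of.Death column shown in the str() output.
death_factor <- factor(c("2015-05-10", "1849-10-07", "1892-03-26", "1817-07-18"))

# as.character() makes the conversion explicit: the factor's underlying
# integer codes are not the dates, the level labels are.
death_dates <- as.Date(as.character(death_factor))

print(class(death_dates))
print(death_dates[1] - death_dates[2])  # date arithmetic works on Date values
```

On a real data frame the same idea would be applied to the column itself, for example by assigning the converted vector back to `writers_df$Date.Of.Death`.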


    Died.At <- c(22,40,72,41)
    Writer.At <- c(16, 18, 36, 36)
    First.Name <- I(c("John", "Edgar", "Walt", "Jane"))
    Second.Name <- I(c("Doe", "Poe", "Whitman", "Austen"))
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")
    Date.Of.Death <- as.Date(c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18"))
    writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)
    str(writers_df)

    ## 'data.frame':    4 obs. of  6 variables:
    ##  $ Died.At      : num  22 40 72 41
    ##  $ Writer.At    : num  16 18 36 36
    ##  $ First.Name   :Class 'AsIs'  chr [1:4] "John" "Edgar" "Walt" "Jane"
    ##  $ Second.Name  :Class 'AsIs'  chr [1:4] "Doe" "Poe" "Whitman" "Austen"
    ##  $ Sex          : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
    ##  $ Date.Of.Death: Date, format: "2015-05-10" "1849-10-07" ...

    The I() function protects the strings: they stay ordinary character data instead of being converted to factors.
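An alternative to wrapping individual vectors in I() is the stringsAsFactors argument of data.frame(). The sketch below assumes a reduced two-column frame (`writers_small` is a hypothetical name) and converts only Sex to a factor explicitly.

```r
First.Name <- c("John", "Edgar", "Walt", "Jane")
Sex        <- c("MALE", "MALE", "MALE", "FEMALE")

# stringsAsFactors = FALSE keeps every character column as character,
# regardless of the R version's default.
writers_small <- data.frame(First.Name, Sex, stringsAsFactors = FALSE)

# Sex has few distinct values, so it is turned into a factor on purpose.
writers_small$Sex <- factor(writers_small$Sex)

str(writers_small)
```

Unlike the I() approach, the name column here has plain class "character" rather than "AsIs", which some downstream functions may handle more predictably.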


    You can also retrieve the names with the names() function:

    names(writers_df)
    ## [1] "Died.At"       "Writer.At"     "First.Name"    "Second.Name"
    ## [5] "Sex"           "Date.Of.Death"
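Alongside names(), the nrow() and ncol() functions and single-value indexing mentioned at the start can be tried on a small frame. This is only a sketch; `writers_small` is a hypothetical three-column subset of the example data.

```r
writers_small <- data.frame(
  Died.At   = c(22, 40, 72, 41),
  Writer.At = c(16, 18, 36, 36),
  Sex       = c("MALE", "MALE", "MALE", "FEMALE")
)

print(nrow(writers_small))            # number of rows
print(ncol(writers_small))            # number of columns
print(dim(writers_small))             # both at once: rows, then columns

print(writers_small[2, 1])            # single value: row 2, column 1
print(writers_small[3, "Writer.At"])  # single value: row 3, column by name

# names() can also be assigned to, which renames a column.
names(writers_small)[3] <- "Gender"
print(names(writers_small))
```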

    How To Remove Columns And Rows From A Data Frame

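    # Assigning NULL to a single cell (writers_df[1,3] <- NULL) is an error; drop a whole column instead.
    writers_df[, -3]   # copy without column 3 (First.Name); writers_df[3] <- NULL would remove it in place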

     

    > rows_to_keep <- c(TRUE, FALSE, TRUE, FALSE)
    > limited_writers_df <- writers_df[rows_to_keep,]
    > limited_writers_df
      Died.At Writer.At First.Name Second.Name  Sex Date.Of.Death
    1      22        16       John         Doe MALE    2015-05-10
    3      72        36       Walt     Whitman MALE    1892-03-26
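Besides a logical vector, rows can be removed by position or by a condition on a column. The following sketch uses a hypothetical two-column frame (`writers_small`) rather than the full writers_df.

```r
writers_small <- data.frame(
  First.Name = c("John", "Edgar", "Walt", "Jane"),
  Died.At    = c(22, 40, 72, 41)
)

# Drop rows 2 and 4 by position with negative indices.
by_position <- writers_small[-c(2, 4), ]

# Keep only the rows that meet a condition (here: died at 40 or older).
by_condition <- writers_small[writers_small$Died.At >= 40, ]

print(by_position)
print(by_condition)
```

In both cases the original frame is left untouched and the surviving rows keep their original row names, as in the output above.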

     

    > X <- data.frame()   # creates an empty data frame

    > temps = data.frame(day=1:10,
    + min = c(50.7,52.8,48.6,53.0,49.9,47.9,54.1,47.6,43.6,45.5),
    + max = c(59.5,55.7,57.3,71.5,69.8,68.8,67.5,66.0,66.1,61.7))
    > head(temps)
      day  min  max
    1   1 50.7 59.5
    2   2 52.8 55.7
    3   3 48.6 57.3
    4   4 53.0 71.5
    5   5 49.9 69.8
    6   6 47.9 68.8

    > sapply(temps,mode)
          day       min       max
    "numeric" "numeric" "numeric"

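mode() reports how a column is stored, which is not always what one expects for factors. As a rough illustration (the frame `df` and its `label` column are hypothetical), comparing mode with class shows the difference:

```r
df <- data.frame(
  day   = 1:3,
  min   = c(50.7, 52.8, 48.6),
  label = factor(c("cool", "mild", "cool"))
)

# mode() looks at the storage: a factor is stored as integer codes,
# so it is reported as "numeric" too.
print(sapply(df, mode))

# class() distinguishes integer, numeric and factor columns.
print(sapply(df, class))
```

For checking whether a string column became a factor, class() (or str()) is therefore the more informative choice.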
    Accessing the variables of a data frame

    > temps[,3]
    [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7

    When you use a single subscript with a data frame, it refers to a data frame consisting
    of just that column. R also provides a special subscripting method (double brackets)
    to extract the actual data (in this case a vector) from the data frame:

    > temps['max']
        max
    1  59.5
    2  55.7
    3  57.3
    4  71.5
    5  69.8
    6  68.8
    7  67.5
    8  66.0
    9  66.1
    10 61.7
    > temps[['max']]
     [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7
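The difference between the two subscripting forms can be confirmed by inspecting what each returns. This sketch uses a shortened three-row version of temps; the variable names `as_frame`, `as_vector` and `by_dollar` are hypothetical.

```r
temps <- data.frame(day = 1:3, max = c(59.5, 55.7, 57.3))

as_frame  <- temps["max"]    # single brackets: a one-column data frame
as_vector <- temps[["max"]]  # double brackets: the underlying vector
by_dollar <- temps$max       # $ is shorthand for the double-bracket form

print(class(as_frame))
print(class(as_vector))
print(identical(as_vector, by_dollar))

# Functions that expect a plain vector, such as mean(), need the vector form.
print(mean(as_vector))
```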

    The with() function evaluates an expression using the columns of a data frame as variables. Suppose we want to convert our minimum
    and maximum temperatures to centigrade, and then calculate the difference between
    them. Using with(), we can write:
    > with(temps,5/9*(max-32) - 5/9*(min-32))
     [1]  4.888889  1.611111  4.833333 10.277778 11.055556 11.611111  7.444444
     [8] 10.222222 12.500000  9.000000
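For comparison, the same calculation can be written with explicit `temps$` references, and the result stored as a new column. This is only a sketch: the helper `to_celsius` and the column name `diff_c` are hypothetical additions, not part of the original example.

```r
temps <- data.frame(
  day = 1:10,
  min = c(50.7, 52.8, 48.6, 53.0, 49.9, 47.9, 54.1, 47.6, 43.6, 45.5),
  max = c(59.5, 55.7, 57.3, 71.5, 69.8, 68.8, 67.5, 66.0, 66.1, 61.7)
)

# Hypothetical helper: Fahrenheit to centigrade.
to_celsius <- function(f) 5 / 9 * (f - 32)

# Inside with(), min and max refer to the columns of temps.
diff_with <- with(temps, to_celsius(max) - to_celsius(min))

# Without with(), every column needs the temps$ prefix.
diff_dollar <- to_celsius(temps$max) - to_celsius(temps$min)

print(all.equal(diff_with, diff_dollar))

# Store the result as a new column of the data frame.
temps$diff_c <- diff_with
print(head(temps, 3))
```

with() mainly saves typing and keeps longer expressions readable; the computed values are the same either way.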

  • Original article: https://www.cnblogs.com/yupeter007/p/5329250.html