  • Some data structures in R

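    When the data to be entered mixes strings and numbers, a data frame is the better way to store it. Data frames also work with functions such as ncol() and nrow(), and they can be indexed to retrieve a specific value.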
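
    As a quick illustration of those functions, here is a minimal sketch. The data frame people and its columns are hypothetical and are not part of the original example.

    ```r
    # Hypothetical example data: one character column and one numeric column
    people <- data.frame(
      name = c("Ann", "Bob", "Cid"),
      age  = c(31, 45, 27)
    )

    ncol(people)   # number of columns: 2
    nrow(people)   # number of rows: 3
    dim(people)    # rows and columns together

    # Retrieve a single value by row and column position, or by column name
    people[2, 2]       # 45
    people[2, "age"]   # 45
    ```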

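    However, when strings are entered through a data frame, they are stored as factors by default in R versions before 4.0.0. From R 4.0.0 onward, stringsAsFactors defaults to FALSE and strings stay as character vectors.

    For example: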

    Died.At <- c(22,40,72,41)
    Writer.At <- c(16, 18, 36, 36)
    First.Name <- c("John", "Edgar", "Walt", "Jane")
    Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")
    Date.Of.Death <- c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18")
    

    Next, you just combine the vectors that you made with the data.frame() function:

     writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)

    Remember that data frames must have variables of the same length. Check that you have put an equal number of arguments in all the c() functions that you assign to the vectors, and that you have enclosed strings in "".
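
    The length requirement can be checked with a small sketch like the one below. The names ok_df and length_error are made up for this illustration, and tryCatch() is used only so that the error message can be inspected instead of stopping the script.

    ```r
    # Vectors of equal length combine without complaint
    ok_df <- data.frame(x = c(1, 2, 3, 4), y = c("a", "b", "c", "d"))
    nrow(ok_df)   # 4

    # Lengths 4 and 3 cannot be reconciled, so data.frame() stops with an error;
    # tryCatch() captures the message instead of aborting the script
    length_error <- tryCatch(
      data.frame(x = c(1, 2, 3, 4), y = c("a", "b", "c")),
      error = function(e) conditionMessage(e)
    )
    length_error
    ```

    One caveat: data.frame() recycles a shorter atomic vector when its length divides the number of rows evenly (for example, length 2 against 4 rows), so a mismatch of that kind passes silently.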

    Note that in R versions before 4.0.0, the data.frame() function converts character variables to factors (categorical variables) by default; since R 4.0.0 they stay as character unless you pass stringsAsFactors = TRUE. Use the str() function to learn more about your data frame.

     str(writers_df)

    For the variables First.Name and Second.Name, you don't want factors, because names are free text rather than categories. You can wrap them in the I() function to insulate them from the conversion.

    You can keep the Sex vector as a factor, because this variable can take only a limited number of possible values.

    The variable Date.Of.Death should not be a factor either. It is better to store the values as dates, which you can do by wrapping the vector in the as.Date() function.
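
    To show why dates are more useful than factors, here is a minimal sketch using the same four date strings. The variable names date_factor and death_dates are introduced only for this illustration. The conversion goes through as.character() so that the labels are parsed rather than the factor's integer codes.

    ```r
    date_strings <- c("2015-05-10", "1849-10-07", "1892-03-26", "1817-07-18")

    # Simulate a column that was stored as a factor
    date_factor <- factor(date_strings)

    # Convert the labels to text first, then parse them as dates
    death_dates <- as.Date(as.character(date_factor))

    class(death_dates)                 # "Date"
    min(death_dates)                   # earliest date in the vector
    format(death_dates, "%Y")          # extract the year as text
    death_dates[1] - death_dates[2]    # difference between two dates, in days
    ```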

    str(writers_df)

    ## 'data.frame':    4 obs. of  6 variables:
    ##  $ Died.At      : num  22 40 72 41
    ##  $ Writer.At    : num  16 18 36 36
    ##  $ First.Name   : Factor w/ 4 levels "Edgar","Jane",..: 3 1 4 2
    ##  $ Second.Name  : Factor w/ 4 levels "Austen","Doe",..: 2 3 4 1
    ##  $ Sex          : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
    ##  $ Date.Of.Death: Factor w/ 4 levels "1817-07-18","1849-10-07",..: 4 2 3 1


    Died.At <- c(22,40,72,41)
    Writer.At <- c(16, 18, 36, 36)
    First.Name <- I(c("John", "Edgar", "Walt", "Jane"))
    Second.Name <- I(c("Doe", "Poe", "Whitman", "Austen"))
    Sex <- c("MALE", "MALE", "MALE", "FEMALE")
    Date.Of.Death <- as.Date(c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18"))
    writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)
    str(writers_df)


    ## 'data.frame':    4 obs. of  6 variables:
    ##  $ Died.At      : num  22 40 72 41
    ##  $ Writer.At    : num  16 18 36 36
    ##  $ First.Name   :Class 'AsIs'  chr [1:4] "John" "Edgar" "Walt" "Jane"
    ##  $ Second.Name  :Class 'AsIs'  chr [1:4] "Doe" "Poe" "Whitman" "Austen"
    ##  $ Sex          : Factor w/ 2 levels "FEMALE","MALE": 2 2 2 1
    ##  $ Date.Of.Death: Date, format: "2015-05-10" "1849-10-07" ...

    The I() function insulates the strings, so they are kept as plain character variables instead of being converted to factors.
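
    An alternative to wrapping each column in I() is the stringsAsFactors argument of data.frame(). The sketch below is not from the original post; the lower-case variable names and writers_chr are chosen only for this example. It keeps all strings as character and then converts just the categorical column.

    ```r
    first_name <- c("John", "Edgar", "Walt", "Jane")
    sex <- c("MALE", "MALE", "MALE", "FEMALE")

    # Keep every character column as character, whatever the R version
    writers_chr <- data.frame(first_name, sex, stringsAsFactors = FALSE)

    # Then convert only the genuinely categorical column to a factor
    writers_chr$sex <- factor(writers_chr$sex)

    str(writers_chr)
    ```

    Unlike I(), this leaves no 'AsIs' class on the columns, which may or may not matter for later processing.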


    You can also retrieve the names with the names() function:

    names(writers_df)
    ## [1] "Died.At"       "Writer.At"     "First.Name"    "Second.Name"
    ## [5] "Sex"           "Date.Of.Death"
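
    names() can also be used on the left-hand side of an assignment to rename columns. A minimal sketch, using a hypothetical data frame scores so that writers_df is not modified:

    ```r
    scores <- data.frame(id = 1:3, val = c(0.5, 0.8, 0.1))  # hypothetical data

    names(scores)                                          # "id" "val"
    names(scores)[2] <- "score"                            # rename by position
    names(scores)[names(scores) == "id"] <- "student_id"   # rename by current name
    names(scores)                                          # "student_id" "score"
    ```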

    How To Remove Columns And Rows From A Data Frame

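    # A single cell cannot be set to NULL (writers_df[1,3] <- NULL raises an error)
    writers_df[,-3]   # copy without column 3; writers_df itself is unchanged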

     

    > rows_to_keep <- c(TRUE, FALSE, TRUE, FALSE)
    > limited_writers_df <- writers_df[rows_to_keep,]
    > limited_writers_df
      Died.At Writer.At First.Name Second.Name  Sex Date.Of.Death
    1      22        16       John         Doe MALE    2015-05-10
    3      72        36       Walt     Whitman MALE    1892-03-26
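
    The common ways of dropping columns and rows can be sketched together as follows. The data frame demo and the result names are hypothetical; they are used here so that the example leaves writers_df untouched.

    ```r
    # Hypothetical data frame for the demonstration
    demo <- data.frame(x = 1:4, y = c("p", "q", "r", "s"), z = c(2.5, 3.5, 4.5, 5.5))

    no_y    <- demo[, names(demo) != "y"]   # drop a column by name, returning a copy
    demo$z  <- NULL                         # drop a column in place
    no_row2 <- demo[-2, ]                   # drop the second row with a negative index
    even_x  <- demo[demo$x %% 2 == 0, ]     # keep only rows that satisfy a condition

    names(no_y)     # "x" "z"
    names(demo)     # "x" "y"
    nrow(no_row2)   # 3
    even_x$x        # 2 4
    ```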

     

    > X <- data.frame()

    > temps = data.frame(day=1:10,
    + min = c(50.7,52.8,48.6,53.0,49.9,47.9,54.1,47.6,43.6,45.5),
    + max = c(59.5,55.7,57.3,71.5,69.8,68.8,67.5,66.0,66.1,61.7))
    > head(temps)
      day  min  max
    1   1 50.7 59.5
    2   2 52.8 55.7
    3   3 48.6 57.3
    4   4 53.0 71.5
    5   5 49.9 69.8
    6   6 47.9 68.8

    > sapply(temps,mode)
          day       min       max
    "numeric" "numeric" "numeric"

    Accessing the variables of a data frame

    > temps[,3]
    [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7

    When you use a single subscript with a data frame, it refers to a data frame consisting
    of just that column. R also provides a special subscripting method (double brackets)
    to extract the actual data (in this case a vector) from the data frame:

    > temps['max']
        max
    1  59.5
    2  55.7
    3  57.3
    4  71.5
    5  69.8
    6  68.8
    7  67.5
    8  66.0
    9  66.1
    10 61.7
    > temps[['max']]
    [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7
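
    The difference between the two subscripting styles can be verified directly. This sketch uses a shortened, three-row copy of temps, and the names one_col_df and max_vec are introduced only for the illustration.

    ```r
    temps <- data.frame(day = 1:3, max = c(59.5, 55.7, 57.3))  # shortened copy of the data

    one_col_df <- temps["max"]     # single brackets: still a data frame
    max_vec    <- temps[["max"]]   # double brackets: the underlying vector

    is.data.frame(one_col_df)      # TRUE
    is.numeric(max_vec)            # TRUE
    identical(max_vec, temps$max)  # TRUE: $ extracts the same vector by name
    mean(max_vec)                  # vector functions work directly on the result
    ```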

    The with() function evaluates an expression using the columns of a data frame
    as variables. Suppose we want to convert our minimum and maximum temperatures
    to centigrade, and then calculate the difference between them. Using with(), we can write:
    > with(temps,5/9*(max-32) - 5/9*(min-32))
    [1] 4.888889 1.611111 4.833333 10.277778 11.055556 11.611111 7.444444
    [8] 10.222222 12.500000 9.000000
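
    If the converted values should be kept as columns rather than only printed, one possible approach is transform(). The sketch below uses a three-row copy of temps; the helper to_celsius and the column names min_c and max_c are assumptions made for this example.

    ```r
    temps <- data.frame(day = 1:3,
                        min = c(50.7, 52.8, 48.6),
                        max = c(59.5, 55.7, 57.3))

    # Hypothetical helper: Fahrenheit to Celsius
    to_celsius <- function(f) 5 / 9 * (f - 32)

    # with() evaluates the expression using the columns as variables
    range_c <- with(temps, to_celsius(max) - to_celsius(min))

    # transform() returns a copy of the data frame with the new columns attached
    temps_c <- transform(temps, min_c = to_celsius(min), max_c = to_celsius(max))

    range_c
    head(temps_c)
    ```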

  • Original article: https://www.cnblogs.com/yupeter007/p/5329250.html