  • The apply family of functions and parallel computing in R

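    I. The apply family of functions

    1. apply: applies a function over the rows or columns (margins) of a matrix or array

    # apply
    # MARGIN = 1 means rows, MARGIN = 2 means columns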
    # create a matrix of 10 rows x 2 columns
    m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)
    # mean of the rows
    apply(m, 1, mean)
    [1] 6 7 8 9 10 11 12 13 14 15
    # mean of the columns
    apply(m, 2, mean)
    [1] 5.5 15.5
    # divide all values by 2
    apply(m, 1:2, function(x) x/2)
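
    For the common cases above, base R also offers rowMeans, colMeans, and plain arithmetic on the matrix, which give the same values as the apply calls. A minimal sketch, rebuilding the same matrix m so that it runs on its own:

    ```r
    m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)

    # Dedicated helpers for row and column means
    row_means <- rowMeans(m)
    col_means <- colMeans(m)

    # Arithmetic on a matrix is already element-wise
    halved <- m / 2

    # Compare against the apply versions from the example
    same_rows <- isTRUE(all.equal(row_means, apply(m, 1, mean)))
    same_cols <- isTRUE(all.equal(col_means, apply(m, 2, mean)))
    same_half <- isTRUE(all.equal(halved, apply(m, 1:2, function(x) x / 2)))
    print(c(same_rows, same_cols, same_half))
    ```

    apply remains the general tool when no dedicated helper exists for the function you want to use.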

    2. eapply: applies a function to the variables in an environment

    # a new environment
    e <- new.env()
    # two environment variables, a and b
    e$a <- 1:10
    e$b <- 11:20
    # mean of the variables
    eapply(e, mean)
    $b
    [1] 15.5
    
    $a
    [1] 5.5
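
    Note that the output above lists b before a: eapply does not promise any particular order. The sketch below sorts the result by name, and also shows the all.names argument, which controls whether variables whose names start with a dot are included. The variable .hidden is a made-up name used only for illustration.

    ```r
    e <- new.env()
    e$a <- 1:10
    e$b <- 11:20
    e$.hidden <- 100   # hypothetical dot-prefixed variable

    # Sort by name so the order is reproducible
    res <- eapply(e, mean)
    res <- res[order(names(res))]
    print(names(res))

    # Dot-prefixed variables are skipped unless all.names = TRUE
    res_all <- eapply(e, mean, all.names = TRUE)
    print(sort(names(res_all)))
    ```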

    3. lapply: applies a function to a list and returns a list. A data.frame is in fact also a list: a list of equal-length vectors bound together as columns. Usage: lapply(list, function)

    sapply(iris[,1:4],mean)
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
    
    lapply(iris[,1:4],mean)
    $Sepal.Length
    [1] 5.843333
    
    $Sepal.Width
    [1] 3.057333
    
    $Petal.Length
    [1] 3.758
    
    $Petal.Width
    [1] 1.199333
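
    The claim that a data.frame is a list of columns can be checked directly. A short sketch using the built-in iris data set:

    ```r
    # A data.frame passes the list test, and its length is the number of columns
    df_is_list <- is.list(iris)
    n_cols <- length(iris)

    # So lapply visits one column at a time
    col_classes <- lapply(iris, class)
    print(col_classes$Species)
    ```

    This is why lapply(iris[,1:4], mean) above returns one mean per column rather than one per row.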

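    4. sapply: a user-friendly form of lapply. Both lapply and sapply work on lists and data.frames; they differ only in the type of object returned. lapply always returns a list, while sapply depends on the results: if every element of the result has the same length, the result is simplified to a vector or matrix. Examples: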

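    # These two return exactly the same result; both are lists
    # (the columns have different numbers of unique values, so sapply cannot simplify)
    sapply(iris,unique)
    lapply(iris,unique)
    
    # Of these two, the first returns a named vector, the second a list
    sapply(iris[,1:4],mean)
    lapply(iris[,1:4],mean)
    
    # Of these two, the first returns a matrix (not a data.frame), the second a list
    sapply(iris[,1:4], function(x) x/2)
    lapply(iris[,1:4], function(x) x/2)
    
    # sapply picks the simplest suitable type (vector, matrix, or list) for the result,
    # whereas lapply always returns a list
    # The following two return the same result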
    library(magrittr)
    lapply(iris[,1:4],mean)%>%unlist()
    sapply(iris[,1:4],mean)
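
    The simplification rules can be inspected rather than taken on trust. A minimal sketch, again on iris, that checks the type sapply returns and shows how to switch simplification off with simplify = FALSE:

    ```r
    # Equal-length results are bound into a matrix, one column per input column
    halved <- sapply(iris[, 1:4], function(x) x / 2)
    print(is.matrix(halved))
    print(dim(halved))

    # To get a data.frame instead, convert the lapply result
    halved_df <- as.data.frame(lapply(iris[, 1:4], function(x) x / 2))

    # With simplify = FALSE, sapply behaves like lapply
    unsimplified <- sapply(iris[, 1:4], mean, simplify = FALSE)
    print(identical(unsimplified, lapply(iris[, 1:4], mean)))
    ```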

     5. vapply: requires a third argument (FUN.VALUE), a template that fixes the type and length of each result

    l <- list(a = 1:10, b = 11:20)
    # fivenum of values using vapply
    l.fivenum <- vapply(l, fivenum, c(Min.=0, "1st Qu."=0, Median=0, "3rd Qu."=0, Max.=0))
    class(l.fivenum)
    [1] "matrix" "array"   # in R >= 4.0.0; earlier versions print only "matrix"
    # let's see it
    l.fivenum
               a    b
    Min.     1.0 11.0
    1st Qu.  3.0 13.0
    Median   5.5 15.5
    3rd Qu.  8.0 18.0
    Max.    10.0 20.0
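
    The practical benefit of the template is that vapply fails loudly when a result does not match it, instead of silently returning a different type as sapply would. A small sketch of both the passing and the failing case, using the same list l:

    ```r
    l <- list(a = 1:10, b = 11:20)

    # Each result is a single number, matching numeric(1)
    means <- vapply(l, mean, numeric(1))
    print(means)

    # range() returns two values, so a length-1 template is rejected with an error
    msg <- tryCatch(
      vapply(l, range, numeric(1)),
      error = function(e) conditionMessage(e)
    )
    print(msg)

    # With a matching template the call succeeds and yields a 2-row matrix
    ranges <- vapply(l, range, numeric(2))
    print(dim(ranges))
    ```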
    

    6.replicate
    Description: “replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).”

    replicate(10, rnorm(10))
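
    Because replicate wraps sapply, its result is simplified in the same way: ten draws of ten numbers become a 10 x 10 matrix, and a scalar expression becomes a plain vector. A short sketch; the seed value is arbitrary and only there to make the run repeatable:

    ```r
    set.seed(1)   # arbitrary seed, for reproducibility only

    # Each evaluation returns 10 numbers, so the results are bound as columns
    draws <- replicate(10, rnorm(10))
    print(dim(draws))

    # A scalar expression gives a vector: here, 1000 simulated sample means
    sample_means <- replicate(1000, mean(rnorm(10)))
    print(length(sample_means))
    ```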
    

    7. mapply: accepts several vectors or lists and passes their elements to the function position by position.

    mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.

    l1 <- list(a = c(1:10), b = c(11:20))
    l2 <- list(c = c(21:30), d = c(31:40))
    # sum the corresponding elements of l1 and l2
    mapply(sum, l1$a, l1$b, l2$c, l2$d)
    [1]  64  68  72  76  80  84  88  92  96 100
    # mapply is like an sapply that accepts multiple arguments
    mapply(rep, 1:4, 5)
         [,1] [,2] [,3] [,4]
    [1,]    1    2    3    4
    [2,]    1    2    3    4
    [3,]    1    2    3    4
    [4,]    1    2    3    4
    [5,]    1    2    3    4
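
    Two details of mapply are worth seeing in action: like sapply, it only simplifies when all results have the same length, and arguments that should stay fixed across calls go in MoreArgs. A minimal sketch:

    ```r
    # Element-wise pairing: (1, 4), (2, 5), (3, 6)
    sums <- mapply(function(x, y) x + y, 1:3, 4:6)
    print(sums)

    # Results of different lengths cannot be simplified, so a list comes back
    ragged <- mapply(rep, 1:3, 3:1)
    print(is.list(ragged))

    # MoreArgs holds arguments that are the same for every call
    zeros <- mapply(rep, times = 1:3, MoreArgs = list(x = 0))
    print(lengths(zeros))
    ```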
    

     8.rapply
    Description: “rapply is a recursive version of lapply.”

    # let's start with our usual simple list example
    l <- list(a = 1:10, b = 11:20)
    # log2 of each value in the list
    rapply(l, log2)
          a1       a2       a3       a4       a5       a6       a7       a8 
    0.000000 1.000000 1.584963 2.000000 2.321928 2.584963 2.807355 3.000000 
          a9      a10       b1       b2       b3       b4       b5       b6 
    3.169925 3.321928 3.459432 3.584963 3.700440 3.807355 3.906891 4.000000 
          b7       b8       b9      b10 
    4.087463 4.169925 4.247928 4.321928
    # log2 of each value in each list
    rapply(l, log2, how = "list")
    $a
     [1] 0.000000 1.000000 1.584963 2.000000 2.321928 2.584963 2.807355 3.000000
     [9] 3.169925 3.321928
     
    $b
     [1] 3.459432 3.584963 3.700440 3.807355 3.906891 4.000000 4.087463 4.169925
     [9] 4.247928 4.321928
     
    # what if the function is the mean?
    rapply(l, mean)
       a    b 
     5.5 15.5
     
    rapply(l, mean, how = "list")
    $a
    [1] 5.5
     
    $b
    [1] 15.5
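
    The examples above use a flat list, where rapply behaves much like lapply. The recursion matters for nested lists, and the classes argument together with how = "replace" lets you transform only some leaves while keeping the structure. A sketch with a made-up nested list:

    ```r
    # Hypothetical nested list mixing numeric and character leaves
    nested <- list(a = 1:3, b = list(c = c(1.5, 2.5), d = "text"))

    # Double only the numeric leaves; everything else is left untouched
    doubled <- rapply(
      nested,
      function(x) x * 2,
      classes = c("integer", "numeric"),
      how = "replace"
    )
    str(doubled)
    ```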
    

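    II. Parallel computing

    The following uses Project Euler Problem 14 to demonstrate vectorized programming in R (using the apply family of functions) and parallel computing.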

    #-----Longest Collatz sequence Problem 14
    func <- function(x) {
        n = 1
        raw <- x
        while (x > 1) {
            x <- ifelse(x%%2==0,x/2,3*x+1)
            n = n + 1
        }
        return(c(raw,n))
    }
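
    func calls ifelse on a single number at every step. ifelse is designed for whole vectors, so for a scalar a plain if/else is the more usual choice and is likely to be faster, although that has not been measured here. A sketch of the same function written that way; the name collatz_len is made up for this example:

    ```r
    # Same logic as func, with a scalar if/else in place of ifelse
    collatz_len <- function(x) {
      n <- 1
      start <- x
      while (x > 1) {
        x <- if (x %% 2 == 0) x / 2 else 3 * x + 1
        n <- n + 1
      }
      c(start, n)   # starting value and number of terms in its sequence
    }

    print(collatz_len(13))
    ```

    For 13 the sequence is 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, so the second value returned is 10.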
    
    # Method 1: vectorized programming with sapply
    library(magrittr)
    system.time({
        x <- 1:1e5
        res1 <- sapply(x, func)%>%t()
    })
    
       user  system elapsed 
     37.960   0.360  41.315
    
    # Method 2: vectorized programming with lapply and rbind
    system.time({
        x <- 1:1e5
        res2 <- do.call('rbind',lapply(x,func))
    })
    
       user  system elapsed 
     36.031   0.181  36.769
    
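    # Method 3: parallel computing
    library(parallel)
    # system.time reports how long the computation takes
    system.time({
        x <- 1:1e5
        cl <- makeCluster(4) # start a cluster of four worker processes
        results <- parLapply(cl,x,func) # parallel version of lapply
        res.df <- do.call('rbind',results) # combine the results
        stopCluster(cl) # shut down the cluster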
    })
    
       user  system elapsed 
      0.199   0.064  20.038 
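
    Method 3 hard-codes four workers and leaves them running if parLapply fails. The sketch below is one possible variation, not part of the original post: it sizes the cluster from detectCores and always stops it via tryCatch. The cap of 4 workers, the fallback of 2 cores, the smaller input 1:1000, and the name collatz_len are all assumptions made for this example.

    ```r
    library(parallel)

    # Same step count as func, repeated so the sketch is self-contained
    collatz_len <- function(x) {
      n <- 1
      start <- x
      while (x > 1) {
        x <- if (x %% 2 == 0) x / 2 else 3 * x + 1
        n <- n + 1
      }
      c(start, n)
    }

    # detectCores() can return NA, so fall back to 2 (assumption)
    cores <- detectCores()
    if (is.na(cores)) cores <- 2
    n_workers <- max(1, min(4, cores - 1))   # leave one core free, cap at 4

    cl <- makeCluster(n_workers)
    results <- tryCatch(
      parLapply(cl, 1:1000, collatz_len),
      finally = stopCluster(cl)   # release the workers even if an error occurs
    )
    res_par <- do.call(rbind, results)

    # Compare against the sequential result
    res_seq <- do.call(rbind, lapply(1:1000, collatz_len))
    print(isTRUE(all.equal(res_par, res_seq)))
    ```

    Starting the workers and shipping data to them has a fixed cost, so for small inputs like this one the parallel version may well be slower than lapply.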
    
    # Method 4: for loop
    system.time({
        m <- matrix(nrow = 0,ncol = 2)
        for(i in 1:1e5){
            m <- rbind(m,func(i))
        }
    })
    
    # Method 4 takes far too long: rbind copies the whole matrix on every iteration
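
    Much of the cost of Method 4 comes from growing m one row at a time rather than from the for loop itself. A sketch of the same loop with the result matrix allocated up front; it uses a smaller range (1:1000, an assumption to keep the run short) and no timing is claimed for it:

    ```r
    # func as defined above, repeated so the sketch is self-contained
    func <- function(x) {
      n <- 1
      raw <- x
      while (x > 1) {
        x <- ifelse(x %% 2 == 0, x / 2, 3 * x + 1)
        n <- n + 1
      }
      c(raw, n)
    }

    n_values <- 1000
    # Allocate all rows once, then fill them in place
    m <- matrix(NA_real_, nrow = n_values, ncol = 2)
    for (i in seq_len(n_values)) {
      m[i, ] <- func(i)
    }
    print(m[13, ])
    ```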
    

    That's all.

    References:

    A brief introduction to “apply” in R

    用Parallel和foreach包玩转并行计算 (Parallel computing with the parallel and foreach packages)

  • Original article: https://www.cnblogs.com/litao1105/p/5573373.html