Is the “*apply” family really not vectorized?

    Question:

    We are used to telling every new R user that "apply isn't vectorized, check out Patrick Burns' R Inferno, Circle 4", which says (I quote):

    A common reflex is to use a function in the apply family. This is not vectorization, it is loop-hiding. The apply function has a for loop in its definition. The lapply function buries the loop, but execution times tend to be roughly equal to an explicit for loop.

    Indeed, a quick look at the apply source code reveals the loop:

    grep("for", capture.output(getAnywhere("apply")), value = TRUE)
    ## [1] "        for (i in 1L:d2) {"  "    else for (i in 1L:d2) {"

    So far so good, but a look at lapply or vapply actually reveals a completely different picture:

    lapply
    ## function (X, FUN, ...) 
    ## {
    ##     FUN <- match.fun(FUN)
    ##     if (!is.vector(X) || is.object(X)) 
    ##        X <- as.list(X)
    ##     .Internal(lapply(X, FUN))
    ## }
    ## <bytecode: 0x000000000284b618>
    ## <environment: namespace:base>

    So apparently there is no R for loop hiding there; rather, they call an internal, C-written function.

    A quick look down the rabbit hole reveals pretty much the same picture.
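
    The same holds for vapply; its body (reproduced here roughly as printed by current base R, with bytecode/environment lines omitted) is again just a thin wrapper around an .Internal call:

    vapply
    ## function (X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE) 
    ## {
    ##     FUN <- match.fun(FUN)
    ##     if (!is.vector(X) || is.object(X)) 
    ##         X <- as.list(X)
    ##     .Internal(vapply(X, FUN, FUN.VALUE, USE.NAMES))
    ## }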

    Moreover, let's take the colMeans function for example, which has never been accused of not being vectorised:

    colMeans
    # function (x, na.rm = FALSE, dims = 1L) 
    # {
    #   if (is.data.frame(x)) 
    #     x <- as.matrix(x)
    #   if (!is.array(x) || length(dn <- dim(x)) < 2L) 
    #     stop("'x' must be an array of at least two dimensions")
    #   if (dims < 1L || dims > length(dn) - 1L) 
    #     stop("invalid 'dims'")
    #   n <- prod(dn[1L:dims])
    #   dn <- dn[-(1L:dims)]
    #   z <- if (is.complex(x)) 
    #     .Internal(colMeans(Re(x), n, prod(dn), na.rm)) + (0+1i) * 
    #     .Internal(colMeans(Im(x), n, prod(dn), na.rm))
    #   else .Internal(colMeans(x, n, prod(dn), na.rm))
    #   if (length(dn) > 1L) {
    #     dim(z) <- dn
    #     dimnames(z) <- dimnames(x)[-(1L:dims)]
    #   }
    #   else names(z) <- dimnames(x)[[dims + 1]]
    #   z
    # }
    # <bytecode: 0x0000000008f89d20>
    #   <environment: namespace:base>

    Huh? It also just calls .Internal(colMeans(..., which we can also find down the rabbit hole. So how is this different from .Internal(lapply(...?

    In fact, a quick benchmark reveals that sapply performs no worse than colMeans, and much better than a for loop, on a big data set:

    m <- as.data.frame(matrix(1:1e7, ncol = 1e5))
    system.time(colMeans(m))
    # user  system elapsed 
    # 1.69    0.03    1.73 
    system.time(sapply(m, mean))
    # user  system elapsed 
    # 1.50    0.03    1.60 
    system.time(apply(m, 2, mean))
    # user  system elapsed 
    # 3.84    0.03    3.90 
    system.time(for(i in 1:ncol(m)) mean(m[, i]))
    # user  system elapsed 
    # 13.78    0.01   13.93 

    In other words, is it correct to say that lapply and vapply are actually vectorised (compared to apply, which is a for loop that also calls lapply), and what did Patrick Burns really mean to say?



    Answer:

    First of all, in your example you run the tests on a "data.frame", which is unfair to colMeans, apply, and "[.data.frame", since they all incur an overhead:

    system.time(as.matrix(m))  #called by `colMeans` and `apply`
    #   user  system elapsed 
    #   1.03    0.00    1.05
    system.time(for(i in 1:ncol(m)) m[, i])  #in the `for` loop
    #   user  system elapsed 
    #  12.93    0.01   13.07
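
    A minimal sketch of a fairer loop, assuming we pay the column-extraction cost once up front by converting the "data.frame" to a plain list (the ml name is just for illustration, and timings will vary by machine):

    ml <- as.list(m)  # one-off conversion, no `[.data.frame` call inside the loop
    system.time(for(i in seq_along(ml)) mean(ml[[i]]))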

    On a matrix, the picture is a bit different:

    mm = as.matrix(m)
    system.time(colMeans(mm))
    #   user  system elapsed 
    #   0.01    0.00    0.01 
    system.time(apply(mm, 2, mean))
    #   user  system elapsed 
    #   1.48    0.03    1.53 
    system.time(for(i in 1:ncol(mm)) mean(mm[, i]))
    #   user  system elapsed 
    #   1.22    0.00    1.21

    Regarding the main part of the question, the main difference between lapply/mapply/etc. and straightforward R loops is where the looping is done. As Roland notes, both the C and the R loop need to evaluate an R function in each iteration, and that evaluation is the most costly part. The really fast C functions are those that do everything in C, so, I guess, that should be what "vectorised" is about? Here is an example where we find the mean of each element of a "list":

    #all computations in C
    all_C = inline::cfunction(sig = c(R_ls = "list"), body = '
        SEXP tmp, ans;
        PROTECT(ans = allocVector(REALSXP, LENGTH(R_ls)));
    
        double *ptmp, *pans = REAL(ans);
    
        for(int i = 0; i < LENGTH(R_ls); i++) {
            pans[i] = 0.0;
    
            PROTECT(tmp = coerceVector(VECTOR_ELT(R_ls, i), REALSXP));
            ptmp = REAL(tmp);
    
            for(int j = 0; j < LENGTH(tmp); j++) pans[i] += ptmp[j];
    
            pans[i] /= LENGTH(tmp);
    
            UNPROTECT(1);
        }
    
        UNPROTECT(1);
        return(ans);
    ')
    
    #a very simple `lapply(x, mean)`
    C_and_R = inline::cfunction(sig = c(R_ls = "list"), body = '
        SEXP call, ans, ret;
    
        PROTECT(call = allocList(2));
        SET_TYPEOF(call, LANGSXP);
        SETCAR(call, install("mean"));
    
        PROTECT(ans = allocVector(VECSXP, LENGTH(R_ls)));
        PROTECT(ret = allocVector(REALSXP, LENGTH(ans)));
    
        for(int i = 0; i < LENGTH(R_ls); i++) {
            SETCADR(call, VECTOR_ELT(R_ls, i));
            SET_VECTOR_ELT(ans, i, eval(call, R_GlobalEnv));
        }
    
        double *pret = REAL(ret);
        for(int i = 0; i < LENGTH(ans); i++) pret[i] = REAL(VECTOR_ELT(ans, i))[0];
    
        UNPROTECT(3);
        return(ret);
    ')                    
    
    R_lapply = function(x) unlist(lapply(x, mean))                       
    
    R_loop = function(x) 
    {
        ans = numeric(length(x))
        for(i in seq_along(x)) ans[i] = mean(x[[i]])
        return(ans)
    } 
    
    R_loopcmp = compiler::cmpfun(R_loop)
    
    
    set.seed(007); myls = replicate(1e4, runif(1e3), simplify = FALSE)
    all.equal(all_C(myls), C_and_R(myls))
    #[1] TRUE
    all.equal(all_C(myls), R_lapply(myls))
    #[1] TRUE
    all.equal(all_C(myls), R_loop(myls))
    #[1] TRUE
    all.equal(all_C(myls), R_loopcmp(myls))
    #[1] TRUE
    
    microbenchmark::microbenchmark(all_C(myls), 
                                   C_and_R(myls), 
                                   R_lapply(myls), 
                                   R_loop(myls), 
                                   R_loopcmp(myls), 
                                   times = 15)
    #Unit: milliseconds
    #            expr       min        lq    median        uq      max neval
    #     all_C(myls)  37.29183  38.19107  38.69359  39.58083  41.3861    15
    #   C_and_R(myls) 117.21457 123.22044 124.58148 130.85513 169.6822    15
    #  R_lapply(myls)  98.48009 103.80717 106.55519 109.54890 116.3150    15
    #    R_loop(myls) 122.40367 130.85061 132.61378 138.53664 178.5128    15
    # R_loopcmp(myls) 105.63228 111.38340 112.16781 115.68909 128.1976    15
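
    To make the "everything in C" point concrete, here is a minimal extra sketch (not part of the benchmark above): plain arithmetic on two vectors loops entirely in C with no per-element R call, whereas a vapply over the indices must evaluate an R closure once per element, and that per-call cost is what dominates:

    x <- runif(1e5); y <- runif(1e5)
    microbenchmark::microbenchmark(
        x + y,                                                       # single C loop, truly "vectorised"
        vapply(seq_along(x), function(i) x[i] + y[i], numeric(1)),   # loop runs in C, but one R call per element
        times = 10)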
