Lesser known dplyr tricks

    In this blog post I share some lesser-known (at least I believe they are) tricks that mainly use functions from dplyr.

    Removing unneeded columns

    Did you know that you can use - in front of a column name to remove it from a data frame?

    mtcars %>% 
        select(-disp) %>% 
        head()
    ##                    mpg cyl  hp drat    wt  qsec vs am gear carb
    ## Mazda RX4         21.0   6 110 3.90 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag     21.0   6 110 3.90 2.875 17.02  0  1    4    4
    ## Datsun 710        22.8   4  93 3.85 2.320 18.61  1  1    4    1
    ## Hornet 4 Drive    21.4   6 110 3.08 3.215 19.44  1  0    3    1
    ## Hornet Sportabout 18.7   8 175 3.15 3.440 17.02  0  0    3    2
    ## Valiant           18.1   6 105 2.76 3.460 20.22  1  0    3    1
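    The minus sign also works in front of c(), which drops several columns in one call. The sketch below assumes a recent dplyr and uses datasets::mtcars rather than the mtcars modified later in this post, so it runs on its own; cars_small is just an illustrative name.

```r
library(dplyr)

# A minus sign in front of c() drops every column listed inside it.
# datasets::mtcars is used so the result does not depend on a modified
# global mtcars object.
cars_small <- datasets::mtcars %>%
  select(-c(disp, hp, drat))

names(cars_small)
```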

    Re-ordering columns

    Still using select(), it is easy to re-order columns in your data frame:

    mtcars %>% 
        select(cyl, disp, hp, everything()) %>% 
        head()
    ##                   cyl disp  hp  mpg drat    wt  qsec vs am gear carb
    ## Mazda RX4           6  160 110 21.0 3.90 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag       6  160 110 21.0 3.90 2.875 17.02  0  1    4    4
    ## Datsun 710          4  108  93 22.8 3.85 2.320 18.61  1  1    4    1
    ## Hornet 4 Drive      6  258 110 21.4 3.08 3.215 19.44  1  0    3    1
    ## Hornet Sportabout   8  360 175 18.7 3.15 3.440 17.02  0  0    3    2
    ## Valiant             6  225 105 18.1 2.76 3.460 20.22  1  0    3    1

    As its name implies, everything() selects all the remaining columns, so the columns named before it move to the front.
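    dplyr 1.0.0 and later also provide relocate(), which performs the same re-ordering without needing everything() and can place columns relative to another column. A minimal sketch, assuming such a dplyr version; front and after_wt are illustrative names.

```r
library(dplyr)

# relocate() moves the named columns to the front by default
# and keeps all other columns in their original order.
front <- datasets::mtcars %>%
  relocate(cyl, disp, hp)

# .after (or .before) places the columns next to another column instead.
after_wt <- datasets::mtcars %>%
  relocate(carb, .after = wt)

names(front)[1:3]
names(after_wt)[6:7]
```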

    Renaming columns with rename()

    mtcars <- rename(mtcars, spam_mpg = mpg)
    mtcars <- rename(mtcars, spam_disp = disp)
    mtcars <- rename(mtcars, spam_hp = hp)
    
    head(mtcars)
    ##                   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am
    ## Mazda RX4             21.0   6       160     110 3.90 2.620 16.46  0  1
    ## Mazda RX4 Wag         21.0   6       160     110 3.90 2.875 17.02  0  1
    ## Datsun 710            22.8   4       108      93 3.85 2.320 18.61  1  1
    ## Hornet 4 Drive        21.4   6       258     110 3.08 3.215 19.44  1  0
    ## Hornet Sportabout     18.7   8       360     175 3.15 3.440 17.02  0  0
    ## Valiant               18.1   6       225     105 2.76 3.460 20.22  1  0
    ##                   gear carb
    ## Mazda RX4            4    4
    ## Mazda RX4 Wag        4    4
    ## Datsun 710           4    1
    ## Hornet 4 Drive       3    1
    ## Hornet Sportabout    3    2
    ## Valiant              3    1
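    The three rename() calls above can also be written as a single call with several new_name = old_name pairs. dplyr 1.0.0 and later add rename_with(), which applies a function to the selected names. A sketch assuming a recent dplyr, again starting from datasets::mtcars:

```r
library(dplyr)

# rename() accepts several new_name = old_name pairs at once.
renamed <- datasets::mtcars %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

# rename_with() applies a function to the names of the selected columns;
# here it prefixes mpg, disp and hp with "spam_".
renamed2 <- datasets::mtcars %>%
  rename_with(~ paste0("spam_", .x), c(mpg, disp, hp))

names(renamed)
```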

    Selecting columns with a regexp

    It is easy to select the columns whose names contain “spam” with helper functions such as contains():

    mtcars %>% 
        select(contains("spam")) %>% 
        head()
    ##                   spam_mpg spam_disp spam_hp
    ## Mazda RX4             21.0       160     110
    ## Mazda RX4 Wag         21.0       160     110
    ## Datsun 710            22.8       108      93
    ## Hornet 4 Drive        21.4       258     110
    ## Hornet Sportabout     18.7       360     175
    ## Valiant               18.1       225     105

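    Also take a look at the other helpers: starts_with(), ends_with(), contains(), matches() (which takes a regular expression), num_range(), one_of() and everything(). Newer versions of tidyselect recommend all_of() or any_of() over one_of().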
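    To show how these helpers differ, here is a small sketch on the unmodified datasets::mtcars: starts_with() and ends_with() match a literal prefix or suffix, while matches() treats its argument as a regular expression. The regular expression used here is only an example.

```r
library(dplyr)

mt <- datasets::mtcars

# Literal prefix and suffix matching
names(select(mt, starts_with("d")))
names(select(mt, ends_with("t")))

# Regular expression: column names that are exactly two characters long
names(select(mt, matches("^.{2}$")))
```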

    Create new columns with mutate() and if_else()

    mtcars %>% 
        mutate(vs_new = if_else(
            vs == 1, 
            "one", 
            "zero", 
            NA_character_)) %>% 
        head()
    ##   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am gear carb vs_new
    ## 1     21.0   6       160     110 3.90 2.620 16.46  0  1    4    4   zero
    ## 2     21.0   6       160     110 3.90 2.875 17.02  0  1    4    4   zero
    ## 3     22.8   4       108      93 3.85 2.320 18.61  1  1    4    1    one
    ## 4     21.4   6       258     110 3.08 3.215 19.44  1  0    3    1    one
    ## 5     18.7   8       360     175 3.15 3.440 17.02  0  0    3    2   zero
    ## 6     18.1   6       225     105 2.76 3.460 20.22  1  0    3    1    one
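    The fourth argument of if_else() (missing) is the value used where the condition is NA. Unlike base ifelse(), if_else() also checks that the true and false values have compatible types. A small sketch on a toy vector (vs, labels, strict and loose are illustrative names), assuming a recent dplyr:

```r
library(dplyr)

vs <- c(1, 0, NA)

# missing is used where the condition is NA
labels <- if_else(vs == 1, "one", "zero", missing = "unknown")

# if_else() refuses to mix character and numeric results ...
strict <- tryCatch(if_else(vs == 1, "one", 0), error = function(e) "error")

# ... whereas base ifelse() silently coerces 0 to "0"
loose <- ifelse(vs == 1, "one", 0)

labels
loose
```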

    You might also want to create a new variable whose value depends on several values of another column; case_when() handles this:

    mtcars %>% 
        mutate(carb_new = case_when(.$carb == 1 ~ "one",
                                    .$carb == 2 ~ "two",
                                    .$carb == 4 ~ "four",
                                     TRUE ~ "other")) %>% 
        head(15)
    ##    spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am gear carb
    ## 1      21.0   6     160.0     110 3.90 2.620 16.46  0  1    4    4
    ## 2      21.0   6     160.0     110 3.90 2.875 17.02  0  1    4    4
    ## 3      22.8   4     108.0      93 3.85 2.320 18.61  1  1    4    1
    ## 4      21.4   6     258.0     110 3.08 3.215 19.44  1  0    3    1
    ## 5      18.7   8     360.0     175 3.15 3.440 17.02  0  0    3    2
    ## 6      18.1   6     225.0     105 2.76 3.460 20.22  1  0    3    1
    ## 7      14.3   8     360.0     245 3.21 3.570 15.84  0  0    3    4
    ## 8      24.4   4     146.7      62 3.69 3.190 20.00  1  0    4    2
    ## 9      22.8   4     140.8      95 3.92 3.150 22.90  1  0    4    2
    ## 10     19.2   6     167.6     123 3.92 3.440 18.30  1  0    4    4
    ## 11     17.8   6     167.6     123 3.92 3.440 18.90  1  0    4    4
    ## 12     16.4   8     275.8     180 3.07 4.070 17.40  0  0    3    3
    ## 13     17.3   8     275.8     180 3.07 3.730 17.60  0  0    3    3
    ## 14     15.2   8     275.8     180 3.07 3.780 18.00  0  0    3    3
    ## 15     10.4   8     472.0     205 2.93 5.250 17.98  0  0    3    4
    ##    carb_new
    ## 1      four
    ## 2      four
    ## 3       one
    ## 4       one
    ## 5       two
    ## 6       one
    ## 7      four
    ## 8       two
    ## 9       two
    ## 10     four
    ## 11     four
    ## 12    other
    ## 13    other
    ## 14    other
    ## 15     four

    Mind the .$ before the variable carb. There is a GitHub issue about this, and it is already fixed in the development version of dplyr, which means that in the next version of dplyr, case_when() will work like any other specialized dplyr function inside mutate(), without the .$ prefix.
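    With a dplyr version that includes that fix, the same recoding can be written with bare column names. A sketch assuming a current dplyr and the unmodified datasets::mtcars; pull() is used only to extract the new column for inspection. Conditions are checked in order, and TRUE ~ "other" catches every remaining row.

```r
library(dplyr)

# No .$ needed: case_when() sees the columns of the data frame directly.
carb_labels <- datasets::mtcars %>%
  mutate(carb_new = case_when(
    carb == 1 ~ "one",
    carb == 2 ~ "two",
    carb == 4 ~ "four",
    TRUE ~ "other"  # fallback for all other values of carb
  )) %>%
  pull(carb_new)

head(carb_labels, 6)
```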

    Apply a function to certain columns only, by rows

    mtcars %>%
        select(am, gear, carb) %>%
        purrr::by_row(sum, .collate = "cols", .to = "sum_am_gear_carb") -> mtcars2
    head(mtcars2)
    ## # A tibble: 6 × 4
    ##      am  gear  carb sum_am_gear_carb
    ##   <dbl> <dbl> <dbl>            <dbl>
    ## 1     1     4     4                9
    ## 2     1     4     4                9
    ## 3     1     4     1                6
    ## 4     0     3     1                4
    ## 5     0     3     2                5
    ## 6     0     3     1                4

    For this, I had to use purrr’s by_row() function (it has since been moved out of purrr into the purrrlyr package). You can then add this column to your original data frame:

    mtcars <- cbind(mtcars, "sum_am_gear_carb" = mtcars2$sum_am_gear_carb)
    head(mtcars)
    ##                   spam_mpg cyl spam_disp spam_hp drat    wt  qsec vs am
    ## Mazda RX4             21.0   6       160     110 3.90 2.620 16.46  0  1
    ## Mazda RX4 Wag         21.0   6       160     110 3.90 2.875 17.02  0  1
    ## Datsun 710            22.8   4       108      93 3.85 2.320 18.61  1  1
    ## Hornet 4 Drive        21.4   6       258     110 3.08 3.215 19.44  1  0
    ## Hornet Sportabout     18.7   8       360     175 3.15 3.440 17.02  0  0
    ## Valiant               18.1   6       225     105 2.76 3.460 20.22  1  0
    ##                   gear carb sum_am_gear_carb
    ## Mazda RX4            4    4                9
    ## Mazda RX4 Wag        4    4                9
    ## Datsun 710           4    1                6
    ## Hornet 4 Drive       3    1                4
    ## Hornet Sportabout    3    2                5
    ## Valiant              3    1                4
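    If by_row() is not available, two alternatives work directly in mutate(): for a plain sum, vectorised arithmetic already operates row by row, and for an arbitrary function of several columns, rowwise() combined with c_across() (dplyr 1.0.0 and later) evaluates the expression once per row. A sketch under those assumptions, using datasets::mtcars:

```r
library(dplyr)

# Vectorised arithmetic: adds the three columns element by element.
with_sum <- datasets::mtcars %>%
  mutate(sum_am_gear_carb = am + gear + carb)

# rowwise() + c_across(): any summary function can be applied per row.
with_rowwise <- datasets::mtcars %>%
  rowwise() %>%
  mutate(sum_am_gear_carb = sum(c_across(c(am, gear, carb)))) %>%
  ungroup()

head(with_sum$sum_am_gear_carb)
```

    Adding the column inside mutate() also avoids the separate cbind() step, which relies on both data frames having their rows in the same order.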

    Use do() to do any arbitrary operation

    mtcars %>% 
        group_by(cyl) %>% 
        do(models = lm(spam_mpg ~ drat + wt, data = .)) %>% 
        broom::tidy(models)
    ## Source: local data frame [9 x 6]
    ## Groups: cyl [3]
    ## 
    ##     cyl        term   estimate  std.error  statistic     p.value
    ##   <dbl>       <chr>      <dbl>      <dbl>      <dbl>       <dbl>
    ## 1     4 (Intercept) 33.2493403 17.0987286  1.9445504 0.087727622
    ## 2     4        drat  1.3244329  3.4519717  0.3836743 0.711215433
    ## 3     4          wt -5.2400608  2.2150213 -2.3656932 0.045551615
    ## 4     6 (Intercept) 30.6544931  7.5141648  4.0795609 0.015103868
    ## 5     6        drat -0.4435744  1.1740862 -0.3778039 0.724768945
    ## 6     6          wt -2.9902720  1.5685053 -1.9064468 0.129274249
    ## 7     8 (Intercept) 29.6519180  7.0878976  4.1834574 0.001527613
    ## 8     8        drat -1.4698722  1.6285054 -0.9025897 0.386081744
    ## 9     8          wt -2.4518017  0.7985112 -3.0704664 0.010651044

    do() is useful when you want to use any R function (user-defined functions work too!) together with dplyr functions. First I grouped the observations by cyl, then ran a linear model for each group, and finally converted the output to a tidy data frame using broom::tidy().
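    do() is marked as superseded since dplyr 1.0.0. One alternative is group_modify() (dplyr 0.8.0 and later): the supplied function receives each group's rows and must return a data frame. The sketch below extracts the coefficients with base coef() instead of broom, so only the estimates are kept; coefs is an illustrative name, and mpg plays the role of spam_mpg.

```r
library(dplyr)

coefs <- datasets::mtcars %>%
  group_by(cyl) %>%
  group_modify(function(df, key) {
    # df holds the rows of one cyl group (without the cyl column)
    fit <- lm(mpg ~ drat + wt, data = df)
    cf <- coef(fit)
    data.frame(term = names(cf), estimate = unname(cf),
               stringsAsFactors = FALSE)
  }) %>%
  ungroup()

coefs
```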

    Using dplyr functions inside your own functions

    extract_vars <- function(data, some_string){
        
      data %>%
        select_(lazyeval::interp(~contains(some_string))) -> data
        
      return(data)
    }
    
    extract_vars(mtcars, "spam")
    ##                     spam_mpg spam_disp spam_hp
    ## Mazda RX4               21.0     160.0     110
    ## Mazda RX4 Wag           21.0     160.0     110
    ## Datsun 710              22.8     108.0      93
    ## Hornet 4 Drive          21.4     258.0     110
    ## Hornet Sportabout       18.7     360.0     175
    ## Valiant                 18.1     225.0     105
    ## Duster 360              14.3     360.0     245
    ## Merc 240D               24.4     146.7      62
    ## Merc 230                22.8     140.8      95
    ## Merc 280                19.2     167.6     123
    ## Merc 280C               17.8     167.6     123
    ## Merc 450SE              16.4     275.8     180
    ## Merc 450SL              17.3     275.8     180
    ## Merc 450SLC             15.2     275.8     180
    ## Cadillac Fleetwood      10.4     472.0     205
    ## Lincoln Continental     10.4     460.0     215
    ## Chrysler Imperial       14.7     440.0     230
    ## Fiat 128                32.4      78.7      66
    ## Honda Civic             30.4      75.7      52
    ## Toyota Corolla          33.9      71.1      65
    ## Toyota Corona           21.5     120.1      97
    ## Dodge Challenger        15.5     318.0     150
    ## AMC Javelin             15.2     304.0     150
    ## Camaro Z28              13.3     350.0     245
    ## Pontiac Firebird        19.2     400.0     175
    ## Fiat X1-9               27.3      79.0      66
    ## Porsche 914-2           26.0     120.3      91
    ## Lotus Europa            30.4      95.1     113
    ## Ford Pantera L          15.8     351.0     264
    ## Ferrari Dino            19.7     145.0     175
    ## Maserati Bora           15.0     301.0     335
    ## Volvo 142E              21.4     121.0     109

    About this last point, you can read more about it here.
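    select_() and lazyeval are deprecated in current dplyr. In recent versions, a string argument can be passed straight to a helper such as contains(), and bare column names can be forwarded by "embracing" the argument with {{ }} (rlang 0.4.0 and later; the .groups argument needs dplyr 1.0.0 or later). A sketch under those assumptions; extract_vars_tidy and mean_by are hypothetical names.

```r
library(dplyr)

# The string is an ordinary function argument, so contains() can use it directly.
extract_vars_tidy <- function(data, some_string) {
  data %>% select(contains(some_string))
}

# {{ }} forwards bare column names supplied by the caller to dplyr verbs.
mean_by <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(mean_value = mean({{ value_col }}), .groups = "drop")
}

names(extract_vars_tidy(datasets::mtcars, "a"))
mean_by(datasets::mtcars, cyl, mpg)
```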

    Hope you liked this small list of tricks!

    Reposted from: http://www.brodrigues.co/blog/2017-02-17-lesser_known_tricks/

    Original article: https://www.cnblogs.com/payton/p/6541521.html