zoukankan      html  css  js  c++  java
  • R&SQL合并记录并去重

    ====================================================================================

    data:

    have <- data.frame(ID=paste0("id", c(101, 102, 102, 103, 103, 104, 105, 105)),
    info1=c("one", "twoA", "twoB", "threeA", "threeB", "four", "five", "five"),
    stringsAsFactors=FALSE)
    want <- data.frame(ID=paste0("id", c(101:105)),
    info1=c("one", "twoA; twoB", "threeA; threeB", "four", "five"),
    stringsAsFactors=FALSE)

    ====================================================================================

     MySQL

    SELECT
        user_id,
        COUNT(id) AS freq,
        GROUP_CONCAT(DISTINCT ip SEPARATOR ",") AS ips
    FROM
        log_table
    GROUP BY
        user_id
    ORDER BY
        freq ASC;

    ====================================================================================

     R&SQL

    library(sqldf)


    sqldf("
    SELECT ID,
    GROUP_CONCAT(distinct info1) as info1
    FROM have
    GROUP BY ID")

    注:此种情况没能实现更改间隔符,默认为逗号。

    ====================================================================================

     R

    --------------------------------------------------------------------------------------------------------------------------------------

    require(dplyr)

    METHOD1:  have %>%    group_by(ID) %>%    summarise_each(funs(toString(sort(unique(info1)))))

    METHOD2:  have %>%    group_by(ID) %>%    summarise(name = toString(sort(unique(info1))))

    --------------------------------------------------------------------------------------------------------------------------------------

    require(data.table)

    METHOD1:  setDT(have)[, .(info1 = toString(sort(unique(info1)))), by = ID]
    METHOD2:  setDT(have)[ , .(info1 = paste(unique(info1), collapse = ",")),by = ID]

    注:paste函数作为其他函数的参数时,不能。例如

    aggregate(have[,2], by=list(have$ID), paste(unique(info1)), collapse=";")   × 

    aggregate(have[,2], by=list(have$ID), paste, collapse=";")  

    setDT(have)[, lapply(.SD, paste(unique(info1)), collapse = "; "), by = ID]  ×

    setDT(have)[, lapply(.SD, paste, collapse = "; "), by = ID]  √

    --------------------------------------------------------------------------------------------------------------------------------------

    aggregate(data=have,info1~ID,FUN = function(t) sort(unique(t)))

  • 相关阅读:
    获得随机N位数不重复数字
    springboot配置swagger
    Java开发中的23种设计模式详解(转)
    英文对应解释
    python实现ping命令语音播报联通性V1.0.0.1
    python实现ping命令语音播报联通性V1.0
    python实现ping命令语音播报联通性
    eclipse更新支持tomcat-8.0以上
    网管---------Windows杀死后台进程查询等脚本
    网管-windows磁盘检查修复
  • 原文地址:https://www.cnblogs.com/Emily07/p/6509362.html
Copyright © 2011-2022 走看看