  • [R read error] Fix: Can't bind data because some arguments have the same name

    While reading a data file recently, I ran into the error in the title.

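    library(dplyr)  # provides %>% and select()

    args[1] <- "RT_10-VS-RT_0"
    all <- read.delim(paste0(args[1], ".xls"), header = TRUE, check.names = FALSE)
    # The select() call below fails with:
    # Can't bind data because some arguments have the same name
    dat <- all %>% dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)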
    

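    This happens because select() (in the dplyr version used here) cannot handle a data frame with duplicated column names. The error is raised even when the duplicated column is not among those selected.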
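
    The duplicates survive the read because check.names = FALSE tells read.delim to keep the header exactly as written. A minimal sketch of the difference, using a made-up three-column table (the text and its column names are hypothetical, not taken from the real file):

    ```r
    # Hypothetical tab-delimited table with a repeated header,
    # standing in for the real file
    txt <- "Protein_ID\tRatio A/B\tProtein_ID\nP1\t1.5\tP1\nP2\t0.8\tP2"

    # check.names = FALSE keeps the header verbatim, duplicates included
    raw <- read.delim(text = txt, check.names = FALSE)
    print(names(raw))

    # The default check.names = TRUE makes the names syntactic and unique,
    # but it also rewrites characters such as " " and "/" to "."
    checked <- read.delim(text = txt)
    print(names(checked))
    ```

    So the default setting would avoid the duplicate, but at the price of altering names such as `Mean_Ratio_RT_10_118/RT_0_117`, which is presumably why check.names was switched off here.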

    The following snippet lists the duplicated column names:

    # Check for duplicated column names
    > tibble::enframe(names(all)) %>% count(value) %>% filter(n > 1)
    # A tibble: 1 x 2
      value          n
      <chr>      <int>
    1 Protein_ID     2
    

    It turns out there are two columns named Protein_ID.
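
    For reference, the same check can be done without the tidyverse, and the repeated names can be made unique in place if keeping read.delim is preferred. This is only a sketch on a made-up data frame (`df` and its columns are hypothetical); the `_` separator is chosen to mimic the `Protein_ID_1` naming, not taken from any library default.

    ```r
    # Hypothetical data frame with a repeated column name, standing in for `all`
    df <- data.frame(1:2, c(10.5, 20.1), c("x", "y"))
    names(df) <- c("Protein_ID", "Mass", "Protein_ID")

    # Names that occur more than once, and the positions of those columns
    dup_names <- unique(names(df)[duplicated(names(df))])
    dup_pos <- which(names(df) %in% dup_names)
    print(dup_names)
    print(dup_pos)

    # Append a suffix to repeats only; first occurrences keep their name
    names(df) <- make.unique(names(df), sep = "_")
    print(names(df))
    ```

    Once the names are unique, select() no longer has a duplicate to trip over.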

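    How to fix it? One option is to read the file with readr instead, which parses the header more intelligently and renames duplicated columns on its own.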

    all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t") %>% 
      dplyr::select(Protein_ID,starts_with("Ratio"),starts_with("Qvalue"),starts_with("KEGG"),Description,Protein_Sequence)
    
    Parsed with column specification:
    cols(
      .default = col_character(),
      No. = col_double(),
      Mass = col_double(),
      Protein_Coverage = col_double(),
      `Mean_Ratio_RT_10_118/RT_0_117` = col_double(),
      `Tremble Identity` = col_double(),
      `Tremble E-value` = col_double()
    )
    See spec(...) for full column specifications.
    Warning: 29 parsing failures.
     row                           col expected actual                file
    1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
    1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    .... ............................. ........ ...... ...................
    See problems(...) for more details.
    
    Warning message:
    Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]
    

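    The warnings also list the rows and columns where parsing failed (under the guessed type, col_double), and flag the duplicated Protein_ID column. To get rid of the long "Parsed with column specification" message, either specify the column types explicitly when reading, or pass col_types = cols() to keep the default guessing: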

    all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t", col_types = readr::cols()) %>% 
      dplyr::select(Protein_ID,starts_with("Ratio"),starts_with("Qvalue"),starts_with("KEGG"),Description,Protein_Sequence)  
    
    Warning: 29 parsing failures.
     row                           col expected actual                file
    1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
    1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    .... ............................. ........ ...... ...................
    See problems(...) for more details.
    
    Warning message:
    Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14] 
    

    The warnings are still shown, and they are worth keeping.
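
    The remaining parsing failures come from entries such as `-` and `n/a` in numeric columns. If those are simply missing-value markers (an assumption about this file, not something the source confirms), they could be declared through the `na` argument so that they are read as NA. A minimal sketch on a made-up temporary file (its columns and values are hypothetical):

    ```r
    library(readr)

    # Hypothetical stand-in for the real file: a tab-delimited table in which
    # "-" and "n/a" are assumed to mark missing values
    tmp <- tempfile(fileext = ".tsv")
    writeLines(c("Protein_ID\tIdentity\tRatio",
                 "P1\t98.5\t1.2",
                 "P2\t-\tn/a"), tmp)

    # Treat the markers as NA; col_types = cols() keeps type guessing
    # but silences the column specification message
    dat <- read_delim(tmp, delim = "\t",
                      na = c("", "NA", "-", "n/a"),
                      col_types = cols())
    print(dat)

    # problems() returns any parsing failures recorded for this read
    print(problems(dat))
    ```

    Whether silencing these failures is appropriate depends on what `-` and `n/a` actually mean in the data; when in doubt, leaving the warnings visible is the safer choice.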

    Ref: https://github.com/tidyverse/readr/issues/954

  • Original article: https://www.cnblogs.com/jessepeng/p/12452211.html