  • [R read error] Fix: Can't bind data because some arguments have the same name

    While reading a dataset recently, I ran into the error shown in the title.

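    library(dplyr)  # provides %>%, select() and starts_with()
    # args is assumed to hold the command-line arguments, e.g. from commandArgs()
    args[1] <- "RT_10-VS-RT_0"
    # Despite the .xls extension, the file is read here as tab-delimited text
    all <- read.delim(paste0(args[1], ".xls"), header = TRUE, check.names = FALSE)
    dat <- all %>% dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)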
    

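    This happens because select() cannot handle a data frame with duplicated column names; the error is raised even when the duplicated columns are not among those being selected.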

    The following snippet lists the duplicated column names:

    # Check for duplicated column names
    > tibble::enframe(names(all)) %>% count(value) %>% filter(n > 1)
    # A tibble: 1 x 2
      value          n
      <chr>      <int>
    1 Protein_ID     2
    

    It turns out there are two Protein_ID columns.
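
    The same check also works in base R, without tibble or dplyr. A minimal sketch on a made-up data frame (`df` and its columns are hypothetical stand-ins for the real table):

    ```r
    # Hypothetical stand-in for the real table: two columns share the name Protein_ID
    df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
    names(df) <- c("Protein_ID", "Mass", "Protein_ID")

    # duplicated() flags the second and later occurrences of each name
    dup_names <- unique(names(df)[duplicated(names(df))])
    print(dup_names)

    # table() counts each name, comparable to count(value) %>% filter(n > 1)
    name_counts <- table(names(df))
    print(name_counts[name_counts > 1])
    ```

    On the real data this would be `names(all)[duplicated(names(all))]`.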

    How to fix it? Read the file with readr instead, which renames duplicated columns automatically while parsing.

    all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t") %>%
      dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)
    
    Parsed with column specification:
    cols(
      .default = col_character(),
      No. = col_double(),
      Mass = col_double(),
      Protein_Coverage = col_double(),
      `Mean_Ratio_RT_10_118/RT_0_117` = col_double(),
      `Tremble Identity` = col_double(),
      `Tremble E-value` = col_double()
    )
    See spec(...) for full column specifications.
    Warning: 29 parsing failures.
     row                           col expected actual                file
    1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
    1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    .... ............................. ........ ...... ...................
    See problems(...) for more details.
    
    Warning message:
    Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]
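
    If the rest of the script has to stay on read.delim, the names can also be made unique by hand after reading. This is not from the original post, only a sketch using base R's make.unique(), which renames repeats in much the same spirit as readr's deduplication. The inline `txt` is a hypothetical stand-in for the real file:

    ```r
    # Hypothetical tab-separated text standing in for the real .xls file
    txt <- "Protein_ID\tMass\tProtein_ID\nP1\t10.5\tP1\nP2\t20.1\tP2"

    # check.names = FALSE keeps the header as written, including the duplicate
    df <- read.delim(text = txt, header = TRUE, check.names = FALSE)

    # make.unique() appends .1, .2, ... to repeated names and leaves the others untouched
    names(df) <- make.unique(names(df))
    print(names(df))

    # With unique names, a column can be picked unambiguously
    ids <- as.character(df[["Protein_ID"]])
    print(ids)
    ```

    Unlike check.names = TRUE, this leaves names such as `Mean_Ratio_RT_10_118/RT_0_117` as they are and only touches the repeated ones.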
    

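    The readr output also lists the rows and columns that failed to parse (under the guessed type, col_double) and flags the duplicated Protein_ID column. To get rid of the long "Parsed with column specification" message, either specify the column types when reading, or pass col_types = cols() to keep the default guessing: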

    all <- readr::read_delim(paste0(args[1],".xls"),delim = "\t",col_types = readr::cols()) %>% 
      dplyr::select(Protein_ID,starts_with("Ratio"),starts_with("Qvalue"),starts_with("KEGG"),Description,Protein_Sequence)  
    
    Warning: 29 parsing failures.
     row                           col expected actual                file
    1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
    1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    .... ............................. ........ ...... ...................
    See problems(...) for more details.
    
    Warning message:
    Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14] 
    

    The warnings are still printed, and it is best to keep them.
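
    For the first option mentioned above, declaring the column types, a sketch might look like the following. The tiny file and its column names are made up. The `na` vector assumes that `-` and `n/a` really do mean "missing" in this data, which should be checked before silencing the parsing failures this way:

    ```r
    library(readr)

    # Hypothetical miniature of the real file: "-" and "n/a" stand in for missing values
    tmp <- tempfile(fileext = ".tsv")
    writeLines(c("Protein_ID\tMass\tIdentity",
                 "P1\t10.5\t98.2",
                 "P2\tn/a\t-"), tmp)

    dat <- read_delim(
      tmp,
      delim = "\t",
      # Declare the types explicitly so readr does not print a guessed specification
      col_types = cols(
        Protein_ID = col_character(),
        Mass       = col_double(),
        Identity   = col_double()
      ),
      # Treat the placeholders as NA (an assumption about what they mean in this file)
      na = c("", "NA", "-", "n/a")
    )

    print(dat)
    # problems() returns the parsing failures recorded for this read, one row each
    print(nrow(problems(dat)))
    ```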

    Ref: https://github.com/tidyverse/readr/issues/954

  • Original post: https://www.cnblogs.com/jessepeng/p/12452211.html