  • Bash: Common Command-Line Tools - uniq

    NAME
           uniq - report or omit repeated lines
    
    SYNOPSIS
           uniq [OPTION]... [INPUT [OUTPUT]]
    
    DESCRIPTION
           Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
    
           With no options, matching lines are merged to the first occurrence.
    
           Mandatory arguments to long options are mandatory for short options too.
    
           -c, --count
                  prefix lines by the number of occurrences
    
           -d, --repeated
                  only print duplicate lines
    
           -D, --all-repeated[=delimit-method]
                  print all duplicate lines
                  delimit-method={none(default),prepend,separate}
                  Delimiting is done with blank lines
    
           -f, --skip-fields=N
                  avoid comparing the first N fields
    
           -i, --ignore-case
                  ignore differences in case when comparing
    
           -s, --skip-chars=N
                  avoid comparing the first N characters
    
           -u, --unique
                  only print unique lines
    
           -z, --zero-terminated
                  end lines with 0 byte, not newline
    
           -w, --check-chars=N
                  compare no more than N characters in lines
    
           --help display this help and exit
    
           --version
                  output version information and exit
    
           A field is a run of blanks (usually spaces and/or TABs), then non-blank characters.  Fields are skipped before chars.
    
           Note:  'uniq'  does not detect repeated lines unless they are adjacent.  You may want to sort the input first, or use 'sort -u' without 'uniq'.  Also, comparisons honor the rules specified by 'LC_COLLATE'.

    The above is the output of man uniq.

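    The final note tells us that uniq only removes duplicates when the repeated lines are adjacent in the input. This is easy to understand: because uniq only compares each line with the one before it, it does not need a hash map of previously seen lines to do its counting. To produce such input, sort the data first, which guarantees that duplicate lines end up next to each other.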
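
The adjacency requirement can be seen in a small experiment. This is a minimal sketch: the four-line input is a made-up sample rather than the words.txt used later, and the awk one-liner is an alternative technique not covered by the article, shown only to illustrate the hash-map trade-off.

```shell
# Hypothetical sample input: the duplicate "the" lines are not all adjacent.
input=$(printf 'the\nday\nthe\nthe\n')

# uniq alone only collapses the adjacent pair, so "the" still appears twice.
unsorted=$(printf '%s\n' "$input" | uniq)

# Sorting first makes every duplicate adjacent, so uniq removes all of them.
sorted=$(printf '%s\n' "$input" | sort | uniq)

# Alternative that keeps the input order by remembering seen lines in an awk
# associative array (the hash-map cost that uniq avoids).
hashed=$(printf '%s\n' "$input" | awk '!seen[$0]++')

printf 'uniq only:\n%s\n\nsort | uniq:\n%s\n\nawk:\n%s\n' "$unsorted" "$sorted" "$hashed"
```

Without sorting, the output still contains "the" twice. The sorted pipeline and the awk variant both leave one copy of each word, but only the awk variant preserves the original order.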

    Given the following text file (words.txt):

    the
    day
    is
    sunny
    the
    the
    sunny
    day
    is
    today
    is
    sunny
    day

    USE CASE 1.

    First, deduplicate the words (sort first, then remove duplicates):

    $ sort words.txt | uniq
    day
    is
    sunny
    the
    today

    USE CASE 2.

    Count how many times each word occurs:

    $ sort words.txt | uniq -c
          3 day
          3 is
          3 sunny
          3 the
          1 today
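
A common follow-up, not part of the original article, is to rank the counts by piping them through a second sort. The sketch below recreates the sample words.txt in a temporary directory so that it runs on its own; the variable names are illustrative only.

```shell
# Recreate the sample words.txt from the article in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' the day is sunny the the sunny day is today is sunny day > "$workdir/words.txt"

# Count each word, then sort numerically in descending order of the count.
ranked=$(sort "$workdir/words.txt" | uniq -c | sort -rn)
printf '%s\n' "$ranked"

# The least frequent word ends up on the last line.
rarest=$(printf '%s\n' "$ranked" | tail -n 1 | awk '{print $2}')
printf 'rarest word: %s\n' "$rarest"

rm -rf "$workdir"
```

The four words that occur three times come first and "today" comes last. The exact width of the count column can differ between uniq implementations.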

    USE CASE 3.

    Print only the duplicated entries, or only the entries that occur exactly once:

    $ sort words.txt | uniq -d
    day
    is
    sunny
    the
    $ sort words.txt | uniq -u
    today
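
The comparison options listed in the man page (-i, -f, -s) are not exercised by the use cases above. The following is a rough sketch of how they behave; the input lines, including the timestamp column, are invented for illustration.

```shell
# -i: ignore case, so "The", "the" and "THE" count as the same line.
# The first line of each run is the one that is kept.
folded=$(printf 'The\nthe\nTHE\nday\n' | uniq -i)

# -f 1: skip the first field (here a hypothetical timestamp) when comparing.
by_field=$(printf '09:00 login\n09:05 login\n09:07 logout\n' | uniq -f 1)

# -s 1: skip the first character of each line when comparing.
by_chars=$(printf 'a-x\nb-x\nc-y\n' | uniq -s 1)

printf -- '-i:\n%s\n\n-f 1:\n%s\n\n-s 1:\n%s\n' "$folded" "$by_field" "$by_chars"
```

In each case only the part of the line that is actually compared has to match, and uniq prints the first line of every matching run unchanged.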
  • Original article: https://www.cnblogs.com/lailailai/p/4432579.html