
Bash: Common Command-Line Tools - uniq

NAME
       uniq - report or omit repeated lines

SYNOPSIS
       uniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION
       Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).

       With no options, matching lines are merged to the first occurrence.

       Mandatory arguments to long options are mandatory for short options too.

       -c, --count
              prefix lines by the number of occurrences

       -d, --repeated
              only print duplicate lines

       -D, --all-repeated[=delimit-method]
              print all duplicate lines
              delimit-method={none(default),prepend,separate}
              Delimiting is done with blank lines

       -f, --skip-fields=N
              avoid comparing the first N fields

       -i, --ignore-case
              ignore differences in case when comparing

       -s, --skip-chars=N
              avoid comparing the first N characters

       -u, --unique
              only print unique lines

       -z, --zero-terminated
              end lines with 0 byte, not newline

       -w, --check-chars=N
              compare no more than N characters in lines

       --help display this help and exit

       --version
              output version information and exit

       A field is a run of blanks (usually spaces and/or TABs), then non-blank characters.  Fields are skipped before chars.

       Note: 'uniq' does not detect repeated lines unless they are adjacent.  You may want to sort the input first, or use 'sort -u' without 'uniq'.  Also, comparisons honor the rules specified by
       'LC_COLLATE'.

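The above is the output of man.

The note at the end tells us that uniq only removes duplicates when the repeated lines are adjacent in the input. This is easy to understand: by requiring duplicates to be consecutive, uniq only has to compare each line with the previous one, which saves the memory a hash map would need to keep track of every line seen so far. To get input in that shape, sort the data first, so that duplicates are guaranteed to be adjacent.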
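To make the adjacency requirement concrete, here is a minimal sketch. The file `sample.txt` and its contents are made up for illustration and are not part of the original article. It compares plain `uniq`, `sort | uniq`, and a hash-map-based alternative using the common `awk '!seen[$0]++'` idiom, which trades memory for not having to sort.

```shell
# Build a small sample file in a temporary directory (hypothetical data).
tmp=$(mktemp -d)
printf '%s\n' the day the the day > "$tmp/sample.txt"

# Without sorting, only the adjacent pair of "the" lines is collapsed.
uniq "$tmp/sample.txt"
# the
# day
# the
# day

# Sorting first makes all duplicates adjacent, so uniq removes them all.
sort "$tmp/sample.txt" | uniq
# day
# the

# Hash-map alternative: awk remembers every line it has seen, so no sort
# is needed and the first-occurrence order is preserved.
awk '!seen[$0]++' "$tmp/sample.txt"
# the
# day
```

The awk variant keeps every distinct line in memory, which is exactly the cost that uniq avoids by insisting on adjacent duplicates.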

Given the following text file (words.txt):

the
day
is
sunny
the
the
sunny
day
is
today
is
sunny
day

USE CASE 1.

First, deduplicate the words (sort first, then remove duplicates):

$ sort words.txt | uniq
day
is
sunny
the
today
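As the man page note mentions, `sort -u` can do the same job without `uniq`. The sketch below recreates the word list from this article in a temporary directory (the temporary path is an assumption made so the example is self-contained) and runs the one-step form.

```shell
# Recreate the article's word list in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' the day is sunny the the sunny day is today is sunny day > "$tmp/words.txt"

# sort -u sorts the input and drops duplicate lines in a single step.
sort -u "$tmp/words.txt"
# day
# is
# sunny
# the
# today
```

The pipeline with `uniq` is still the form to use when you need options such as `-c`, `-d`, or `-u`, which `sort` does not offer.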

USE CASE 2.

Count how many times each line occurs:

$ sort words.txt | uniq -c
      3 day
      3 is
      3 sunny
      3 the
      1 today
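A common follow-up is to order the counts by frequency. The sketch below is one possible way to do it, not something from the original article: it sorts the `uniq -c` output numerically on the count column (ascending, with ties broken by the word) and uses awk to print a `word: count` layout, which is a formatting choice made here for readability.

```shell
# Recreate the article's word list in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' the day is sunny the the sunny day is today is sunny day > "$tmp/words.txt"

# Count each word, sort by count (numeric, ascending) and then by word,
# and reformat each "count word" pair as "word: count".
freq=$(sort "$tmp/words.txt" | uniq -c | sort -k1,1n -k2,2 | awk '{print $2 ": " $1}')
printf '%s\n' "$freq"
# today: 1
# day: 3
# is: 3
# sunny: 3
# the: 3
```

Replacing `-k1,1n` with `-k1,1nr` would list the most frequent words first instead.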

USE CASE 3.

Print only the duplicated lines, or only the lines that occur once:

$ sort words.txt | uniq -d
day
is
sunny
the
$ sort words.txt | uniq -u
today
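The use cases above do not exercise the comparison options from the man page, so here is a small sketch of `-f` and `-i`. The log file and its contents are hypothetical, invented only to show how skipping a leading field and ignoring case change which lines count as duplicates.

```shell
# Hypothetical log: a timestamp field followed by a message.
tmp=$(mktemp -d)
printf '%s\n' '10:01 disk full' '10:02 disk full' '10:03 Disk Full' '10:04 login ok' > "$tmp/log.txt"

# -f 1 skips the first field (the timestamp) when comparing lines,
# so the first two lines are treated as duplicates.
uniq -f 1 "$tmp/log.txt"
# 10:01 disk full
# 10:03 Disk Full
# 10:04 login ok

# Adding -i also ignores case, so "Disk Full" joins the same group.
uniq -i -f 1 "$tmp/log.txt"
# 10:01 disk full
# 10:04 login ok
```

In both cases uniq prints the first line of each group, which is why the earliest timestamp is the one that survives.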


Original article: https://www.cnblogs.com/lailailai/p/4432579.html