zoukankan html css js c++ java

Linux 去重先sort再uniq

从uniq命令的帮助信息中可以看到，该命令只过滤相邻的重复行．

如果要去掉所有重复行，需要先排序，或者使用uniq -u

$ uniq --h
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input),
writing to OUTPUT (or standard output).

With no options, matching lines are merged to the first occurrence.

Mandatory arguments to long options are mandatory for short options too.
  -c, --count           prefix lines by the number of occurrences
  -d, --repeated        only print duplicate lines, one for each group
  -D                    print all duplicate lines
      --all-repeated[=METHOD]  like -D, but allow separating groups
                                 with an empty line;
                                 METHOD={none(default),prepend,separate}
  -f, --skip-fields=N   avoid comparing the first N fields
      --group[=METHOD]  show all items, separating groups with an empty line;
                          METHOD={separate(default),prepend,append,both}
  -i, --ignore-case     ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique          only print unique lines
  -z, --zero-terminated     line delimiter is NUL, not newline
  -w, --check-chars=N   compare no more than N characters in lines
      --help     display this help and exit
      --version  output version information and exit

A field is a run of blanks (usually spaces and/or TABs), then non-blank
characters.  Fields are skipped before chars.

Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.
Also, comparisons honor the rules specified by 'LC_COLLATE'.

$ cat tmp.txt
aa
aa
bb
bb
bb
cc
cc
aa
cc
bb
$ cat tmp.txt | uniq
aa
bb
cc
aa
cc
bb

先sort再uniq可以去除所有重复项：

$ cat tmp.txt | sort | uniq
aa
bb
cc

或者使用uniq -u：

$ cat tmp.txt | uniq -u
aa
cc
bb

但是这种方法不一定起效（参考下面的例子）

$ head info.json -n20 | jq .industry | awk -F '"' '{print $2}' | awk '{if (length > 0) print $0}' | uniq | sort # ==> 没有去重
商务服务业
建筑装饰和其他建筑业
批发业
批发业
批发业
机动车、电子产品和日用产品修理业
研究和试验发展
纺织服装、服饰业
计算机、通信和其他电子设备制造业
软件和信息技术服务业
道路运输业
零售业
$ head info.json -n20 | jq .industry | awk -F '"' '{print $2}' | awk '{if (length > 0) print $0}' | uniq -u | sort  # ==> 去重不完全
商务服务业
建筑装饰和其他建筑业
批发业
批发业
机动车、电子产品和日用产品修理业
研究和试验发展
纺织服装、服饰业
计算机、通信和其他电子设备制造业
软件和信息技术服务业
道路运输业
零售业
$ head info.json -n20 | jq .industry | awk -F '"' '{print $2}' | awk '{if (length > 0) print $0}' | sort | uniq  # ==> 去重成功
商务服务业
建筑装饰和其他建筑业
批发业
机动车、电子产品和日用产品修理业
研究和试验发展
纺织服装、服饰业
计算机、通信和其他电子设备制造业
软件和信息技术服务业
道路运输业
零售业

查看全文

相关阅读:
MutationObserver DOM变化的观察
 lspci详解分析
 dpdk快速编译使用
 bonding的系统初始化介绍
 fio测试nvme性能
 [驱动] 一个简单内核驱动，通过qemu调试(1)
qemu启动vm后，如何host上使用ssh连接？
Linux C下变量和常量的存储的本质
 从计算机中数据类型的存储方式，思考理解原码，反码，补码
 Linux C动态链接库实现一个插件例子

原文地址：https://www.cnblogs.com/bymo/p/7601047.html

热门文章
25 个超棒的 HTML5 & JavaScript 游戏引擎开发库
 TCP/IP 笔记
 TCP/IP 笔记
 TCP/IP 笔记
 TCP/IP 笔记
 TCP/IP 笔记
 TCP/IP 笔记
 TCP/IP 笔记
 TCP/IP 笔记
 TCP/IP 笔记

Linux 去重 先sort再uniq

Linux 去重先sort再uniq