  • [Bash] LeetCode 192. Word Frequency

    ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
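    ➤WeChat official account: 山青咏芝 (shanqingyongzhi)
    ➤cnblogs address: 山青咏芝 (https://www.cnblogs.com/strengthen/)
    ➤GitHub: https://github.com/strengthen/LeetCode
    ➤Original article: https://www.cnblogs.com/strengthen/p/10180228.html
    ➤If this link is not on 山青咏芝's cnblogs blog, the article may have been scraped from the author.
    ➤The original article has been revised and updated. Reading it at the original address is strongly recommended. Support the author and original work!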
    ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★

    Write a bash script to calculate the frequency of each word in a text file words.txt.

    For simplicity's sake, you may assume:

    • words.txt contains only lowercase characters and space ' ' characters.
    • Each word must consist of lowercase characters only.
    • Words are separated by one or more whitespace characters.

    Example:

    Assume that words.txt has the following content:

    the day is sunny the the
    the sunny is is
    

    Your script should output the following, sorted by descending frequency:

    the 4
    is 3
    sunny 2
    day 1
    

    Note:

    • Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
    • Could you write it in one line using Unix pipes?

    4ms

    # Read from the file words.txt and output the word frequency list to stdout.
    # sort -nr orders the counts numerically, highest first.
    cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
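
    To see what each part of the one-line pipeline contributes, the stages can be wrapped in small functions and run on the sample input. This is only a sketch: the function names split_words, count_words, and rank_words are hypothetical and not part of the original solution, and the final sort uses -nr so that counts compare as numbers.

```shell
# Hypothetical helper names; each wraps one stage of the pipeline.

# Stage 1: put every word on its own line; -s squeezes runs of separators.
split_words() { tr -s ' ' '\n'; }

# Stage 2: sort so equal words are adjacent, then count each run.
count_words() { sort | uniq -c; }

# Stage 3: order by count (numeric, descending) and print "word count".
rank_words() { sort -nr | awk '{ print $2, $1 }'; }

# Sample input from the problem statement.
printf '%s\n' 'the day is sunny the the' 'the sunny is is' |
    split_words | count_words | rank_words
```

    On the sample input this should print the four lines from the example, "the 4" through "day 1".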

    8ms

    # Read from the file words.txt and output the word frequency list to stdout.
    awk '{
        for (i = 1; i <= NF; ++i) ++s[$i];
    } END {
        for (i in s) print i, s[i];
    }' words.txt | sort -nr -k 2
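
    The -n flag in "sort -nr -k 2" matters once a count reaches two digits. A minimal sketch, using made-up counts that are not from the problem, compares numeric and plain text ordering on the second field; by_count_desc is a hypothetical helper name.

```shell
# Hypothetical helper: sort "word count" lines by count, highest first.
by_count_desc() { sort -nr -k 2; }

# Made-up data with a two-digit count.
printf '%s\n' 'day 1' 'the 10' 'is 9' | by_count_desc
# Numeric order: the 10, is 9, day 1

# Without -n the key is compared as text, so "9" sorts above "10".
printf '%s\n' 'day 1' 'the 10' 'is 9' | sort -r -k 2
# Text order: is 9, the 10, day 1
```

    The problem guarantees unique counts, so no tie-breaking key is needed here.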

    16ms

    # Read from the file words.txt and output the word frequency list to stdout.

    # try 1
    # -E enables extended regex so {1,} means "one or more"; \n in the replacement assumes GNU sed.
    sed -E 's/ {1,}/\n/g' words.txt | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{print $2,$1}'
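
    Whether {1,} acts as a repetition count in sed depends on the regex dialect. In basic regular expressions (the default) the braces are literal unless escaped as \{1,\}, while -E switches to extended syntax, where {1,} means one or more. A small sketch, using | as a visible replacement character purely for illustration:

```shell
# Basic regex, unescaped braces: " {1,}" is matched literally, so nothing changes.
printf 'a  b\n' | sed 's/ {1,}/|/g'      # prints: a  b

# Basic regex with escaped braces: one or more spaces are replaced.
printf 'a  b\n' | sed 's/ \{1,\}/|/g'    # prints: a|b

# Extended regex (-E): unescaped braces form the interval.
printf 'a  b\n' | sed -E 's/ {1,}/|/g'   # prints: a|b
```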