  • Counting the words in a file: Shell and Python versions

    I recently came across a shell exercise that counts how often each word occurs in a file. It uses awk, and the code is as follows:

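    #!/bin/bash
    # Print each word in a file with the number of times it occurs.
    if [ $# -ne 1 ]; then
        echo "Usage: $(basename "$0") filename"
        exit 1
    fi

    filename=$1
    # -o prints each match on its own line; [a-zA-Z]+ matches runs of letters.
    grep -E -o "[a-zA-Z]+" "$filename" |
    awk '{ count[$0]++ }
    END {
        printf "%-14s %s\n", "Word", "Count"
        # Pipe only the data rows through sort: numeric, reverse, on column 2.
        for (i in count) printf "%-14s %s\n", i, count[i] | "sort -nrk 2"
    }'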

    A regular expression does the matching; + means one or more occurrences of the preceding character class.

    The output is as follows:

    [root@localhost shellcookbook]# sh word_freq.sh item.txt 
    Word           Count
    Tennis         2
    Sports         2
    Racket         2
    Printer        2
    Office         2
    Laser          2
    Video          1
    Refrigerator   1
    Player         1
    MP             1
    HD             1
    Camcorder      1
    Audio          1
    Appliance      1
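
    Rows such as MP and HD presumably come from tokens that mix letters with other characters: [a-zA-Z]+ matches only runs of ASCII letters, so digits and punctuation act as separators. A minimal Python sketch of the same pattern follows; the sample string is made up for illustration and is not the contents of item.txt.

```python
import re

# Hypothetical sample text; not the contents of item.txt.
sample = "MP3 Player, HD Camcorder"

# [a-zA-Z]+ matches one or more consecutive ASCII letters,
# so the digit in "MP3" and the comma both end a match.
words = re.findall(r"[a-zA-Z]+", sample)
print(words)
```

    Here "MP3" contributes only the letters before the digit, which is worth keeping in mind when reading the counts.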

    Since I happen to be learning Python, I implemented the same thing in Python as well. The code is as follows:

    #!/usr/bin/env python
    # Works on Python 2.6+ and Python 3.
    from __future__ import print_function
    import re
    import sys

    if len(sys.argv) != 2:
        print("Usage: %s file" % sys.argv[0])
        sys.exit(1)

    filename = sys.argv[1]
    try:
        with open(filename) as f:
            data = f.read()
    except IOError:
        print("Please check that %s exists!" % filename)
        sys.exit(1)
    except Exception as e:
        print(e)
        sys.exit(1)

    pattern = r'[a-zA-Z]+'
    words = re.findall(pattern, data)
    # Count each distinct word, then sort by count, highest first.
    # Python 2-only alternative: sorted(..., cmp=lambda x, y: cmp(x[1], y[1]), reverse=True)
    wordcounts = sorted([(i, words.count(i)) for i in set(words)], key=lambda x: x[1], reverse=True)
    print("%-14s %s" % ("Word", "Counts"))
    for word, counts in wordcounts:
        print("%-14s %s" % (word, counts))

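    This version also matches the words with a regular expression first, then counts each distinct word and sorts the result with sorted. The output is as follows: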

    [root@localhost shellcookbook]# python word_freq_py.py item.txt 
    Word           Counts
    Printer        2
    Laser          2
    Office         2
    Tennis         2
    Sports         2
    Racket         2
    Appliance      1
    Player         1
    Video          1
    HD             1
    Audio          1
    Camcorder      1
    Refrigerator   1
    MP             1
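
    Calling words.count(i) rescans the whole list once per distinct word, which is fine for a small file but grows quadratically with larger ones. One alternative, not part of the original script, is collections.Counter from the standard library, which tallies every word in a single pass. A minimal sketch, where count_words and the sample text are hypothetical names and data chosen for illustration:

```python
import re
from collections import Counter


def count_words(text):
    """Return (word, count) pairs sorted by count, highest first."""
    words = re.findall(r"[a-zA-Z]+", text)
    # Counter tallies every word in one pass over the list;
    # most_common() returns the pairs ordered by descending count.
    return Counter(words).most_common()


# Hypothetical sample text; not the contents of item.txt.
sample = "Laser Printer, Office Printer, Laser Pointer"
print("%-14s %s" % ("Word", "Counts"))
for word, count in count_words(sample):
    print("%-14s %s" % (word, count))
```

    The printed format matches the script above, so the two approaches can be compared on the same file.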

    Let's compare how efficient the two programs are:

    # time sh word_freq.sh item.txt 
    
    real    0m0.007s
    user    0m0.003s
    sys     0m0.005s
    # time python word_freq_py.py item.txt 
    
    real    0m0.035s
    user    0m0.031s
    sys     0m0.004s

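    By comparison, the shell program is faster on this input (0.007s versus 0.035s real time), mainly because awk makes it efficient. So for small programs on Linux, if shell can do the job, write it in shell and use Python as a supplement.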
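
    The time figures above cover each whole process, including the time needed to start it, and item.txt is very small. To see how the counting step alone behaves on larger input, one option (an assumption here, not something the original post measured) is to time it inside Python with the standard timeit module. The text and the count_words helper below are hypothetical:

```python
import re
import timeit
from collections import Counter

# Hypothetical in-memory text standing in for a larger file.
text = "Tennis Racket Sports Laser Printer Office " * 1000


def count_words(data):
    """Tally every run of ASCII letters in data."""
    return Counter(re.findall(r"[a-zA-Z]+", data))


# Run the counting step 100 times and report the average per run.
runs = 100
total = timeit.timeit(lambda: count_words(text), number=runs)
print("average per run: %.6f s" % (total / runs))
```

    The number printed depends on the machine, so it is only meaningful relative to other measurements taken the same way.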

  • Original article: https://www.cnblogs.com/landhu/p/5170783.html