I recently came across a shell exercise on counting how many times each word occurs in a file. The solution uses awk; here is the code:
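#!/bin/bash
# Count how often each word occurs in a file.
if [ $# -ne 1 ]; then
    echo "Usage: $(basename "$0") filename"
    exit 1
fi

filename=$1

# egrep -o prints each match on its own line; awk tallies the lines
# in an associative array and pipes the rows through sort.
egrep -o "[a-zA-Z]+" "$filename" | awk '
{ count[$0]++ }
END {
    printf "%-14s %s\n", "Word", "Count"
    for (i in count)
        printf "%-14s %s\n", i, count[i] | "sort -nrk 2"
}'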
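The words are matched with a regular expression; + means one or more occurrences of the preceding element, so [a-zA-Z]+ matches each run of letters.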
The output is:
[root@localhost shellcookbook]# sh word_freq.sh item.txt
Word           Count
Tennis         2
Sports         2
Racket         2
Printer        2
Office         2
Laser          2
Video          1
Refrigerator   1
Player         1
MP             1
HD             1
Camcorder      1
Audio          1
Appliance      1
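To make the matching step concrete, here is a small Python sketch of what the pattern [a-zA-Z]+ extracts. The contents of item.txt are not shown in this post, so the sample string is made up, and extract_words is a hypothetical helper name. It illustrates why a token that mixes letters and digits would appear in the table with only its letters.

```python
import re

# Same pattern as in the shell script: one or more ASCII letters.
WORD_PATTERN = r'[a-zA-Z]+'


def extract_words(text):
    """Return every maximal run of ASCII letters found in text."""
    return re.findall(WORD_PATTERN, text)


if __name__ == "__main__":
    # Made-up sample line; the real item.txt is not shown in this post.
    print(extract_words("MP3 Player, HD Camcorder"))
    # prints ['MP', 'Player', 'HD', 'Camcorder']
```

Digits and punctuation never match, so they act as separators, and a string with no letters at all yields an empty list.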
Since I happen to be learning Python, I reimplemented it in Python as well. Here is the code:
#!/usr/bin/env python
import re
import sys

if len(sys.argv) != 2:
    print("Usage: %s file" % sys.argv[0])
    sys.exit(1)

filename = sys.argv[1]
try:
    with open(filename) as f:
        data = f.read()
except IOError:
    print("Please check that %s exists!" % filename)
    sys.exit(1)
except Exception as e:
    print(e)
    sys.exit(1)

pattern = r'[a-zA-Z]+'
words = re.findall(pattern, data)
# Older cmp-based variant (Python 2 only):
# print sorted([(i, words.count(i)) for i in set(words)], cmp=lambda x, y: cmp(x[1], y[1]), reverse=True)
# Count each distinct word, then sort the (word, count) pairs by count, highest first.
wordcounts = sorted([(i, words.count(i)) for i in set(words)], key=lambda x: x[1], reverse=True)
print("%-14s %s" % ("Word", "Counts"))
for word, counts in wordcounts:
    print("%-14s %s" % (word, counts))
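This version also extracts the words with a regular expression first, then counts each distinct word with words.count and orders the (word, count) pairs with sorted. The output is: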

[root@localhost shellcookbook]# python word_freq_py.py item.txt
Word           Counts
Printer        2
Laser          2
Office         2
Tennis         2
Sports         2
Racket         2
Appliance      1
Player         1
Video          1
HD             1
Audio          1
Camcorder      1
Refrigerator   1
MP             1
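The script above calls words.count(i) once for every distinct word, so the word list is rescanned each time. As a possible alternative (not what the script above does), the sketch below uses collections.Counter from the standard library, which tallies all words in a single pass, much like count[$0]++ in the awk version. The function name word_counts and the sample text are made up for illustration.

```python
import re
from collections import Counter

WORD_PATTERN = re.compile(r'[a-zA-Z]+')


def word_counts(text):
    """Return (word, count) pairs, most frequent first."""
    words = WORD_PATTERN.findall(text)
    # Counter tallies every word in one pass over the list;
    # most_common() returns the pairs sorted by count, highest first.
    return Counter(words).most_common()


if __name__ == "__main__":
    # Made-up sample text; substitute the contents of a real file.
    sample = "Tennis Racket Tennis Printer Racket Tennis"
    print("%-14s %s" % ("Word", "Counts"))
    for word, count in word_counts(sample):
        print("%-14s %s" % (word, count))
```

On a file as small as the one used here the difference is unlikely to be noticeable; it should matter more as the number of distinct words grows.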
Now let's compare the two programs to see how efficient each one is:
# time sh word_freq.sh item.txt
real    0m0.007s
user    0m0.003s
sys     0m0.005s

# time python word_freq_py.py item.txt
real    0m0.035s
user    0m0.031s
sys     0m0.004s
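In this comparison the shell version is faster (0.007s versus 0.035s of real time), which I attribute mainly to awk. So for small programs on Linux, if shell can do the job, write it in shell and keep Python as a supplement.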