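Having used Spark, I still remember its wordcount demo vividly, so I wanted to try implementing a simple wordcount myself. And since I am learning functional programming, I wanted to treat the data as a whole and operate on it by composing existing functions. The result is this one line of code.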
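This line does some rough cleanup of the words: it converts everything to lowercase and replaces punctuation with spaces. It then uses filter to drop the empty strings left over from splitting, and finally counts the words with Counter. Really quick and convenient.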
import re
from collections import Counter

# Named `text` rather than `input` to avoid shadowing the built-in.
text = """As we know, the NTU Final PK contest usually tends to be pretty hard. Many teams got frustrated when
participating NTU Final PK contest. So I decide to make the first problem as "easy" as possible. But how
to know how easy is a problem? To make our life easier, we just consider how easy is a string."""

# Lowercase, replace every non-word character with a space, split on spaces,
# drop the empty strings, then count.
ret = Counter(filter(lambda x: x != '', re.subn(r'\W', ' ', text.lower())[0].split(' '))).items()

for word, count in ret:
    print(word, count)
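To make it easier to see what each stage of the one-liner contributes, here is a rough step-by-step sketch of the same pipeline. The function name `word_count` and the `sample` string are made up for illustration, and `re.sub` stands in for `re.subn(...)[0]`, which returns the same string.

```python
import re
from collections import Counter


def word_count(text):
    """Count words in `text`, one pipeline stage per line."""
    lowered = text.lower()                  # 1. normalise case
    cleaned = re.sub(r'\W', ' ', lowered)   # 2. each non-word character becomes a space
    tokens = cleaned.split(' ')             # 3. split on single spaces; runs of spaces yield ''
    words = [t for t in tokens if t != '']  # 4. drop the empty strings
    return Counter(words)                   # 5. tally the remaining words


# Hypothetical sample input, not taken from the original post.
sample = 'So easy, so "easy"!'
counts = word_count(sample)
for word, count in counts.items():
    print(word, count)
```

Calling `split()` with no argument splits on any run of whitespace and never produces empty strings, so it would make the filter step unnecessary. The explicit `split(' ')` plus filter is kept here only to mirror the original line.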