Solving the Word-Count Interview Question with Pipe

赖勇浩 (Lai Yonghao, http://laiyonghao.com)

This morning, @smallfishxy posted an interview question:

Read a file, count how many times each word occurs in it, and sort the words by count from highest to lowest.

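On its own this is a fairly unremarkable question, but it becomes interesting when combined with Pipe, which I introduced the day before yesterday. This kind of data-flow processing is a very good fit for Pipe. After spending a little time on it, I wrote the following code: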

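from __future__ import print_function
from re import split
from pipe import *

with open('test_descriptor.py') as f:
    print(f.read()
          # split the text on runs of non-word characters
          | Pipe(lambda x: split(r'\W+', x))
          # drop the empty strings left over by the split
          | Pipe(lambda x: (i for i in x if i.strip()))
          # group identical words, then count the members of each group
          | groupby(lambda x: x)
          | select(lambda x: (x[0], (x[1] | count)))
          # most frequent words first
          | sort(key=lambda x: x[1], reverse=True)
          )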

Output:

[('self', 13), ('foo', 9), ('item', 9), ('_data', 8), ('print', 7), ('def', 5), ('return', 5), ('Jeff', 4), ('i', 4), ('in', 4), ('jeff', 4), ('ken', 4), ('obj', 4), ('val', 4), ('class', 3), ('lai', 3), ('pan', 3), ('tmp', 3), ('Foo', 2), ('ItemDescriptor', 2), ('Wrapper', 2), ('__iter__', 2), ('for', 2), ('if', 2), ('next', 2), ('object', 2), ('0', 1), ('1', 1), ('30', 1), ('8', 1), ('None', 1), ('__class__', 1), ('__future__', 1), ('__get__', 1), ('__init__', 1), ('__set__', 1), ('bin', 1), ('coding', 1), ('env', 1), ('f', 1), ('from', 1), ('import', 1), ('instance', 1), ('isinstance', 1), ('len', 1), ('list', 1), ('print_function', 1), ('python', 1), ('type', 1), ('usr', 1), ('utf', 1)]
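As a cross-check on the pipeline above, here is a minimal sketch of the same counting done with only the standard library. The function name `word_counts` and the sample text are assumptions made for illustration and are not part of the original post. Ties are broken by the word itself, which matches the ordering visible in the output above.

```python
import re
from collections import Counter


def word_counts(text):
    """Return (word, count) pairs, most frequent first."""
    # Split on runs of non-word characters and drop empty strings,
    # mirroring the first two stages of the pipeline.
    words = [w for w in re.split(r'\W+', text) if w.strip()]
    counts = Counter(words)
    # Sort by descending count; break ties by the word itself.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "foo bar foo baz bar foo"
print(word_counts(sample))  # [('foo', 3), ('bar', 2), ('baz', 1)]
```

Running both versions over the same file and comparing the lists is a cheap way to confirm that the pipeline stages do what they appear to do.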

While solving the problem with Pipe, I ran into one issue:

When something goes wrong, tracking down the cause is far too hard!
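One plausible contributor is lazy evaluation: a generator-based stage may only raise once a later stage consumes it, so the traceback can point away from the step that is actually wrong. The sketch below shows one possible way to ease this. `Stage` is a hypothetical helper written for this example, not part of Pipe; it gives each step a name, forces lazy results to be evaluated inside that step, and reports the name when the step fails.

```python
import re


class Stage(object):
    """A named pipeline step (hypothetical helper, not part of Pipe)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def __ror__(self, value):
        # Called for `value | stage`.
        try:
            result = self.func(value)
            # Materialise iterators so that an error raised lazily
            # surfaces here, inside the stage that caused it.
            if hasattr(result, '__next__'):
                result = list(result)
            return result
        except Exception as exc:
            raise RuntimeError(
                'stage %r failed: %r' % (self.name, exc)) from exc


split_words = Stage('split_words', lambda text: re.split(r'\W+', text))
drop_empty = Stage('drop_empty', lambda ws: (w for w in ws if w.strip()))
add_one = Stage('add_one', lambda xs: (x + 1 for x in xs))

words = 'foo bar, foo' | split_words | drop_empty
print(words)  # ['foo', 'bar', 'foo']

try:
    [1, 'a'] | add_one  # 'a' + 1 raises TypeError inside this stage
except RuntimeError as exc:
    print(exc)  # the message names the failing stage, 'add_one'
```

The trade-off is that every stage builds a full list, which gives up the memory benefits of lazy evaluation, so a helper like this is better suited to debugging than to processing large inputs.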

Original article: https://www.cnblogs.com/aiwz/p/6154351.html