http://www.wentrue.net/blog/?p=970
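Possibly the Shortest Collaborative Filtering Recommendation Engine Ever Written
These are the first two days without a match since the World Cup kicked off. With no football to watch, I might as well write a blog post. The title is a bit of a gimmick; what the post actually covers is an item-based CF (collaborative filtering) recommendation algorithm implemented in R.
Excluding comments, there are only 16 lines of effective code. It relies heavily on vectorized functions and vectorized processing, so it contains no explicit loop constructs at all. A more detailed discussion of vectorization can be found here.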
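To make "no explicit loops" concrete, here is a minimal sketch in R. It is not the 16 lines referred to in this post, only an illustration under stated assumptions: the toy 0/1 user-item matrix `m` and the names `co_loop`, `co_vec`, `norms`, and `sim` are hypothetical, and cosine similarity is assumed as the item-item measure.

```r
# Hypothetical toy data: 4 users (rows) x 3 items (columns), 1 = user picked the item.
m <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("i", 1:3)))

# Loop version: count, for every pair of items, the users who picked both.
co_loop <- matrix(0, ncol(m), ncol(m),
                  dimnames = list(colnames(m), colnames(m)))
for (a in seq_len(ncol(m))) {
  for (b in seq_len(ncol(m))) {
    co_loop[a, b] <- sum(m[, a] * m[, b])
  }
}

# Vectorized version: crossprod(m) is t(m) %*% m, the same counts in one call.
co_vec <- crossprod(m)

# Cosine similarity between items, still without loops: divide each
# co-occurrence count by the product of the two item column norms.
norms <- sqrt(diag(co_vec))
sim <- co_vec / outer(norms, norms)
```

Both versions produce the same co-occurrence matrix; the vectorized one simply hands the iteration to R's matrix routines.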
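Note: this code implements only the most basic form of the algorithm and is for reference only. No promise is made about its practicality on large-scale or complex data.
The source data file data.dat contains the following: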
user_id,subject_id
1,1
1,3
1,7
1,13
2,2
2,5
2,6
2,7
2,9
2,10
2,11
3,1
3,2
3,3
3,4
3,7
3,9
3,10
5,13
6,1
6,3
6,4
6,5
6,8
6,10
8,1
8,2
8,3
8,5
8,6
8,7
8,8
9,13
10,12
11,2
11,3
11,4
11,6
11,8
11,9
11,13
12,12
13,3
13,6
13,7
15,4
15,12
15,13
16,2
16,3
16,4
16,7
16,8
17,2
17,3
17,4
17,5
17,6
17,7
17,8
17,9
17,10
17,11
18,2
18,3
19,2
19,3
19,5
19,6
19,9
19,10
19,11
19,12
20,1
20,3
20,4
20,7
20,13
21,1
21,6
21,8
21,9
21,11
21,12
21,13
22,6
23,2
23,4
23,9
23,12
24,1
24,5
24,9
25,2
25,6
25,10
25,11
26,2
26,3
26,8
27,3
27,6
27,12
27,13
28,1
28,2
28,3
28,5
28,7
28,9
28,10
28,11
28,12
28,13
29,1
29,2
29,3
29,4
29,5
29,6
29,7
29,8
29,9
29,10
30,6
30,7
30,9
30,13
31,6
31,11
32,1
32,5
33,2
33,13
34,3
34,7
34,8
34,9
34,10
34,13
35,3
35,4
35,5
35,6
35,7
36,2
36,3
36,4
36,6
36,7
36,8
36,9
36,11
36,12
36,13
38,5
41,1
41,3
41,4
41,5
41,6
41,7
41,11
42,2
42,3
42,7
42,8
42,9
42,10
42,11
43,2
43,6
43,10
43,11
43,12
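Data in this long format (one user_id,subject_id pair per record) is typically turned into a user-item matrix before any similarity is computed. The following is a minimal sketch of that step, not necessarily how this post's code does it. It assumes the listing above reads as one pair per line, and inlines what appear to be the rows for users 1 to 3 so that the snippet runs on its own. The names `csv_text`, `ratings`, and `ui` are hypothetical.

```r
# Rows for users 1-3 as read from the listing above. With the real file,
# read.csv("data.dat") would replace the inline text.
csv_text <- "user_id,subject_id
1,1
1,3
1,7
1,13
2,2
2,5
2,6
2,7
2,9
2,10
2,11
3,1
3,2
3,3
3,4
3,7
3,9
3,10"
ratings <- read.csv(text = csv_text)

# table() cross-tabulates the two columns: one row per user, one column per
# item, with a count of 1 wherever the (user, item) pair occurs in the data.
# unclass() drops the "table" class and leaves a plain integer matrix.
ui <- unclass(table(ratings$user_id, ratings$subject_id))
```

Rows and columns of `ui` are labelled with the original ids, so `ui["1", "13"]` looks up whether user 1 is paired with subject 13. Items that no user in the input has picked do not get a column.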