Exercise 8: TF/IDF ranking

DIS 2006/2007
In this exercise we look at how TF/IDF ranking works.
There are 5 different documents in the collection:
D1 = "If it walks like a duck and quacks like a duck, it must be a duck."
D2 = "Beijing Duck is mostly prized for the thin, crispy duck skin with authentic versions of the dish serving mostly the skin."
D3 = "Bugs' ascension to stardom also prompted the Warner animators to recast Daffy Duck as the rabbit's rival, intensely jealous and determined to steal back the spotlight while Bugs remained indifferent to the duck's jealousy, or used it to his advantage. This turned out to be the recipe for the success of the duo."
D4 = "6:25 PM 1/7/2007 blog entry: I found this great recipe for Rabbit Braised in Wine on cookingforengineers.com."
D5 = "Last week Li has shown you how to make the Sechuan duck. Today we'll be making Chinese dumplings (Jiaozi), a popular dish that I had a chance to try last summer in Beijing. There are many recipies for Jiaozi."
Task 1. For the query Q = "Beijing duck recipe", find the two top-ranked documents according to their TF/IDF rank. Assume the cosine similarity measure and the culinary term set T = {beijing, dish, duck, rabbit, recipe, roast}. Are the top-ranked documents relevant to the query?
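One way to check a hand calculation for Task 1 is the small Python sketch below. It is not the course's reference solution, and several choices in it are assumptions the exercise does not fix: raw term counts as TF, IDF = ln(N/df) with IDF set to 0 for a term that occurs in no document (roast), the same weighting applied to the query, and a tokenizer that lowercases the text and matches whole alphabetic words only, without stemming (so "duck's" counts as duck, but "recipies" in D5 does not count as recipe). The names TERMS, DOCS, term_counts, idf_weights, cosine and rank are made up for this illustration.

```python
import math
import re
from collections import Counter

# Term set T from the exercise, in a fixed order.
TERMS = ["beijing", "dish", "duck", "rabbit", "recipe", "roast"]

DOCS = {
    "D1": "If it walks like a duck and quacks like a duck, it must be a duck.",
    "D2": "Beijing Duck is mostly prized for the thin, crispy duck skin with "
          "authentic versions of the dish serving mostly the skin.",
    "D3": "Bugs' ascension to stardom also prompted the Warner animators to "
          "recast Daffy Duck as the rabbit's rival, intensely jealous and "
          "determined to steal back the spotlight while Bugs remained "
          "indifferent to the duck's jealousy, or used it to his advantage. "
          "This turned out to be the recipe for the success of the duo.",
    "D4": "6:25 PM 1/7/2007 blog entry: I found this great recipe for Rabbit "
          "Braised in Wine on cookingforengineers.com.",
    "D5": "Last week Li has shown you how to make the Sechuan duck. Today "
          "we'll be making Chinese dumplings (Jiaozi), a popular dish that I "
          "had a chance to try last summer in Beijing. There are many "
          "recipies for Jiaozi.",
}
QUERY = "Beijing duck recipe"


def term_counts(text):
    """Raw term frequencies over TERMS (assumption: exact word match, no stemming)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[t] for t in TERMS]


def idf_weights(count_vectors):
    """IDF = ln(N / df); assumption: 0.0 when no document contains the term."""
    n = len(count_vectors)
    weights = []
    for i in range(len(TERMS)):
        df = sum(1 for vec in count_vectors if vec[i] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(docs, query):
    """Return (name, score) pairs sorted by descending cosine similarity."""
    counts = {name: term_counts(text) for name, text in docs.items()}
    idf = idf_weights(list(counts.values()))
    q = [tf * w for tf, w in zip(term_counts(query), idf)]
    scores = {
        name: cosine([tf * w for tf, w in zip(vec, idf)], q)
        for name, vec in counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(DOCS, QUERY)
for name, score in ranking:
    print(name, round(score, 4))
```

Under these assumptions D2 and D3 share the top score (cosine about 0.52), with D5 close behind (about 0.51). A different TF or IDF variant, or a stemmer that maps "recipies" to recipe, can change the order, so the printed numbers are only as good as the assumptions above.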
Task 2. Assume that the author of document D5 goes on to tell more about her summer trip to China before doing the cooking and uses the word Beijing 3 times instead of just once. What happens to the rank of D5? How can this be interpreted in the vector retrieval model (vectors and angles between them)? Is this change in the ranking of D5 a desirable property of TF/IDF? Why?
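The effect asked about in Task 2 can be explored numerically with the sketch below. It starts from hand-counted term frequencies over T (term order: beijing, dish, duck, rabbit, recipe, roast) and rests on the same kind of assumptions a hand calculation would need: raw counts as TF, IDF = ln(N/df) with 0 for a term found in no document, exact word matching so that "recipies" in D5 is not counted as recipe, and the query weighted like a document. The names COUNTS, QUERY, scores and angle_deg are illustrative only.

```python
import math

# Hand-counted term frequencies (assumption: exact word match, no stemming).
# Term order: beijing, dish, duck, rabbit, recipe, roast
COUNTS = {
    "D1": [0, 0, 3, 0, 0, 0],
    "D2": [1, 1, 2, 0, 0, 0],
    "D3": [0, 0, 2, 1, 1, 0],
    "D4": [0, 0, 0, 1, 1, 0],
    "D5": [1, 1, 1, 0, 0, 0],
}
QUERY = [1, 0, 1, 0, 1, 0]  # "Beijing duck recipe"


def scores(counts, query):
    """Cosine similarity of each TF*IDF document vector to the TF*IDF query."""
    n = len(counts)
    idf = []
    for i in range(len(query)):
        df = sum(1 for vec in counts.values() if vec[i] > 0)
        idf.append(math.log(n / df) if df else 0.0)  # assumption: 0 when df == 0

    def weigh(vec):
        return [tf * w for tf, w in zip(vec, idf)]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    q = weigh(query)
    return {name: cosine(weigh(vec), q) for name, vec in counts.items()}


def angle_deg(cos_value):
    """Angle between document and query vectors, in degrees."""
    return math.degrees(math.acos(min(1.0, cos_value)))


before = scores(COUNTS, QUERY)
# Task 2 change: "Beijing" now occurs 3 times in D5 instead of once.
after = scores(dict(COUNTS, D5=[3, 1, 1, 0, 0, 0]), QUERY)

for name in sorted(COUNTS):
    print(name, round(before[name], 4), round(after[name], 4),
          round(angle_deg(after[name]), 1))
```

With these assumptions the document frequencies stay the same, so only D5 moves: its cosine rises from about 0.51 to about 0.67, which puts it ahead of D2 and D3. Geometrically, the larger beijing component rotates the D5 vector toward the beijing axis, where the query also has a large weight, so the angle between D5 and the query shrinks even though the added text says nothing about recipes.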
Solution

Excel sheet with calculations

Original source: https://www.cnblogs.com/lexus/p/2698677.html