
Coursera course Text Retrieval and Search Engines: Week 3 Overview

Week 3

Instructional Activities

Below is a list of the activities and assignments available to you this week. See the How to Pass the Class page to know which assignments pertain to the badge or badges you are pursuing. Click on the name of each activity for more detailed instructions.

Relevant Badges	Activity	Due Date*	Estimated Time Required
	Week 3 Video Lectures	Sunday, April 12 (suggested)	3 hours
	Week 3 Quiz	Sunday, April 19	~0.5 hours

* All deadlines are at 11:55 PM Central Time unless otherwise noted.

Time

This module will last 7 days, and it should take approximately 6 hours of dedicated time to complete its readings and assignments.

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Explain how to interpret p(R=1|q,d), and estimate it based on a large set of collected relevance judgments (or clickthrough information) about query q and document d.
Explain how to interpret the conditional probability p(q|d) used for scoring documents in the query likelihood retrieval function.
Explain what a Statistical Language Model and a Unigram Language Model are.
Explain how to compute the maximum likelihood estimate of a Unigram Language Model.
Explain how to use Unigram Language Models to discover semantically related words.
Compute p(q|d) based on a given document language model p(w|d).
Explain smoothing.
Show that the query likelihood retrieval function implements TF-IDF weighting if we smooth the document language model p(w|d) using the collection language model p(w|C) as a reference language model.
Compute the estimate of p(w|d) using Jelinek-Mercer (JM) smoothing and Dirichlet Prior smoothing, respectively.
Explain the similarities and differences among the three kinds of feedback: relevance feedback, pseudo-relevance feedback, and implicit feedback.
Explain how the Rocchio feedback algorithm works.
Explain how the Kullback-Leibler (KL) divergence retrieval function generalizes the query likelihood retrieval function.
Explain the basic idea of using a mixture model for feedback.
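
Several of the objectives above (the maximum likelihood estimate, computing p(q|d), and the two smoothing methods) can be tried out on a toy collection. The following is a minimal Python sketch, not course material: the function names, the two tiny documents, and the parameter values (lam=0.1, mu=10) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def mle_unigram(tokens):
    """Maximum likelihood estimate of a unigram model: p(w) = c(w) / total."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jm_prob(w, doc_counts, doc_len, coll_model, lam=0.1):
    """Jelinek-Mercer: (1 - lam) * c(w,d)/|d| + lam * p(w|C)."""
    return (1 - lam) * doc_counts[w] / doc_len + lam * coll_model.get(w, 0.0)


def dirichlet_prob(w, doc_counts, doc_len, coll_model, mu=2000.0):
    """Dirichlet prior: (c(w,d) + mu * p(w|C)) / (|d| + mu)."""
    return (doc_counts[w] + mu * coll_model.get(w, 0.0)) / (doc_len + mu)


def log_query_likelihood(query, doc, coll_model, smooth, **params):
    """log p(q|d) = sum over query words of log p(w|d), with smoothed p(w|d).

    Assumes every query word occurs somewhere in the collection; otherwise
    p(w|C) = 0 and the logarithm is undefined.
    """
    doc_counts = Counter(doc)  # a Counter returns 0 for unseen words
    return sum(math.log(smooth(w, doc_counts, len(doc), coll_model, **params))
               for w in query)


# Toy data (made up for illustration).
docs = {
    "d1": "text mining and text retrieval".split(),
    "d2": "food nutrition and diet".split(),
}
# Collection language model p(w|C): MLE over all documents pooled together.
coll_model = mle_unigram([w for d in docs.values() for w in d])
query = ["text", "retrieval"]

for name, smooth, params in [("JM", jm_prob, {"lam": 0.1}),
                             ("Dirichlet", dirichlet_prob, {"mu": 10.0})]:
    scores = {d: log_query_likelihood(query, toks, coll_model, smooth, **params)
              for d, toks in docs.items()}
    print(name, scores)
```

With either smoothing method, d1 should score higher than d2 for this query, and d2 still receives a finite score even though it contains neither query word. Without smoothing, p(q|d2) would be zero.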

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

p(R=1|q,d) ; query likelihood, p(q|d)
Statistical Language Model; Unigram Language Model
Maximum likelihood estimate
Background language model, collection language model, document language model
Smoothing of Unigram Language Models
Relation between query likelihood and TF-IDF weighting
Linear interpolation (i.e., Jelinek-Mercer) smoothing
Dirichlet Prior smoothing
Relevance feedback, pseudo-relevance feedback, implicit feedback
Rocchio
Kullback-Leibler divergence (KL-divergence) retrieval function
Mixture language model
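
For reference, the smoothing-related terms above can be written out as formulas. This is a sketch in commonly used notation, which is an assumption here rather than something taken from this page: c(w,d) is the count of word w in document d, |d| is the document length, |q| is the query length, p(w|C) is the collection language model, and \alpha_d is the coefficient applied to p(w|C) for words not seen in d.

```latex
% Jelinek-Mercer (linear interpolation), with \lambda \in [0,1]:
p_{\lambda}(w \mid d) = (1-\lambda)\,\frac{c(w,d)}{|d|} + \lambda\, p(w \mid C),
\qquad \alpha_d = \lambda

% Dirichlet prior, with \mu > 0:
p_{\mu}(w \mid d) = \frac{c(w,d) + \mu\, p(w \mid C)}{|d| + \mu},
\qquad \alpha_d = \frac{\mu}{|d| + \mu}

% Query likelihood under either scheme, writing p_s(w|d) for the
% smoothed probability of a word that does occur in d:
\log p(q \mid d)
  = \sum_{w:\, c(w,d)>0} c(w,q)\,
      \log \frac{p_s(w \mid d)}{\alpha_d\, p(w \mid C)}
  + |q| \log \alpha_d
  + \sum_{w} c(w,q)\, \log p(w \mid C)
```

In the last line, the first sum runs only over words the query shares with the document; its numerator grows with c(w,d) (a TF-like effect) and its denominator p(w|C) penalizes words that are common in the collection (an IDF-like effect). The term |q| log \alpha_d plays the role of document length normalization, and the final sum does not depend on d, so it can be ignored for ranking.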

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Given a table of relevance judgments in the form of three columns (query, document, and binary relevance judgments), how can we estimate p(R=1|q,d)?
How should we interpret the query likelihood conditional probability p(q|d)?
What is a Statistical Language Model? What is a Unigram Language Model? How many parameters are there in a unigram language model?
How do we compute the maximum likelihood estimate of the Unigram Language Model (based on a text sample)?
What is a background language model? What is a collection language model? What is a document language model?
Why do we need to smooth a document language model in the query likelihood retrieval model? What would happen if we don’t do smoothing?
When we smooth a document language model using a collection language model as a reference language model, what is the probability assigned to an unseen word in a document?
How can we prove that the query likelihood retrieval function implements TF-IDF weighting if we smooth with a collection language model?
How does linear interpolation (Jelinek-Mercer) smoothing work? What is the formula?
How does Dirichlet Prior smoothing work? What is the formula?
What are the similarities and differences between Jelinek-Mercer smoothing and Dirichlet Prior smoothing?
What is relevance feedback? What is pseudo-relevance feedback? What is implicit feedback?
How does Rocchio work? Why do we need to ensure that the original query terms have sufficiently large weights in feedback?
What is the KL-divergence retrieval function? How is it related to the query likelihood retrieval function?
What is the basic idea of the two-component mixture model for feedback?
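
As a starting point for the Rocchio question above, here is a minimal Python sketch of the standard Rocchio update, in which the new query vector is alpha times the original query, plus beta times the centroid of the relevant documents, minus gamma times the centroid of the non-relevant documents. The sparse-dictionary representation, the function name, the default coefficients, and the toy vectors are assumptions for illustration, and dropping non-positive weights is one common practical choice rather than part of the formula.

```python
from collections import defaultdict


def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the updated query vector as a {term: weight} dict.

    Vectors are sparse dicts mapping term -> weight. The default
    coefficients are illustrative values, not values from the course.
    """
    new_query = defaultdict(float)
    for term, weight in query_vec.items():
        new_query[term] += alpha * weight
    # Move toward the relevant centroid and away from the non-relevant one.
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue  # no feedback of this kind: skip the centroid
        for doc in docs:
            for term, weight in doc.items():
                new_query[term] += coef * weight / len(docs)
    # Assumption: keep only terms that end up with a positive weight.
    return {t: w for t, w in new_query.items() if w > 0}


# Toy example (made-up weights).
q = {"jaguar": 1.0}
rel = [{"jaguar": 1.0, "cat": 2.0}, {"jaguar": 1.0, "wildlife": 2.0}]
nonrel = [{"jaguar": 1.0, "car": 4.0}]
new_q = rocchio(q, rel, nonrel)
print(new_q)
```

In this toy run, the original query term keeps the largest weight because alpha is large relative to beta and gamma. That is one way to see why the original query terms need sufficiently large weights: the feedback documents are a small and possibly noisy sample, so the expanded query should not drift too far from what the user actually typed.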

Readings & Resources

Read ONLY Chapter 3 and part of Chapter 5 (pages 55–63)

Zhai, ChengXiang. Statistical Language Models for Information Retrieval. Synthesis Lectures Series on Human Language Technologies. Morgan & Claypool Publishers, 2008.

Video Lectures

Video Lecture	Lecture Notes	Transcript	Video Download	SRT Caption File	Forum
3.1 Probabilistic Retrieval Model: Basic Idea (00:12:44)			(17.1 MB)
3.2 Probabilistic Retrieval Model: Statistical Language Model (00:17:53)			(24.3 MB)
3.3 Probabilistic Retrieval Model: Query Likelihood (00:12:07)			(16.2 MB)
3.4 Probabilistic Retrieval Model: Statistical Language Model – Part 1 (00:12:15)			(16.5 MB)
3.4 Probabilistic Retrieval Model: Statistical Language Model – Part 2 (00:09:36)			(13.5 MB)
3.5 Probabilistic Retrieval Model: Smoothing Methods – Part 1 (00:09:54)			(14.5 MB)
3.5 Probabilistic Retrieval Model: Smoothing Methods – Part 2 (00:13:17)			(18.4 MB)
3.6 Retrieval Methods: Feedback in Text Retrieval (00:06:49)			(9.6 MB)
3.7 Feedback in Text Retrieval: Feedback in VSM (00:12:05)			(16.7 MB)
3.8 Feedback in Text Retrieval: Feedback in LM (00:19:11)			(26.4 MB)

Tips for Success

To do well this week, I recommend that you do the following:

Review the video lectures a number of times to gain a solid understanding of the key questions and concepts introduced this week.
When possible, provide tips and suggestions to your peers in this class. As a learning community, we can help each other learn and grow. One way of doing this is by helping to address the questions that your peers pose. By engaging with each other, we’ll all learn better.
It’s always a good idea to refer to this week's video lectures and chapter readings and to reference them in your responses. When appropriate, critique the information presented.
Take notes while you read the materials and watch the lectures for this week. By taking notes, you are interacting with the material and will find that it is easier to remember and to understand. With your notes, you’ll also find that it’s easier to complete your assignments. So, go ahead, do yourself a favor; take some notes!

Getting and Giving Help

You can get/give help via the following means:

Use the Learner Help Center to find information regarding specific technical problems. For example, technical problems would include error messages, difficulty submitting assignments, or problems with video playback. You can access the Help Center by clicking on the Help Center link at the top right of any course page. If you cannot find an answer in the documentation, you can also report your problem to the Coursera staff by clicking on the Contact Us! link available on each topic's page within the Learner Help Center.
Use the Content Issues forum to report errors in lecture video content, assignment questions and answers, assignment grading, text and links on course pages, or the content of other course materials. University of Illinois staff and Community TAs will monitor this forum and respond to issues.

As a reminder, the instructor is not able to answer emails sent directly to his account. Rather, all questions should be reported as described above.

from: https://class.coursera.org/textretrieval-001/wiki/Week3Overview

Original article: https://www.cnblogs.com/GarfieldEr007/p/5165440.html