1、Information
publication: EMNLP 2014
author: Jing Liu (this paper extends the model from the author's previous SIGIR paper)
2、What
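Estimates the difficulty of questions in CQA (community question answering) services, modeling it from two directions:
1) Competition-based comparisons: the Competition Model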
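Competitions extracted from question q: $\{u_a \prec q,\ q \prec u_b,\ u_a \prec u_b,\ u_{o_1} \prec u_b,\ \ldots,\ u_{o_M} \prec u_b\}$, where $u_a$ is the asker of q, $u_b$ the best answerer, $u_{o_1}, \ldots, u_{o_M}$ the other M answerers, and $x \prec y$ means y wins the competition against x.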
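2) Question text similarities for QDE (question difficulty estimation): questions of similar difficulty tend to have similar textual descriptions, which also helps with the cold-start problem.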
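The competition set for a single question can be sketched as below. This is a minimal illustration, not the paper's code: the function name `extract_competitions` is hypothetical, and user and question IDs are assumed to be distinct strings so that both can share one score space. Each tuple (loser, winner) encodes loser ≺ winner.

```python
from typing import List, Tuple

# A competition (loser, winner) reads "loser ≺ winner".
# Assumption: question IDs and user IDs never collide (e.g. prefixed strings).
Competition = Tuple[str, str]


def extract_competitions(question: str, asker: str, best_answerer: str,
                         other_answerers: List[str]) -> List[Competition]:
    """Hypothetical helper building {u_a ≺ q, q ≺ u_b, u_a ≺ u_b, u_om ≺ u_b}."""
    comps: List[Competition] = [
        (asker, question),          # u_a ≺ q
        (question, best_answerer),  # q ≺ u_b
        (asker, best_answerer),     # u_a ≺ u_b
    ]
    # Every other answerer loses to the best answerer: u_om ≺ u_b.
    comps.extend((u, best_answerer) for u in other_answerers if u != best_answerer)
    return comps


print(extract_competitions("q1", "ua", "ub", ["uo1", "uo2"]))
```

Keeping competitions as plain (loser, winner) pairs lets user-user and question-user comparisons be fed to the same model without special cases.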
3、Dataset
Stack Overflow:
A Q&A site for programming-related IT questions.
Data download:
http://www.ics.uci.edu/~duboisc/stackoverflow/
- qid: Unique question id
- i: User id of questioner
- qs: Score of the question
- qt: Time of the question (in epoch time)
- tags: a comma-separated list of the tags associated with the question. Examples of tags are "html", "R", "mysql", "python", and so on; typically two to six tags are used on each question.
- qvc: Number of views of this question (at the time of the data dump)
- qac: Number of answers for this question (at the time of the data dump)
- aid: Unique answer id
- j: User id of answerer
- as: Score of the answer
- at: Time of the answer
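To feed the competition model, rows can be grouped into question threads. A minimal sketch, assuming each row is one (question, answer) pair carrying the fields above as strings; `group_threads` is a hypothetical helper, and treating the highest-scoring answer as the best answer is an assumed heuristic, not necessarily how the paper identifies $u_b$.

```python
from typing import Dict, List


def group_threads(rows: List[Dict[str, str]]) -> Dict[str, dict]:
    """Group per-answer rows by qid into question threads (hypothetical helper)."""
    threads: Dict[str, dict] = {}
    for r in rows:
        # The first row seen for a qid creates the thread; later rows add answers.
        t = threads.setdefault(r["qid"], {
            "asker": r["i"],
            "tags": [tag.strip() for tag in r["tags"].split(",") if tag.strip()],
            "answers": [],
        })
        t["answers"].append((r["j"], int(r["as"])))
    for t in threads.values():
        # Assumed heuristic: the author of the highest-scoring answer is u_b.
        t["answers"].sort(key=lambda a: a[1], reverse=True)
        t["best_answerer"] = t["answers"][0][0]
        t["other_answerers"] = [u for u, _ in t["answers"][1:]]
    return threads
```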
4、How
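input: question-user competitions, question-question competitions, and question text similarity.
output: pairwise comparison results (which question in a pair is harder).
method: RCM (Regularized Competition Model)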
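A minimal sketch in the spirit of RCM, under stated assumptions: every user and question gets a score θ, each competition (loser, winner) incurs a squared hinge loss $\max(0, 1 - (\theta_w - \theta_l))^2$, and text similarity enters as a smoothness penalty $\lambda \sum s_{pq} (\theta_p - \theta_q)^2$ over similar question pairs, minimized by plain gradient descent. The paper's exact loss, regularizer, and optimizer may differ; `fit_rcm_sketch` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_rcm_sketch(competitions: List[Tuple[str, str]],
                   similar_pairs: List[Tuple[str, str, float]],
                   lam: float = 0.5, lr: float = 0.05,
                   epochs: int = 500) -> Dict[str, float]:
    """Hypothetical regularized competition model fitted by gradient descent."""
    nodes = {n for pair in competitions for n in pair}
    nodes |= {n for p, q, _ in similar_pairs for n in (p, q)}
    theta = {n: 0.0 for n in nodes}
    for _ in range(epochs):
        grad = {n: 0.0 for n in nodes}
        # Squared hinge loss: the winner should beat the loser by a margin of 1.
        for loser, winner in competitions:
            gap = 1.0 - (theta[winner] - theta[loser])
            if gap > 0:
                grad[winner] -= 2.0 * gap
                grad[loser] += 2.0 * gap
        # Smoothness regularizer: similar questions get similar scores.
        for p, q, s in similar_pairs:
            diff = theta[p] - theta[q]
            grad[p] += 2.0 * lam * s * diff
            grad[q] -= 2.0 * lam * s * diff
        for n in nodes:
            theta[n] -= lr * grad[n]
    return theta
```

The regularizer is what lets a question with no competitions of its own (here a cold-start question) inherit a score from textually similar questions.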
5、Evaluation: accuracy, ACC = (# correctly judged question pairs) / (# all question pairs)
baselines: PageRank, TS, CM
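The ACC metric can be computed as below. This sketch assumes gold pairs are given as (easier, harder) question IDs and that ties count as incorrect; both conventions are assumptions, and `pairwise_accuracy` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def pairwise_accuracy(scores: Dict[str, float],
                      gold_pairs: List[Tuple[str, str]]) -> float:
    """ACC = #correctly judged question pairs / #all question pairs.

    A pair (easier, harder) is judged correctly when the harder question
    receives a strictly higher score; ties count as incorrect (assumption).
    """
    if not gold_pairs:
        return 0.0
    correct = sum(1 for easy, hard in gold_pairs if scores[hard] > scores[easy])
    return correct / len(gold_pairs)
```

For example, with scores {"q1": 0.2, "q2": 0.9, "q3": 0.5} and pairs [("q1", "q2"), ("q3", "q2"), ("q2", "q3")], two of three pairs are judged correctly, so ACC is 2/3.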
6、Additional analysis
1) Different ways of computing text similarity
2) Estimating difficulty scores for cold-start questions: KNN
3) Examples of text words at different difficulty levels
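A KNN estimate for cold-start questions can be sketched as follows. The bag-of-words cosine similarity is only one common choice among the similarity measures the note mentions, and the similarity-weighted mean over the k nearest questions (with a global-mean fallback) is an assumption; `cosine` and `knn_difficulty` are hypothetical helpers.

```python
import math
from collections import Counter
from typing import Dict


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def knn_difficulty(new_text: str, known: Dict[str, str],
                   scores: Dict[str, float], k: int = 3) -> float:
    """Similarity-weighted mean score of the k most similar scored questions."""
    sims = sorted(((cosine(new_text, text), qid) for qid, text in known.items()),
                  reverse=True)[:k]
    total = sum(s for s, _ in sims)
    if total == 0:
        # No textual overlap at all: fall back to the global mean score.
        return sum(scores.values()) / len(scores)
    return sum(s * scores[qid] for s, qid in sims) / total
```

For a new question "python list comprehension", the two "python list ..." questions are its nearest neighbours, so its estimate is pulled toward their scores rather than toward the unrelated mysql question.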
7、Conclusion