[Reading Notes] Ranking Relevance in Yahoo Search (2): Machine Learned Ranking

3. MACHINE LEARNED RANKING

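1) Training the model on negative examples alone is not feasible, because negative results cannot cover every aspect of irrelevance;

2) Search can be treated as a binary classification problem. In this experiment we use gradient boosting trees (GBDT) with logistic loss, which helps reduce the bad URLs shown on the first page:

The method first finds the decision boundary that separates URLs relevant to a given query from irrelevant ones (logistic loss);

It then adds the Perfect, Excellent, and Good information to the model to distinguish among the URLs (GBDT);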

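3.1 Core Ranking (comparable to the booster in chinaso)

Uses GBDT with logistic loss;

3.1.1 Logistic loss: treats ranking as binary classification, used to reduce the bad/fair URLs on the first page

1) Steps:
- Map the grades to binary targets: Perfect, Excellent, Good: +1; Fair, Bad: -1
- Formula: to be added
2) Advantages

Compared with other loss functions (such as hinge loss), logistic loss gives a more reliable ranking,

because logistic loss always places the force on positive/negative samples towards positive/negative infinity;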
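To make the "force" argument concrete, here is a minimal sketch that compares the negative gradient of the standard logistic loss, log(1 + exp(-y * score)), with that of the hinge loss, max(0, 1 - y * score). These are the textbook forms of both losses, not formulas copied from the notes (the formula there is still marked as to be added), and the names `BINARY_TARGET`, `logistic_force`, and `hinge_force` are made up for illustration.

```python
import math

# Grade-to-binary mapping from the steps above:
# Perfect / Excellent / Good -> +1, Fair / Bad -> -1.
BINARY_TARGET = {"Perfect": 1, "Excellent": 1, "Good": 1, "Fair": -1, "Bad": -1}


def logistic_loss(y, score):
    """Standard logistic loss log(1 + exp(-y * score)) for y in {+1, -1}."""
    return math.log1p(math.exp(-y * score))


def logistic_force(y, score):
    """Negative gradient of the logistic loss w.r.t. the score."""
    return y / (1.0 + math.exp(y * score))


def hinge_force(y, score):
    """Negative (sub)gradient of the hinge loss max(0, 1 - y * score)."""
    return float(y) if y * score < 1.0 else 0.0


# Two samples that are already on the correct side with a margin of 3.
for grade, score in [("Good", 3.0), ("Bad", -3.0)]:
    y = BINARY_TARGET[grade]
    print(grade, logistic_force(y, score), hinge_force(y, score))
```

For samples that are already classified with a margin above 1, the hinge force is exactly zero, while the logistic force is small but keeps the sign of the target, so positives keep being pushed up and negatives keep being pushed down.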

3.1.2 GBDT: used to distinguish Perfect, Excellent, and Good

1) Steps:
- Use different levels to separate Perfect, Excellent, and Good (so that Perfect data samples get relatively higher forces towards positive infinity than Excellent ones, which in turn get higher forces than Good ones)
- Formula: to be added
Note: scale(label) can be set empirically to scale(Perfect)=3, scale(Excellent)=2, scale(Good/Fair/Bad)=1 to separate Perfect / Excellent / Good;

2) For Fair / Bad samples, their scores are always negative, so there is no need to assign them different levels;
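The effect of scale(label) can be sketched as below. The notes do not give the formula, so this assumes the scale simply multiplies the logistic-loss negative gradient that each boosting round fits, i.e. scale(label) * y / (1 + exp(y * score)). The function name `scaled_pseudo_response` is hypothetical.

```python
import math

# Scale values quoted in the note above; Good, Fair, and Bad share a scale of 1.
SCALE = {"Perfect": 3.0, "Excellent": 2.0, "Good": 1.0, "Fair": 1.0, "Bad": 1.0}
BINARY_TARGET = {"Perfect": 1, "Excellent": 1, "Good": 1, "Fair": -1, "Bad": -1}


def scaled_pseudo_response(grade, score):
    """Assumed form: scale(label) * y / (1 + exp(y * score))."""
    y = BINARY_TARGET[grade]
    return SCALE[grade] * y / (1.0 + math.exp(y * score))


# At the same current score, better grades receive a larger push upwards,
# while Fair and Bad receive the same push downwards.
for grade in ("Perfect", "Excellent", "Good", "Fair", "Bad"):
    print(grade, scaled_pseudo_response(grade, 0.0))
```

Under this assumption the targets for the next tree are ordered Perfect > Excellent > Good > 0 > Fair = Bad at any shared score, which matches the ordering of forces described in the steps.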

3.1.3 Evaluation (this learning algorithm is named LogisticRank)

Compared with GBRank and LambdaMART

1) Setup:

Data: 2 million query-URL pairs;

2) Results & analysis

Figures to be added;

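3.2 Contextual Reranking (comparable to the tuner in chinaso)

1) When reranking runs:
- Core ranking considers only the features of each query-URL pair and ignores other contextual information (because the data volume at the core ranking stage is too large);
- Reranking is applied on a single machine to the roughly tens of results returned by core ranking (because the set is small, contextual features can be extracted from the model's important features);
2) Features extracted from the tens of results:
- Rank: sorting URLs by the feature value in ascending order to get the ranks of specific URLs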
- Mean: calculating the mean of the feature values of top 30 URLs
- Variance: .... the variance of ...
- Normalized feature: normalizing the feature by using mean and standard deviation
- Topic model feature: aggregating the topical distributions of 30 URLs to create a query topic model vector, and calculating similarity with each individual result
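The contextual features listed above can be sketched with the standard library only. Several details are assumptions, since the notes do not specify them: population variance is used, the query topic vector is the plain average of the per-URL topic distributions, similarity is cosine similarity, and ties in rank are broken by original position. The function names are made up for illustration.

```python
import math
from statistics import fmean, pvariance


def contextual_features(values, top_n=30):
    """values: one feature's value per URL, in core-ranking order."""
    top = values[:top_n]
    mean = fmean(top)
    variance = pvariance(top, mu=mean)  # population variance (assumption)
    std = math.sqrt(variance)

    # 1-based rank of each URL when sorted by feature value, ascending.
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for position, index in enumerate(order, start=1):
        rank[index] = position

    # Normalize with the mean and standard deviation of the top results.
    normalized = [(v - mean) / std if std > 0 else 0.0 for v in values]
    return {"rank": rank, "mean": mean, "variance": variance,
            "normalized": normalized}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def topic_similarity(topic_vectors, top_n=30):
    """Average the top results' topic distributions into a query topic
    vector, then score every result against it."""
    top = topic_vectors[:top_n]
    dims = len(top[0])
    query_topic = [fmean([vec[d] for vec in top]) for d in range(dims)]
    return [cosine(query_topic, vec) for vec in topic_vectors]


features = contextual_features([0.2, 0.8, 0.5])
print(features["rank"], features["mean"], features["variance"])
print(topic_similarity([[1.0, 0.0], [0.0, 1.0]]))
```

Each function works on a single feature column or a single list of topic vectors, so in a reranker it would be called once per feature that is worth contextualizing.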
3.3 Implementation and deployment

Core ranking is deployed like the leaf in chinaso

Reranking is deployed like the searchroot in chinaso

Original post: https://www.cnblogs.com/tanfy/p/8350827.html