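  [NLP News 2013.06.16] Representative Reviewing

    Original English article: http://nlp.hivefire.com/articles/share/40221/

    Note: I translate NLP news only to study technical English and broaden my horizons. If the translation is poor, please bear with me!

    (I honestly could not understand much of this one, and the translation is a mess... If anyone grasps the general idea of this article, please leave a comment; I would be extremely grateful!)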

    When thinking about how best to review papers, it seems helpful to have some conception of what good reviewing is. As far as I can tell, this is almost always discussed only in the specific context of a paper (i.e. your rejected paper), or at most an area (i.e. what a “good paper” looks like for that area), rather than in terms of general principles. Neither individual papers nor areas are sufficiently general for a large conference: every paper differs in the details, and what if you want to build a new area and/or cross areas?

    An unavoidable reason for reviewing is that the research community is too large. In particular, it is not possible for a researcher to read every paper that someone thinks might be of interest. This reason for reviewing exists independently of constraints on rooms or the scheduling formats of individual conferences. Indeed, history suggests that physical constraints are relatively meaningless over the long term: growing conferences simply use more rooms and/or change formats to accommodate the growth.

    This suggests that a generic test for paper acceptance should be “Are there a significant number of people who will be interested?” In theory, this question could be answered by sending the paper to every person who might be interested and simply asking them. In practice, this would be an intractable use of people’s time: we must query far fewer people and obtain an approximate answer to this question. Our goal, then, should be to minimize the approximation error for some fixed amount of reviewing work.
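
    The approximation framing above can be illustrated with a toy model. This is a minimal sketch, not anything the author proposes. It assumes each reviewer is an independent random draw from the potentially interested audience and casts a yes/no “interested” vote. Under that idealized assumption, the estimated fraction of interested readers has the binomial standard error sqrt(p(1 - p) / k) for k reviewers, so the error shrinks only as 1/sqrt(k). The function names are hypothetical.

```python
import math


def standard_error(p, k):
    """Binomial standard error of a sample proportion p estimated from k votes."""
    return math.sqrt(p * (1 - p) / k)


def interest_estimate(votes):
    """Estimate the fraction of interested readers from k yes/no reviewer votes.

    Toy model (assumption): votes are independent draws from the audience,
    encoded as 1 (interested) or 0 (not interested).
    Returns (estimate, standard_error).
    """
    k = len(votes)
    if k == 0:
        raise ValueError("need at least one vote")
    p_hat = sum(votes) / k
    return p_hat, standard_error(p_hat, k)


if __name__ == "__main__":
    # Three reviewers, two of whom find the paper interesting.
    print(interest_estimate([1, 1, 0]))
    # Quadrupling the number of reviewers only halves the error.
    print(standard_error(0.5, 3), standard_error(0.5, 12))
```

    With p = 0.5, three votes give a standard error of about 0.29 and twelve votes about 0.14. Under this model, halving the approximation error costs four times the reviewing work.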

    Viewed from this perspective, the first way things can go wrong is misassignment of reviewers to papers, which has two easy failure modes.

    1. When reviewer/paper assignment is automated based on an affinity graph, the affinity graph may be of low quality. Alternatively, the constraint on the maximum number of papers per reviewer can easily leave some papers orphaned, with low affinity to all of the reviewers still available to them.
    2. When reviewer/paper assignments are done by one person, that person may choose reviewers who are all like-minded, simply because this is the crowd that they know. I’ve seen this happen at the beginning of the reviewing process. The more insidious case is when it happens at the end, when people are pressed for time and low-quality judgements can become common.
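
    The first failure mode is easy to reproduce on a toy instance. The sketch below assumes a hypothetical greedy assigner; it is not the algorithm of any particular conference system. The assigner repeatedly takes the highest-affinity remaining paper/reviewer pair, gives each paper one reviewer, and caps each reviewer's load. The papers, reviewers, and affinity values are made up for illustration.

```python
# Hypothetical affinities: AFFINITY[paper][reviewer], higher is a better fit.
AFFINITY = {
    "main1": {"alice": 0.95, "bob": 0.80},
    "main2": {"alice": 0.93, "bob": 0.80},
    "niche": {"alice": 0.90, "bob": 0.10},
}


def greedy_assign(affinity, cap):
    """Give each paper one reviewer, taking the highest-affinity pairs first,
    while no reviewer is assigned more than `cap` papers."""
    pairs = sorted(
        (
            (score, paper, reviewer)
            for paper, row in affinity.items()
            for reviewer, score in row.items()
        ),
        reverse=True,
    )
    load = {}
    assignment = {}
    for score, paper, reviewer in pairs:
        if paper in assignment or load.get(reviewer, 0) >= cap:
            continue  # paper already covered, or reviewer is full
        assignment[paper] = reviewer
        load[reviewer] = load.get(reviewer, 0) + 1
    return assignment


if __name__ == "__main__":
    for paper, reviewer in greedy_assign(AFFINITY, cap=2).items():
        print(paper, reviewer, AFFINITY[paper][reviewer])
```

    With cap=2, the two mainstream papers use up alice, so the niche paper is left with bob at affinity 0.1. Assigning main2 to bob instead would have let alice take the niche paper at 0.9.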

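    An interesting approach to the constrained assignment problem would be to optimize a different objective, such as the product of affinities rather than the sum. A product collapses toward zero whenever any single paper/reviewer affinity is near zero. Maximizing it therefore penalizes orphaned papers far more strongly than maximizing the sum does. I’ve seen no experimentation of this sort.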
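
    A minimal sketch of the sum-versus-product comparison follows. It assumes a hypothetical 3×3 affinity matrix in which each reviewer handles exactly one paper. The code brute-forces all matchings, which is only feasible for tiny instances. It maximizes the product through the sum of logarithms, which gives the same best matching for strictly positive affinities.

```python
import itertools
import math

# Hypothetical affinities: AFFINITY[p][r] is how well reviewer r fits paper p.
AFFINITY = [
    [1.0, 0.3, 0.2],
    [0.5, 0.1, 0.2],
    [0.2, 0.2, 0.9],
]


def best_assignment(affinity, score):
    """Brute-force the one-reviewer-per-paper matching that maximizes `score`.

    Returns a tuple whose position p holds the reviewer assigned to paper p.
    """
    n = len(affinity)
    return max(
        itertools.permutations(range(n)),
        key=lambda perm: score([affinity[p][r] for p, r in enumerate(perm)]),
    )


def sum_objective(values):
    return sum(values)


def product_objective(values):
    # Sum of logs has the same maximizer as the product (all values must be > 0)
    # and avoids numeric underflow on larger instances.
    return sum(math.log(v) for v in values)


if __name__ == "__main__":
    for name, objective in [("sum", sum_objective), ("product", product_objective)]:
        perm = best_assignment(AFFINITY, objective)
        worst = min(AFFINITY[p][r] for p, r in enumerate(perm))
        print(f"{name}: assignment={perm}, lowest affinity={worst}")
```

    On this matrix, the sum objective picks the matching (0, 1, 2), which gives paper 1 an affinity of 0.1. The product objective picks (1, 0, 2), whose worst affinity is 0.3. Because the logarithm turns the product into a sum, larger instances could use the same linear-assignment solvers as the sum objective (for example, the Hungarian algorithm for one-to-one matching).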

    For ICML (the International Conference on Machine Learning), there are about 3 levels of “reviewer”:

    - the program chair, who is responsible for all papers;
    - the area chair (AC), who is responsible for organizing reviewing on a subset of papers;
    - the program committee member/reviewer, who has primary responsibility for reviewing.

    In 2012, we tried to avoid these failure modes with the least system effort, using a blended approach. We used bidding to get a higher-quality affinity matrix. We used a constraint system to assign the first reviewer and two area chairs to each paper. Then, we asked each area chair to find one reviewer for each paper. This obviously dealt with the one-area-chair failure mode. It also helps substantially with low-quality assignments from the constrained system, for three reasons:

    - (a) The first reviewer chosen is typically of higher quality than the last, because that choice is the least constrained.
    - (b) Misassignments to area chairs are diagnosed at the beginning of the process, when ACs try to find reviewers.
    - (c) ACs can reach outside the initial program committee to find reviewers, which existing automated systems cannot do.
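
    The per-paper bookkeeping of the blended approach can be sketched as a small record. This is an illustrative sketch only, and the class and field names are hypothetical. It models just the outcome described above: one reviewer from the constraint system, two ACs, and one reviewer recruited by each AC, possibly from outside the program committee. It does not describe how the ICML 2012 tooling actually worked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PaperAssignment:
    paper: str
    first_reviewer: str           # chosen by the automated constraint system
    area_chairs: Tuple[str, str]  # also chosen by the constraint system
    recruited: Dict[str, str] = field(default_factory=dict)  # AC -> reviewer they found

    def recruit(self, area_chair: str, reviewer: str) -> None:
        """Record the one reviewer an AC found (may be outside the program committee)."""
        if area_chair not in self.area_chairs:
            raise ValueError(f"{area_chair} is not an area chair for {self.paper}")
        if area_chair in self.recruited:
            raise ValueError(f"{area_chair} already recruited a reviewer for {self.paper}")
        self.recruited[area_chair] = reviewer

    def reviewers(self) -> List[str]:
        return [self.first_reviewer, *self.recruited.values()]

    def is_complete(self) -> bool:
        # Complete once every assigned AC has recruited a reviewer.
        return len(self.recruited) == len(self.area_chairs)


if __name__ == "__main__":
    a = PaperAssignment("paper-17", "auto_reviewer", ("ac_x", "ac_y"))
    a.recruit("ac_x", "outside_expert")
    a.recruit("ac_y", "pc_member")
    print(a.reviewers(), a.is_complete())
```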

    The next way reviewing can go wrong is through biased reviewing.

    1. Author name bias is a famous one. In my experience it is real: well-known authors automatically have their papers taken seriously, which particularly matters when time is short. Furthermore, I’ve seen instances where well-known authors slide by with proof sketches that no one fully understands.
    2. Review anchoring is a very significant problem if it occurs. It does not happen in the standard review process, because the other reviews are not visible to a reviewer until that reviewer has completed their own review.
    3. A more subtle form of bias is when one reviewer is simply much louder or more charismatic than the others. Reviewing without an in-person meeting actually helps here, as it reduces this problem substantially.
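
    The visibility rule that prevents anchoring in the standard process can be written as a short policy function. This minimal sketch assumes reviews are stored in a plain dictionary that maps each reviewer to their review text, with None for a review not yet submitted. The function name is hypothetical, and this is not the API of any real reviewing system.

```python
from typing import Dict, Optional


def visible_reviews(reviews: Dict[str, Optional[str]], viewer: str) -> Dict[str, str]:
    """Return the other submitted reviews that `viewer` may read.

    A reviewer sees nothing from others until their own review is submitted,
    so other reviews cannot anchor the reviewer's own judgement.
    """
    if reviews.get(viewer) is None:
        return {}
    return {
        reviewer: text
        for reviewer, text in reviews.items()
        if reviewer != viewer and text is not None
    }


if __name__ == "__main__":
    reviews = {"r1": "Accept: solid experiments.", "r2": None, "r3": "Reject: unclear proof."}
    print(visible_reviews(reviews, "r2"))  # {} because r2 has not submitted yet
    print(visible_reviews(reviews, "r1"))
```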

    Reviews can also be of low quality. A primary issue here is time: most reviewers will submit a review within the time constraint, but it may not be high quality because of limits on their time. Minimizing average reviewer load is quite important here. Staggered deadlines for reviews are almost certainly also helpful. A more subtle issue is discouraging low-quality submissions. My favored approach here is to publish all submissions nonanonymously after some initial period of time.

    Another significant issue in review quality is motivation. Making reviewers non-anonymous to each other helps with motivation, as poor reviews will at least be known to some. Author feedback also helps with motivation, as reviewers know that authors will be able to point out poor reviewing. It is easy to imagine that further improvements in reviewer motivation would be helpful.

    A third form of low-quality review is based on miscommunication. Maybe there is a silly typo in a paper? Maybe something was confusing? Being able to communicate with the author can greatly reduce ambiguities.

    The last problem is dictatorship at decision time, of which I’ve seen several variants:

    - Each area chair is given a budget of papers to “champion”.
    - An area chair decides to override all reviews and either accept or, more likely, reject a paper.
    - A program chair does the same.

    The power of dictatorship is often available, but it should not be used: the wiser course is keeping things representative.

    At ICML 2012, we tried to deal with this via a defined-power approach. When the reviewers agreed on the accept/reject decision, that was the decision. If the reviewers disagreed, we asked the two area chairs to make decisions, and if they agreed, that was the decision. Only when the ACs disagreed did the program chairs become involved in the decision.
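
    The escalation rule above amounts to a small decision procedure. This minimal sketch assumes votes are the strings "accept" or "reject". It also assumes the area chairs and program chairs are consulted through hypothetical callback functions, so each level is asked only when the level below it disagrees.

```python
from typing import Callable, List, Tuple


def decide(
    reviewer_votes: List[str],
    ask_area_chairs: Callable[[], List[str]],
    ask_program_chairs: Callable[[], str],
) -> Tuple[str, str]:
    """Return (decision, level that made it) under the defined-power rule."""
    # Level 1: if the reviewers agree, their verdict is the decision.
    if len(set(reviewer_votes)) == 1:
        return reviewer_votes[0], "reviewers"
    # Level 2: otherwise the two area chairs decide, if they agree.
    ac_votes = ask_area_chairs()
    if len(set(ac_votes)) == 1:
        return ac_votes[0], "area chairs"
    # Level 3: only an AC disagreement reaches the program chairs.
    return ask_program_chairs(), "program chairs"


if __name__ == "__main__":
    print(decide(["accept", "reject", "accept"],
                 lambda: ["reject", "reject"],
                 lambda: "accept"))
```

    Passing callbacks rather than precomputed votes mirrors the intent of the rule. Higher levels are never consulted unless the level below them disagrees.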

    The above provides an understanding of how to create a good reviewing process for a large conference. With this in mind, we can consider the various proposals made at the peer review workshop and elsewhere.

    1. Double Blind Review. This reduces bias, at the cost of decreasing reviewer motivation. Overall, I think it’s a significant long-term positive for a conference, as “insiders” naturally become more concerned with review quality and “outsiders” are more prone to submit.
    2. Better paper/reviewer matching. A pure win, with the only caveat that you should be familiar with the failure modes and watch out for them.
    3. Author feedback. This improves review quality by placing a check on unfair reviews and reducing miscommunication, at some cost in time.
    4. Allowing an appendix or ancillary materials. This allows authors to better communicate complex ideas, at the potential cost of reviewer time. A standard compromise is to make reading an appendix optional for reviewers.
    5. Open reviews. Open review means that people can learn from other reviews, and that authors can respond more naturally than in single-round author feedback.

    It’s important to note that none of the above are inherently contradictory. This is not necessarily obvious, as proponents of open review and double blind review have at times found themselves in opposition. These approaches can be reconciled by simply hiding authors’ names for a fixed period of 2 months while the initial review process is ongoing.
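
    Reconciling the two proposals requires only a time-based visibility check. This minimal sketch assumes the 2-month clock starts at the submission deadline, which the text does not specify. It also assumes months are counted by calendar, clamping to the last day of shorter months. The function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to that month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # (first weekday, days in month)
    return date(year, month, min(d.day, last_day))


def author_names_visible(deadline: date, today: date, months: int = 2) -> bool:
    """Authors stay anonymous during the initial review window, then are revealed."""
    return today >= add_months(deadline, months)


if __name__ == "__main__":
    deadline = date(2013, 2, 1)
    print(author_names_visible(deadline, date(2013, 3, 31)))
    print(author_names_visible(deadline, date(2013, 4, 1)))
```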

    Representative reviewing seems like the real and difficult goal. If a paper is rejected in a representative reviewing process, then perhaps it is just not of sufficient interest. Similarly, if a paper is accepted, then perhaps it is of real and meaningful interest. And if the reviewing process is not representative, then perhaps we should fix the failure modes.

  Original post: https://www.cnblogs.com/createMoMo/p/3142755.html