  • Paper Reading: A Brief Introduction to Weakly Supervised Learning

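    incomplete supervision: use unlabeled data to help training

    inexact supervision: only coarse-grained labels are available, e.g. spam classification

    inaccurate supervision: noisy data, e.g. crowdsourced labels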

    Incomplete supervision

    training data set \(D = \{(x_1, y_1), \cdots, (x_l, y_l), x_{l+1}, \cdots, x_m\}\)

    active learning (with human intervention)

    The labeling cost depends only on the number of queries.

    1. informativeness: an unlabeled instance helps reduce the uncertainty of a statistical model.

      1.1 uncertainty sampling: train a single learner and query the instance on which it is least confident

      1.2 query-by-committee: train multiple learners and query the instance on which they disagree most

    2. representativeness: an instance helps represent the structure of input patterns

      2.1 aim to exploit the cluster structure of unlabeled data
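
    The two informativeness criteria above can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the function names, the least-confidence score (lowest top-class probability), and the use of vote entropy to measure committee disagreement are assumptions chosen for the example.

```python
import numpy as np


def least_confidence_query(proba):
    """Pick the instance whose most likely class has the lowest probability.

    proba: array of shape (n_unlabeled, n_classes) from a single learner.
    """
    confidence = proba.max(axis=1)
    return int(np.argmin(confidence))


def committee_query(votes, n_classes):
    """Pick the instance on which the committee disagrees most (vote entropy).

    votes: int array of shape (n_learners, n_unlabeled), one predicted class
    per learner and instance.
    """
    n_unlabeled = votes.shape[1]
    entropy = np.zeros(n_unlabeled)
    for c in range(n_classes):
        frac = (votes == c).mean(axis=0)  # share of learners voting for class c
        nonzero = frac > 0
        entropy[nonzero] -= frac[nonzero] * np.log(frac[nonzero])
    return int(np.argmax(entropy))


# Hypothetical predictions for three unlabeled instances.
proba = np.array([[0.90, 0.10], [0.55, 0.45], [0.20, 0.80]])
votes = np.array([[0, 1, 1], [0, 0, 1], [0, 1, 1]])

print(least_confidence_query(proba))  # instance to send to the human oracle
print(committee_query(votes, n_classes=2))
```

    In both cases the returned index is the instance whose label would be requested next, so the labeling cost grows by one query per call.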

    semi-supervised learning (no human intervention is assumed)

    Although the unlabeled data points carry no explicit label information, they implicitly convey information about the data distribution, which can be helpful for predictive modelling.

    Two basic assumptions: the cluster assumption (data have an inherent cluster structure) and the manifold assumption (data lie on a manifold).

    1. generative methods

      The labels of unlabeled instances can be treated as missing values of the model parameters and estimated by approaches such as the EM algorithm.

      To get good performance, one usually needs domain knowledge to determine an adequate generative model.

    2. graph-based methods

      The performance depends heavily on how the graph is constructed.

    3. low-density separation methods

      S3VMs (semi-supervised support vector machines) try to identify a classification boundary that goes across the less dense region while keeping the labeled data correctly classified.

    4. disagreement-based methods

      Generate multiple learners and let them collaborate to exploit unlabeled data.
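
    Of the four families above, the graph-based one is probably the easiest to sketch. The example below is a simple iterative label propagation, given as an illustration under stated assumptions rather than as the paper's algorithm: the fully connected Gaussian-weighted graph, the `sigma` bandwidth, the `-1` marker for unlabeled points, and the function name are all choices made for this example.

```python
import numpy as np


def label_propagation(X, y, sigma=1.0, n_iter=200):
    """Propagate labels over a similarity graph.

    X: array of shape (n, d). y: int array of shape (n,), -1 means unlabeled.
    Returns a predicted label for every point.
    """
    classes = np.unique(y[y >= 0])

    # Graph construction (assumed here): fully connected, Gaussian edge weights.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix

    labeled = y >= 0
    F = np.zeros((len(X), len(classes)))
    F[labeled] = (y[labeled][:, None] == classes[None, :]).astype(float)
    clamp = F[labeled].copy()

    for _ in range(n_iter):
        F = P @ F            # each node averages its neighbours' label scores
        F[labeled] = clamp   # labeled nodes keep their known labels
    return classes[F.argmax(axis=1)]


# Two well-separated clusters with one labeled point each (toy data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.0], [4.9, 5.2]])
y = np.array([0, -1, -1, 1, -1, -1])
pred = label_propagation(X, y)
print(pred)
```

    Changing `sigma`, or replacing the fully connected graph with a k-nearest-neighbour graph, changes `W` and therefore the result, which is the sense in which performance depends on graph construction.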

    Inexact Supervision

    Multi-instance learning: predict the labels of unseen bags. A bag \(X_i\) is positive if there exists an instance \(x_{ip}\) in it that is positive; the index \(p\) is unknown.
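
    The bag-labeling rule can be made concrete with a short sketch. It assumes an instance-level scorer already exists and that a score above a threshold counts as positive; the function name, the `0.5` threshold, and the example scores are hypothetical.

```python
from typing import Sequence, Tuple


def predict_bag(instance_scores: Sequence[float],
                threshold: float = 0.5) -> Tuple[bool, int]:
    """Label a bag from its instance scores.

    The bag is positive if its highest-scoring instance exceeds the threshold.
    The returned index is a guess at the unknown key instance p.
    """
    best = max(range(len(instance_scores)), key=lambda i: instance_scores[i])
    return instance_scores[best] > threshold, best


print(predict_bag([0.1, 0.8, 0.3]))  # one instance is enough for a positive bag
print(predict_bag([0.1, 0.2]))       # no positive instance, so the bag is negative
```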

    Inaccurate Supervision

    For machine learning, crowdsourcing is commonly used as a cost-saving way to collect labels for training data.

  • Original article: https://www.cnblogs.com/blueprintf/p/8533759.html