  • Spark 2.0 Pipelines

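    MLlib gives its many machine learning algorithms a standard API, so they are easier to combine in a single pipeline, or workflow. The pipeline concept is largely inspired by the scikit-learn library.
    The ML API uses the DataFrame from Spark SQL as its machine learning dataset. Different columns of a DataFrame can store text, feature vectors, true labels, and predictions.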

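      • Transformer: A Transformer is an algorithm that transforms one DataFrame into another DataFrame, e.g., turning a DataFrame with features into a DataFrame with predictions.
      • Estimator: An Estimator is an algorithm that is fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator: it is fit on a training DataFrame and produces a model.
      • Pipeline: A Pipeline chains multiple Transformers and Estimators together to define a specific machine learning workflow.
      • Parameters: All Transformers and Estimators share a common API for setting their parameters.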
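    The "common parameter API" point can be pictured with a minimal, hypothetical sketch: every stage declares named parameters with defaults, and all stages set and read them through the same base-class methods. In Spark ML this role is played by `pyspark.ml.param.Params` (which offers methods such as `explainParams()`); the classes below (`Params`, `ToyTokenizer`, `ToyClassifier`, `set_params`, `get`) are illustrative assumptions, not Spark's implementation.

```python
# Toy model of a shared parameter API (NOT Spark's implementation):
# every stage declares named params with defaults, and all stages
# set and read them through the same base-class methods.


class Params:
    """Hypothetical base class shared by all toy stages."""

    # Subclasses map param name -> (default value, description).
    param_defs = {}

    def __init__(self, **kwargs):
        self._values = {}
        self.set_params(**kwargs)

    def set_params(self, **kwargs):
        for name, value in kwargs.items():
            if name not in self.param_defs:
                raise ValueError(f"unknown param: {name}")
            self._values[name] = value
        return self  # return self so calls can be chained

    def get(self, name):
        # An explicitly set value wins; otherwise fall back to the default.
        if name in self._values:
            return self._values[name]
        return self.param_defs[name][0]

    def explain_params(self):
        lines = []
        for name, (default, doc) in sorted(self.param_defs.items()):
            lines.append(f"{name}: {doc} (default: {default!r}, current: {self.get(name)!r})")
        return "\n".join(lines)


class ToyTokenizer(Params):
    param_defs = {
        "input_col": ("text", "column to read"),
        "output_col": ("words", "column to write"),
    }


class ToyClassifier(Params):
    param_defs = {
        "features_col": ("features", "feature column"),
        "label_col": ("label", "label column"),
        "max_iter": (100, "maximum iterations"),
    }


tok = ToyTokenizer(output_col="tokens")
clf = ToyClassifier().set_params(max_iter=10)
print(tok.get("input_col"), tok.get("output_col"))  # text tokens
print(clf.explain_params())
```

    Because both stages inherit the same `set_params`/`get` pair, code that builds a workflow can configure any stage the same way without knowing its concrete class.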

    MLlib standardizes APIs for machine learning algorithms to make it 
    easier to combine multiple algorithms into a single pipeline, or 
    workflow. This section covers the key concepts introduced by the 
    Pipelines API, where the pipeline concept is mostly inspired by the 
    scikit-learn project.

    DataFrame: This ML API uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions. 
    Machine learning can be applied to a wide variety of data types, such as vectors, text, images, and structured data. This API adopts the DataFrame from Spark SQL in order to support a variety of data types.

    DataFrame supports many basic and structured types; see the Spark SQL datatype reference for a list of supported types. In addition to the types listed in the Spark SQL guide, DataFrame can use ML Vector types.

    A DataFrame can be created either implicitly or explicitly from a 
    regular RDD. See the Spark SQL programming guide for examples.

    Columns in a DataFrame are named. Typical names include “text”, “features”, and “label”.
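    To make "named columns" concrete without a Spark cluster, here is a toy stand-in, assuming a table is simply a list of rows where each row maps a column name to a value. It is not Spark's DataFrame: in PySpark you would build one with `SparkSession.createDataFrame` and use the real `select` and `withColumn` methods. The helpers `columns`, `select`, and `with_column` below are hypothetical, and a plain list of floats stands in for an ML Vector.

```python
# Toy stand-in for a DataFrame (NOT Spark's): a list of rows, where each
# row maps a column name to a value.

rows = [
    {"id": 0, "text": "a b c d e spark", "label": 1.0},
    {"id": 1, "text": "b d", "label": 0.0},
]


def columns(df):
    """Column names, taken from the first row (all rows share one schema here)."""
    return list(df[0].keys()) if df else []


def select(df, *names):
    """Keep only the named columns (a projection)."""
    return [{n: row[n] for n in names} for row in df]


def with_column(df, name, fn):
    """Return a NEW table with one extra column computed per row; the input is unchanged."""
    return [{**row, name: fn(row)} for row in df]


# Add a "features" column: a list of floats standing in for an ML Vector.
df2 = with_column(rows, "features", lambda r: [float(len(r["text"].split()))])
print(columns(df2))  # ['id', 'text', 'label', 'features']
print(select(df2, "features", "label"))
```

    Returning a new table instead of mutating the old one mirrors the immutability of Spark DataFrames, which is what lets pipeline stages be chained safely.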

    Transformer: A Transformer is an algorithm which can transform one 
    DataFrame into another DataFrame. E.g., an ML model is a Transformer 
    which transforms a DataFrame with features into a DataFrame with 
    predictions.
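    A Transformer only needs a `transform` step that maps one table to another. The sketch below is a hypothetical, pure-Python imitation of what `pyspark.ml.feature.Tokenizer` does (lowercase the input string, then split on whitespace), using a list of row dictionaries as an assumed stand-in for a DataFrame; the class name `ToyTokenizer` and its constructor arguments are illustrative only.

```python
# Toy Transformer (NOT Spark's class): transform() maps one table to a new
# table with an extra column; it learns nothing from the data.


class ToyTokenizer:
    """Hypothetical transformer: lowercases `input_col` and splits on whitespace."""

    def __init__(self, input_col="text", output_col="words"):
        self.input_col = input_col
        self.output_col = output_col

    def transform(self, df):
        # Build new rows instead of mutating the input table.
        return [
            {**row, self.output_col: row[self.input_col].lower().split()}
            for row in df
        ]


df = [{"id": 0, "text": "Hello Spark"}, {"id": 1, "text": "MLlib Pipelines"}]
out = ToyTokenizer().transform(df)
print(out[0]["words"])  # ['hello', 'spark']
```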

    Estimator: An Estimator is an algorithm which can be fit on a 
    DataFrame to produce a Transformer. E.g., a learning algorithm is an 
    Estimator which trains on a DataFrame and produces a model.
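    The key point is the return type: `fit` consumes labeled data and hands back a Transformer (the model), which then adds predictions. A minimal sketch of that contract, assuming a single numeric feature per row and a deliberately trivial, hypothetical learning rule (threshold = midpoint of the two class means); `ThresholdClassifier` and `ThresholdModel` are illustrative names, not Spark classes such as `LogisticRegression`.

```python
# Toy Estimator/Model pair (NOT Spark's classes): fit() learns from labeled
# rows and returns a Transformer (the model) that adds a prediction column.

from statistics import mean


class ThresholdModel:
    """Transformer produced by fit(): predicts 1.0 when feature >= threshold."""

    def __init__(self, threshold, features_col="features", prediction_col="prediction"):
        self.threshold = threshold
        self.features_col = features_col
        self.prediction_col = prediction_col

    def transform(self, df):
        return [
            {**row, self.prediction_col: 1.0 if row[self.features_col] >= self.threshold else 0.0}
            for row in df
        ]


class ThresholdClassifier:
    """Hypothetical Estimator: threshold = midpoint of the two class means.

    Assumes both labels 0.0 and 1.0 occur in the training data.
    """

    def __init__(self, features_col="features", label_col="label"):
        self.features_col = features_col
        self.label_col = label_col

    def fit(self, df):
        pos = [r[self.features_col] for r in df if r[self.label_col] == 1.0]
        neg = [r[self.features_col] for r in df if r[self.label_col] == 0.0]
        return ThresholdModel((mean(pos) + mean(neg)) / 2, self.features_col)


train = [
    {"features": 1.0, "label": 0.0},
    {"features": 2.0, "label": 0.0},
    {"features": 6.0, "label": 1.0},
    {"features": 8.0, "label": 1.0},
]
model = ThresholdClassifier().fit(train)  # Estimator.fit -> Transformer
print(model.threshold)  # 4.25
preds = [r["prediction"] for r in model.transform([{"features": 3.0}, {"features": 5.0}])]
print(preds)  # [0.0, 1.0]
```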

    Pipeline: A Pipeline chains multiple Transformers and Estimators 
    together to specify an ML workflow.

    Parameter: All Transformers and Estimators now share a common API for specifying parameters.
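    How a Pipeline chains the two kinds of stages can be sketched in a few dozen lines. This is a hypothetical, pure-Python imitation of the fit/transform flow of `pyspark.ml.Pipeline` (whose `fit` returns a `PipelineModel`): stages run in order, each Estimator is fit on the data produced by the stages before it and replaced by its model, and the result contains only Transformers. The stage classes (`Tokenizer`, `KeywordCount`, `MinPositiveCutoff`, `CutoffModel`), the row-dictionary tables, and the duck-typing rule "has `fit` means Estimator" are all assumptions made for illustration.

```python
# Toy pipeline (NOT Spark's implementation): stages run in order, and each
# Estimator stage is fit on the current table and replaced by its model.


class Tokenizer:
    """Transformer: splits the 'text' column into lowercase 'words'."""

    def transform(self, df):
        return [{**r, "words": r["text"].lower().split()} for r in df]


class KeywordCount:
    """Hypothetical Transformer: 'features' = number of words found in a keyword set."""

    def __init__(self, keywords):
        self.keywords = set(keywords)

    def transform(self, df):
        return [
            {**r, "features": float(sum(w in self.keywords for w in r["words"]))}
            for r in df
        ]


class CutoffModel:
    """Transformer produced by MinPositiveCutoff.fit()."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def transform(self, df):
        return [{**r, "prediction": 1.0 if r["features"] >= self.cutoff else 0.0} for r in df]


class MinPositiveCutoff:
    """Hypothetical Estimator: cutoff = smallest 'features' value among label-1.0 rows."""

    def fit(self, df):
        return CutoffModel(min(r["features"] for r in df if r["label"] == 1.0))


class PipelineModel:
    """A fitted pipeline: a list of Transformers applied in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, df):
        for stage in self.stages:
            df = stage.transform(df)
        return df


class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, df):
        # Duck-typing assumption: a stage with a fit() method is an Estimator.
        is_est = [hasattr(s, "fit") for s in self.stages]
        last_est = max((i for i, e in enumerate(is_est) if e), default=-1)
        fitted = []
        for i, stage in enumerate(self.stages):
            if is_est[i]:
                stage = stage.fit(df)  # Estimator -> Transformer (model)
            fitted.append(stage)
            if i < last_est:
                # Only stages before the last Estimator must transform the
                # data, because nothing after that point needs fitting.
                df = stage.transform(df)
        return PipelineModel(fitted)


train = [
    {"text": "Spark MLlib pipeline", "label": 1.0},
    {"text": "spark is fast", "label": 1.0},
    {"text": "hello world", "label": 0.0},
]
pipeline = Pipeline([Tokenizer(), KeywordCount({"spark", "mllib"}), MinPositiveCutoff()])
model = pipeline.fit(train)
preds = [r["prediction"] for r in model.transform([{"text": "Spark SQL"}, {"text": "Qt label"}])]
print(preds)  # [1.0, 0.0]
```

    The same `model.transform` call then serves both evaluation and later scoring, which is the practical benefit of packaging the whole workflow as one fitted object.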

  • Original article: https://www.cnblogs.com/itboys/p/8316033.html