  • spark-sklearn (a Spark extension for scikit-learn)

    (1) Official installation requirements. This package has the following requirements:

    - A recent version of scikit-learn. Version 0.17 has been tested; older versions may work too.
    - Spark >= 2.0. Spark may be downloaded from the [Spark official website](http://spark.apache.org/). In order to use spark-sklearn, you need to use the pyspark interpreter or another Spark-compliant Python interpreter. See the [Spark guide](https://spark.apache.org/docs/latest/programming-guide.html#overview) for more details.
    - [nose](https://nose.readthedocs.org) (testing dependency only)
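Before going further, it can help to confirm which of these packages the current Python interpreter can actually see. The following is a minimal sketch using only the standard library. The helper name `check_requirements` and the mapping of import names to PyPI distribution names are assumptions made for illustration; they are not part of spark-sklearn.

```python
import importlib.util
from importlib import metadata

# Import name -> PyPI distribution name (assumed mapping, for illustration only).
REQUIRED = {
    "sklearn": "scikit-learn",
    "pyspark": "pyspark",
    "spark_sklearn": "spark-sklearn",
}


def check_requirements(required=REQUIRED):
    """Return {import_name: version string, "unknown", or None if not importable}."""
    report = {}
    for module_name, dist_name in required.items():
        # find_spec returns None when the top-level module cannot be located.
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but not installed as a distribution
            # (for example, pyspark put on PYTHONPATH from a Spark download).
            report[module_name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_requirements().items():
        print(f"{name}: {version or 'MISSING'}")
```

Run it with the same interpreter that will launch pyspark. A `MISSING` entry means that interpreter cannot import the package.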

    (2) First, install pyspark:

    See this blog post: http://www.cnblogs.com/jackchen-Net/p/6667205.html#_label5

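    (3) Visit the package page: https://pypi.python.org/pypi/spark-sklearn

    The spark-sklearn package integrates scikit-learn with Spark, which greatly simplifies the work of Python data scientists: it can automatically distribute the computation for model parameter tuning across a Spark cluster.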
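To see why parameter tuning distributes well, note that every combination of parameter values (and every cross-validation fold) is an independent model fit. The sketch below expands a small grid into such a task list in plain Python. It only illustrates the idea and is not spark-sklearn's actual implementation; the helper names `expand_grid` and `build_tasks` and the choice of 3 folds are assumptions.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of parameter values."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def build_tasks(param_grid, n_folds):
    """Pair every parameter combination with every fold index."""
    return [
        (params, fold)
        for params in expand_grid(param_grid)
        for fold in range(n_folds)
    ]


# A small SVM grid; n_folds=3 is an assumed value for illustration.
parameters = {"kernel": ("linear", "rbf"), "C": [1, 10]}
tasks = build_tasks(parameters, n_folds=3)
print(len(tasks))  # 4 combinations x 3 folds = 12 independent fits
```

Because no task depends on another, a list like this is the kind of workload a cluster can spread across its workers, with each worker fitting and scoring one model.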

    (4) Testing the example from the official documentation

    ## Example

    Here is a simple example that runs a grid search with Spark. See the [Installation](#Installation) section on how to install spark-sklearn.

    ```python
    from sklearn import svm, datasets
    from spark_sklearn import GridSearchCV

    # `sc` is the SparkContext that the pyspark shell creates at startup.
    iris = datasets.load_iris()
    parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
    svr = svm.SVC()
    clf = GridSearchCV(sc, svr, parameters)
    clf.fit(iris.data, iris.target)
    ```

    This classifier can be used as a drop-in replacement for any scikit-learn classifier, with the same API.

    END~

  • Original post: https://www.cnblogs.com/jackchen-Net/p/7297555.html