  • PySpark random forest feature importance

    # IMPORT
    >>> from pyspark.sql import SparkSession
    >>> from pyspark.ml.linalg import Vectors
    >>> from pyspark.ml.feature import StringIndexer
    >>> from pyspark.ml.classification import RandomForestClassifier
    >>> spark = SparkSession.builder.getOrCreate()  # the pyspark shell already defines `spark`
    
    # PREPARE DATA
    >>> df = spark.createDataFrame([
    ...     (1.0, Vectors.dense(1.0)),
    ...     (0.0, Vectors.sparse(1, [], []))], ["label", "features"])
    >>> stringIndexer = StringIndexer(inputCol="label", outputCol="indexed")
    >>> si_model = stringIndexer.fit(df)
    >>> td = si_model.transform(df)
    
    # BUILD THE MODEL
    >>> rf = RandomForestClassifier(numTrees=3, maxDepth=2, labelCol="indexed", seed=42)
    >>> model = rf.fit(td)
    
    # FEATURE IMPORTANCES
    >>> model.featureImportances
    SparseVector(1, {0: 1.0}) 
    

      

    Feature importances:

    model.featureImportances
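    `featureImportances` is indexed by position in the features vector, so it only becomes readable once each position is paired with a feature name. For random forests Spark averages each feature's importance across the trees and normalizes the vector to sum to 1, which is why the single-feature example above reports 1.0. The sketch below assumes the names come from whatever built the vector (for example the `inputCols` of a `VectorAssembler`); `rank_feature_importances` is a hypothetical helper, not part of PySpark.

```python
from typing import List, Sequence, Tuple


def rank_feature_importances(importances, feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Pair each feature name with its importance, most important first.

    `importances` may be a pyspark.ml.linalg Vector (model.featureImportances
    is often a SparseVector) or any plain sequence of floats.
    """
    # Vectors expose toArray(); plain lists/tuples are used as-is.
    values = importances.toArray() if hasattr(importances, "toArray") else importances
    if len(values) != len(feature_names):
        raise ValueError("importances and feature_names must have the same length")
    pairs = [(name, float(v)) for name, v in zip(feature_names, values)]
    # Sort by importance, largest first.
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

    Usage (with a hypothetical `assembler` that built the `features` column): `rank_feature_importances(model.featureImportances, assembler.getInputCols())`.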

    A simple PySpark model example:

     https://blog.csdn.net/Katherine_hsr/article/details/80988994

    Probability:

    predictions.select("probability", "label").show(1000)

    The `probability` column is the model's output probability: a vector with one entry per class.

    Shuffling samples with pandas:

    import pandas as pd
    df = pd.read_excel("window regulator01 _0914新增样本.xlsx")
    df = df.sample(frac=1)  # shuffle the rows (sample 100% of them without replacement)
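    `df.sample(frac=1)` returns every row exactly once in a new random order, but the order changes on each run and the original index labels travel with the rows. A minimal sketch of a reproducible variant; `shuffle_rows` and the default seed are illustrative choices, not from the original post.

```python
import pandas as pd


def shuffle_rows(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Return a shuffled copy of df.

    frac=1 draws all rows without replacement (a permutation),
    random_state fixes the order across runs, and reset_index(drop=True)
    replaces the scrambled index with a fresh 0..n-1 range.
    """
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)
```

    Usage: `df = shuffle_rows(pd.read_excel("window regulator01 _0914新增样本.xlsx"), seed=42)`.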

    Random train/test split in PySpark:

     train, test = labeled_v.randomSplit([0.75, 0.25])


  • Original article: https://www.cnblogs.com/Allen-rg/p/10445893.html