  • PySpark random forest feature importance

    # IMPORT
    >>> import numpy
    >>> from numpy import allclose
    >>> from pyspark.ml.linalg import Vectors
    >>> from pyspark.ml.feature import StringIndexer
    >>> from pyspark.ml.classification import RandomForestClassifier
    
    # PREPARE DATA
    >>> df = spark.createDataFrame([
    ...     (1.0, Vectors.dense(1.0)),
    ...     (0.0, Vectors.sparse(1, [], []))], ["label", "features"])
    >>> stringIndexer = StringIndexer(inputCol="label", outputCol="indexed")
    >>> si_model = stringIndexer.fit(df)
    >>> td = si_model.transform(df)
    
    # BUILD THE MODEL
    >>> rf = RandomForestClassifier(numTrees=3, maxDepth=2, labelCol="indexed", seed=42)
    >>> model = rf.fit(td)
    
    # FEATURE IMPORTANCES
    >>> model.featureImportances
    SparseVector(1, {0: 1.0}) 
    

      

    Feature importances:

    model.featureImportances
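The importance vector is indexed by feature position, so it is usually easier to read once each score is paired with a column name. The following is a minimal sketch of that pairing, not part of the original post: `rank_importances` is a hypothetical helper, and the feature names and scores are made-up stand-ins.

```python
def rank_importances(importances, feature_names):
    """Pair each importance score with its feature name, highest score first."""
    scores = [float(s) for s in importances]
    if len(scores) != len(feature_names):
        raise ValueError("one name is needed per feature")
    return sorted(zip(feature_names, scores), key=lambda pair: pair[1], reverse=True)


# Stand-in scores and hypothetical feature names, for illustration only.
ranked = rank_importances([0.1, 0.7, 0.2], ["age", "income", "tenure"])
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With a fitted model, the first argument would be `model.featureImportances.toArray()`, and the names would be the input columns in the order they were assembled into the features vector.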

    A simple PySpark model example:

     https://blog.csdn.net/Katherine_hsr/article/details/80988994

    Probabilities:

    predictions.select("probability", "label").show(1000)

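    The probability column holds the predicted probability of each class (predictions is the DataFrame returned by model.transform).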

    Shuffling samples with pandas:

    import pandas as pd
    df = pd.read_excel("window regulator01 _0914新增样本.xlsx")
    df = df.sample(frac=1)  # shuffle the rows
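Two things the one-liner above leaves open are reproducibility and the index, which keeps its original order labels after sampling. A minimal sketch that addresses both, assuming a small made-up DataFrame in place of the Excel file:

```python
import pandas as pd

# Made-up data standing in for the Excel file used above.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "label": [0, 1, 0, 1, 0]})

# frac=1 samples every row without replacement, i.e. a full shuffle.
# random_state fixes the permutation; reset_index drops the old row labels.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```

The seed value 42 is arbitrary. Omit `random_state` if a different order is wanted on every run.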

    Random train/test split in PySpark:

     train, test = labeled_v.randomSplit([0.75, 0.25])


  • Original article: https://www.cnblogs.com/Allen-rg/p/10445893.html