  • PySpark random forest feature importance

    # IMPORT
    >>> import numpy
    >>> from numpy import allclose
    >>> from pyspark.ml.linalg import Vectors
    >>> from pyspark.ml.feature import StringIndexer
    >>> from pyspark.ml.classification import RandomForestClassifier
    
    # PREPARE DATA
    >>> df = spark.createDataFrame([
    ...     (1.0, Vectors.dense(1.0)),
    ...     (0.0, Vectors.sparse(1, [], []))], ["label", "features"])
    >>> stringIndexer = StringIndexer(inputCol="label", outputCol="indexed")
    >>> si_model = stringIndexer.fit(df)
    >>> td = si_model.transform(df)
    
    # BUILD THE MODEL
    >>> rf = RandomForestClassifier(numTrees=3, maxDepth=2, labelCol="indexed", seed=42)
    >>> model = rf.fit(td)
    
    # FEATURE IMPORTANCES
    >>> model.featureImportances
    SparseVector(1, {0: 1.0}) 
    

      

    Feature importance:

    model.featureImportances
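
The importance vector holds one score per feature, in the same order as the entries of the features column. A minimal sketch of pairing the scores with names and sorting them is shown here; `rank_importances`, the scores, and the feature names are hypothetical and only for illustration.

```python
def rank_importances(importances, feature_names):
    """Pair each importance score with its feature name, highest score first."""
    scores = [float(x) for x in importances]
    if len(scores) != len(feature_names):
        raise ValueError("expected one name per importance score")
    return sorted(zip(feature_names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores and names, for illustration only.
ranked = rank_importances([0.1, 0.7, 0.2], ["age", "income", "tenure"])
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

With a fitted Spark model, `model.featureImportances.toArray()` could be passed as `importances`, assuming the names are listed in the order used to build the features vector.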

    A simple PySpark model example:

     https://blog.csdn.net/Katherine_hsr/article/details/80988994

    Probability:

    predictions.select("probability", "label").show(1000)

    The probability column holds the predicted probability for each class.

    Shuffling samples with pandas:

    import pandas as pd
    df = pd.read_excel("window regulator01 _0914新增样本.xlsx")
    df = df.sample(frac=1)  # shuffle the rows
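
A shuffle like this gives a different row order on every run and keeps the old index labels. A minimal sketch of a reproducible variant follows, assuming a small in-memory frame in place of the Excel file; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical in-memory data standing in for the Excel sheet.
df = pd.DataFrame({"feature": [10, 20, 30, 40, 50], "label": [0, 1, 0, 1, 0]})

# frac=1 samples every row once without replacement, i.e. a permutation.
# random_state fixes the permutation; reset_index replaces the old row labels
# with a fresh 0..n-1 index.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```

Dropping `random_state` restores the non-deterministic behavior of the original one-liner.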

    Random train/test split in PySpark:

     train, test = labeled_v.randomSplit([0.75, 0.25])
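
The weights describe expected proportions, so the resulting sizes are usually close to 75/25 rather than exact. The plain-Python sketch below illustrates the idea of assigning each row independently according to the weights; it is not Spark's implementation, and `random_split` and its `seed` parameter are names chosen here for illustration.

```python
import random


def random_split(rows, weights, seed=None):
    """Assign each row to one split with probability proportional to its weight."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative upper bounds in [0, 1], e.g. [0.75, 1.0] for weights [0.75, 0.25].
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point drift
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform draw in [0, 1)
        for i, upper in enumerate(bounds):
            if u < upper:
                splits[i].append(row)
                break
    return splits


train, test = random_split(range(1000), [0.75, 0.25], seed=42)
print(len(train), len(test))
```

In PySpark itself, `randomSplit` also accepts a `seed` argument, for example `labeled_v.randomSplit([0.75, 0.25], seed=42)`, which helps keep the split stable between runs.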


  • Original article: https://www.cnblogs.com/Allen-rg/p/10445893.html