Today I updated the Spark environment on my machine, because the last time I ran the new pipeline, some of the packages were not supported in 1.6.1.
Only the user environment variables in the system need to be changed.
Then create a new PyDev project in Eclipse with Python 3 as the interpreter, replace the three old libraries the project referenced, and finally update the environment variables inside Eclipse as well.
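As a quick sanity check after the switch (a minimal sketch; the path in the comment is just my local layout and the app name is made up), SPARK_HOME and the Spark version can be verified from Python:

import os
from pyspark.sql import SparkSession

# SPARK_HOME should now point at the 2.0.0 distribution
# (e.g. D:/software/spark-2.0.0-bin-hadoop2.7 on my machine).
print(os.environ.get("SPARK_HOME"))

# The version reported by a running session should be 2.0.0.
spark = SparkSession.builder.appName("VersionCheck").getOrCreate()
print(spark.sparkContext.version)
spark.stop()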


Then I started reading the new documentation.
This time the topic is clustering.
1. K-means
MLlib implements a parallelized variant of the k-means++ method, called k-means||.
The algorithm is an Estimator.
Input column: featuresCol
Output column: predictionCol (a short sketch of these parameters follows below)
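As a minimal sketch (the column names below are simply the defaults, written out with explicit setters for illustration), the Estimator is configured like this, and fit() produces a model that fills predictionCol:

from pyspark.ml.clustering import KMeans

# KMeans is an Estimator: fit() reads the featuresCol vectors and the
# resulting model writes a cluster index into predictionCol.
kmeans = KMeans() \
    .setK(2) \
    .setFeaturesCol("features") \
    .setPredictionCol("prediction") \
    .setSeed(1)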
When running the example code I hit an error:
Relative path in absolute URI
i.e. a relative path showed up inside an absolute URI.
Following a reference I found, the fix is to pass one extra setting, spark.sql.warehouse.dir, when building the SparkSession.
The full exception was:
pyspark.sql.utils.IllegalArgumentException: 'java.net.URISyntaxException: Relative path in absolute URI: file:D:/software/spark-2.0.0-bin-hadoop2.7/examples/src/main/python/ml/spark-warehouse'
By default Spark resolves spark.sql.warehouse.dir against the current path, so setting it explicitly turns it into a proper absolute URI.
I also had to copy the whole data folder into the current ml folder, so that the relative paths in the example program did not need to be changed; using ../ could not get from the current execution path to the actual data directory (an alternative using an absolute path is sketched below).
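Alternatively (a hedged sketch, not what the shipped example does; the absolute path is just my install location), the load call itself could point straight at the distribution's data directory instead of copying data/ next to the script:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("PythonKMeansExample") \
    .config('spark.sql.warehouse.dir', 'file:///D:/software/spark-2.0.0-bin-hadoop2.7') \
    .getOrCreate()

# Absolute file:/// URI, so no relative-path resolution is involved.
dataset = spark.read.format("libsvm").load(
    "file:///D:/software/spark-2.0.0-bin-hadoop2.7/data/mllib/sample_kmeans_data.txt")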
from __future__ import print_function
# $example on$
from pyspark.ml.clustering import KMeans
# $example off$
from pyspark.sql import SparkSession
"""
An example demonstrating k-means clustering.
Run with:
bin/spark-submit examples/src/main/python/ml/kmeans_example.py
This example requires NumPy (http://www.numpy.org/).
"""
if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("PythonKMeansExample") \
        .config('spark.sql.warehouse.dir', 'file:///D:/software/spark-2.0.0-bin-hadoop2.7') \
        .getOrCreate()

    # $example on$
    # Loads data.
    # The data folder has to be copied into the current execution path, i.e. the ml folder.
    dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")

    # Trains a k-means model.
    kmeans = KMeans().setK(2).setSeed(1)
    model = kmeans.fit(dataset)

    # Evaluate clustering by computing Within Set Sum of Squared Errors.
    wssse = model.computeCost(dataset)
    print("Within Set Sum of Squared Errors = " + str(wssse))

    # Shows the result.
    centers = model.clusterCenters()
    print("Cluster Centers: ")
    for center in centers:
        print(center)
    # $example off$

    spark.stop()
'''
sample_kmeans_data.txt
0 1:0.0 2:0.0 3:0.0
1 1:0.1 2:0.1 3:0.1
2 1:0.2 2:0.2 3:0.2
3 1:9.0 2:9.0 3:9.0
4 1:9.1 2:9.1 3:9.1
5 1:9.2 2:9.2 3:9.2
'''
'''
Within Set Sum of Squared Errors = 0.11999999999994547
Cluster Centers:
[ 0.1 0.1 0.1]
[ 9.1 9.1 9.1]
'''
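The two centers line up with the two obvious groups in the data (points around 0.1 and points around 9.1). Continuing from the variables in the example above, the per-row assignments can also be inspected (a small sketch; the fitted model is a Transformer, so transform() appends the prediction column):

# Apply the fitted model back to the training data to see which
# cluster each row was assigned to.
predictions = model.transform(dataset)
predictions.select("features", "prediction").show()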