  • Data analysis before k-means clustering

    Raw data

    Say you are given a data set where each observed example has a set of features but no labels. Labels are an essential ingredient of a supervised algorithm such as a Support Vector Machine, which learns a hypothesis function to predict labels from features. Without labels, we can't run supervised learning. What can we do?

    One of the most straightforward tasks we can perform on a data set without labels is to find groups of examples that are similar to one another -- what we call clusters.


    #!/usr/bin/python

    import matplotlib.pyplot as plt


    def readfile(filename):
        """Read whitespace-separated floats, one sample per line."""
        datamat = []
        with open(filename, 'r') as f:
            for line in f:
                fields = line.split()
                if fields:  # skip blank lines
                    datamat.append([float(x) for x in fields])
        return datamat


    if __name__ == "__main__":
        # Raw string so the backslash in the Windows path is kept literally.
        datamat = readfile(r"C:\kmeans.txt")
        # The first two columns are the x and y coordinates.
        x_data = [v[0] for v in datamat]
        y_data = [v[1] for v in datamat]
        plt.plot(x_data, y_data, 'r*', label='Original data')
        plt.legend()
        plt.show()

    K-means clustering requires the value of K to be given in advance, so it helps to plot the data first and roughly estimate K by eye.
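
    Once a value of K has been read off the plot, it can be passed to a clustering routine. The following is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in pure Python, not the original author's code. The function name `kmeans` and the parameters `iterations` and `seed` are assumptions made for illustration, and it needs Python 3.8 or later for `math.dist`.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative Lloyd's algorithm on a list of numeric points."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the labels are final
        centroids = new_centroids
    return centroids, labels
```

    With the list returned by `readfile`, a call such as `kmeans(datamat, 3)` would return the centroids and one cluster index per sample, where 3 stands for whatever K the scatter plot suggests. The result depends on the starting centroids, so trying several `seed` values is a reasonable precaution.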
  • Original article: https://www.cnblogs.com/donggongdechen/p/10435266.html