  • K-means clustering

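    1) Randomly select K of the N documents as the initial centroids.
    2) For each remaining document, measure its distance to every centroid and assign it to the cluster of the nearest centroid.
    3) Recompute the centroid of each resulting cluster.
    4) Repeat steps 2 and 3 until the new centroids equal the previous ones, or move by less than a specified threshold; the algorithm then ends.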
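
    Steps 1 and 4 as worded above can be sketched as follows. This is only an illustration under stated assumptions: the helper names (initial_centroids, centroid_shift, converged), the Euclidean measure of centroid movement, and the default tolerance are all hypothetical and do not come from the original post.

```python
import math
import random


def initial_centroids(rows, k, seed=None):
    """Step 1: pick k distinct rows (documents) as the starting centroids."""
    rng = random.Random(seed)
    return [list(row) for row in rng.sample(rows, k)]


def centroid_shift(old, new):
    """Largest Euclidean distance any centroid moved between two iterations."""
    return max(
        math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        for a, b in zip(old, new)
    )


def converged(old, new, tol=1e-6):
    """Step 4: stop once no centroid has moved by more than tol (assumed value)."""
    return centroid_shift(old, new) <= tol


rows = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]]
start = initial_centroids(rows, k=2, seed=0)

# Shift every centroid by 0.5 along the first axis to simulate one update.
moved = [[c[0] + 0.5, c[1]] for c in start]

print(converged(start, start))  # nothing moved
print(converged(start, moved))  # every centroid moved by 0.5
```

    Comparing centroid positions against a tolerance avoids running extra iterations when the centroids are still drifting by negligible amounts.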

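    Python implementation of k-means clustering. Note that this version places the initial centroids at random positions within the range of the data instead of picking K documents, and it stops when the cluster assignments no longer change (or after 100 iterations):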

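    import random
    from math import sqrt

    # NOTE: the original post uses `pearson` without defining it. The helper
    # below is an assumed stand-in that returns 1 - Pearson correlation, so
    # that smaller values mean "closer".
    def pearson(v1, v2):
        n = len(v1)
        mean1, mean2 = sum(v1) / n, sum(v2) / n
        num = sum((x - mean1) * (y - mean2) for x, y in zip(v1, v2))
        den = sqrt(sum((x - mean1) ** 2 for x in v1) *
                   sum((y - mean2) ** 2 for y in v2))
        if den == 0:
            # Correlation is undefined for a constant vector (assumed: distance 1.0)
            return 1.0
        return 1.0 - num / den

    def kcluster(rows, distance=pearson, k=4):
        ncols = len(rows[0])

        # Determine the minimum and maximum values for each column
        ranges = [(min(row[i] for row in rows), max(row[i] for row in rows))
                  for i in range(ncols)]

        # Create k randomly placed centroids within those ranges
        clusters = [[random.random() * (ranges[i][1] - ranges[i][0]) + ranges[i][0]
                     for i in range(ncols)] for _ in range(k)]

        lastmatches = None
        for t in range(100):
            print('Iteration %d' % t)
            bestmatches = [[] for _ in range(k)]

            # Find which centroid is the closest for each row
            # (each distance is computed once; ties go to the lowest index)
            for j, row in enumerate(rows):
                distances = [distance(clusters[i], row) for i in range(k)]
                bestmatches[distances.index(min(distances))].append(j)

            # If the results are the same as last time, this is complete
            if bestmatches == lastmatches:
                break
            lastmatches = bestmatches

            # Move the centroids to the average of their members;
            # a centroid with no members stays where it is
            for i in range(k):
                if bestmatches[i]:
                    avgs = [0.0] * ncols
                    for rowid in bestmatches[i]:
                        for m in range(ncols):
                            avgs[m] += rows[rowid][m]
                    clusters[i] = [a / len(bestmatches[i]) for a in avgs]

        # One list of row indices per cluster
        return bestmatches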
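
    kcluster returns one list of row indices per cluster, so the result usually has to be mapped back to document names before it is readable. A minimal sketch of that step is below; the helper describe_clusters, the sample bestmatches value, and the labels are all made up for illustration and are not output from the listing above.

```python
def describe_clusters(bestmatches, labels):
    """Map each cluster's row indices back to human-readable labels."""
    return [[labels[idx] for idx in members] for members in bestmatches]


# Hypothetical result for six rows split into k=3 clusters.
bestmatches = [[0, 3], [1, 2, 5], [4]]
labels = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e", "doc_f"]

for cluster_id, names in enumerate(describe_clusters(bestmatches, labels)):
    print(cluster_id, names)
```

    An empty inner list is possible: it means no row ended up closest to that centroid.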
  • Original article: https://www.cnblogs.com/huanhuanang/p/5253055.html