zoukankan html css js c++ java

k均值聚类

1）从N个文档随机选取K个文档作为质心
2）对剩余的每个文档测量其到每个质心的距离，并把它归到最近的质心的类
3）重新计算已经得到的各个类的质心
4）迭代2～3步直至新的质心与原质心相等或小于指定阈值，算法结束

k均值聚类python代码实现：

def kcluster(rows,distance=pearson,k=4):
  # Determine the minimum and maximum values for each point
  ranges=[(min([row[i] for row in rows]),max([row[i] for row in rows])) 
  for i in range(len(rows[0]))]
  print "ranges",ranges[0]
  print "ranges",ranges[1]
  # Create k randomly placed centroids
  clusters=[[random.random()*(ranges[i][1]-ranges[i][0])+ranges[i][0] 
  for i in range(len(rows[0]))] for j in range(k)]
  
  lastmatches=None
  for t in range(100):
    print 'Iteration %d' % t
    bestmatches=[[] for i in range(k)]
    
    # Find which centroid is the closest for each row
    for j in range(len(rows)):
      row=rows[j]
      bestmatch=0
      for i in range(k):
        d=distance(clusters[i],row)
        if d<distance(clusters[bestmatch],row): bestmatch=i
      bestmatches[bestmatch].append(j)

    # If the results are the same as last time, this is complete
    if bestmatches==lastmatches: break
    lastmatches=bestmatches
    
    # Move the centroids to the average of their members
    for i in range(k):
      avgs=[0.0]*len(rows[0])
      if len(bestmatches[i])>0:
        for rowid in bestmatches[i]:
          for m in range(len(rows[rowid])):
            avgs[m]+=rows[rowid][m]
        for j in range(len(avgs)):
          avgs[j]/=len(bestmatches[i])
        clusters[i]=avgs
      
  return bestmatches

查看全文

相关阅读:
markdown with vim
递归
 类 sizeof
cppcheck工具
 c++ explicit的含义和用法
 pca主成分分析
 string的使用
 linux的shell进化简史
 adb shell 无法启动（insufficient permissions for device）
c++ 四种转换的意思

原文地址：https://www.cnblogs.com/huanhuanang/p/5253055.html