http://fuliang.iteye.com/blog/891991
K-means是一个简单容易实现的聚类算法,我们以对一个图片的颜色的RGB值进行聚类为例,
实现这个算法。
K-means算法是一个EM的迭代过程:
1.随机选择k个作为聚类中心
2.E step:
对每一个点,计算它到每一个聚类中心的距离,把这个点分配到最近的聚类中心代表的
聚类中。
3.M step:
重新计算每个聚类的中心:每个聚类中心为该聚类所有点的均值。
重复2~3直到达到最大的迭代次数或者聚类不再发生变化。
- #!/usr/bin/ruby
- # autor: fuliang http://fuliang.iteye.com/
- class RGB
- attr_accessor :r,:g,:b
- def initialize(r=0,g=0,b=0)
- @r,@g,@b = r,g,b
- end
- def +(rgb)
- @r += rgb.r
- @g += rgb.g
- @b += rgb.b
- self
- end
- def /(n)
- @r /= n
- @g /= n
- @b /= n
- self
- end
- end
- def random_k_centers(instances,k)
- rand_indxes = (0...instances.size).sort_by{rand}[0...k]
- instances.values_at(*rand_indxes)
- end
- def distance(ins1,ins2)#采用余弦值,因为255,255,255和200,200,200颜色基本类似
- dot_product = ins1.r * ins2.r + ins1.g * ins2.g + ins1.b * ins2.b
- mod1 = Math.sqrt(ins1.r * ins1.r + ins1.g * ins1.g + ins1.b * ins1.b)
- mod2 = Math.sqrt(ins2.r * ins2.r + ins2.g * ins2.g + ins2.b * ins2.b)
- return 1 - dot_product / (mod1 * mod2)
- end
- def k_means_cluster(instances,k,max_iter=100)
- random_centers = random_k_centers(instances,k)
- p random_centers
- instance_cluster_map = {}
- change_count = 0
- clusters = []
- 0.upto(max_iter) do |iter_num|
- clusters = []
- puts "iterate #{iter_num} ..."
- change_count = 0
- # E-step
- instances.each do |instance|
- min_distance = 1.0/0
- min_indx = 0
- random_centers.each_with_index do |center,index|
- current_distance = distance(center,instance)
- if min_distance > current_distance then
- min_indx = index
- min_distance = current_distance
- end
- end
- if instance_cluster_map[instance] != min_indx then#trace the change
- change_count += 1
- instance_cluster_map[instance] = min_indx
- end
- clusters[min_indx] ||= []
- clusters[min_indx] << instance
- end
- puts "change_count=#{change_count}"
- break if change_count.zero?
- #M-step
- clusters.each_with_index do |cluster,index|
- center = RGB.new
- cluster.each do |instance|
- center += instance
- end
- center /= cluster.size
- random_centers[index] = center # update center
- end
- end
- return clusters
- end
- instances = []
- File.open("rgbs.txt").each_line do |line|
- line.split(/\t/).each do | rgb |
- r,g,b = rgb.split(/,/).collect{|e| e.to_i}
- instances << RGB.new(r,g,b)
- end
- end
- clusters = k_means_cluster(instances,5,100)
- k_candidates = []
- clusters.each do |cluster|
- sum = cluster.inject(RGB.new) {|sum,ins| sum + ins}
- candidate = sum / cluster.size
- k_candidates << candidate
- end
- p k_candidates
可以使用聚类算法对这个图片进行有损压缩。