DBSCAN(D, eps, MinPts)
   C = 0
   for each unvisited point P in dataset D   // every point that has not been visited yet
      mark P as visited
      NeighborPts = regionQuery(P, eps)   // find all neighbors within P's eps-region
      if sizeof(NeighborPts) < MinPts
         mark P as NOISE
      else
         C = next cluster   // start a new cluster
         expandCluster(P, NeighborPts, C, eps, MinPts)   // grow this new cluster

expandCluster(P, NeighborPts, C, eps, MinPts)
   add P to cluster C
   for each point P' in NeighborPts
      if P' is not visited
         mark P' as visited
         NeighborPts' = regionQuery(P', eps)   // fetch all of P''s neighbors
         if sizeof(NeighborPts') >= MinPts
            NeighborPts = NeighborPts joined with NeighborPts'   // iterative update: keep pulling new neighbors into the set
      if P' is not yet member of any cluster
         add P' to cluster C

regionQuery(P, eps)
   return all points within P's eps-neighborhood (including P)
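
The pseudocode above can be sketched in Python roughly as follows. This is only a minimal sketch under my own assumptions, not a reference implementation: points are tuples of coordinates, distance is Euclidean, and regionQuery is a brute-force scan over all points. The names dbscan, region_query and the NOISE label value are hypothetical choices for illustration and do not come from the pseudocode.

```python
import math

NOISE = -1  # assumed label for noise points; the pseudocode only says "mark P as NOISE"


def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)    # None means "not yet member of any cluster"
    visited = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE        # may still be claimed later as a border point
            continue
        cluster += 1                 # C = next cluster
        labels[i] = cluster
        # expandCluster: the neighbor list grows while we walk over it
        queued = set(neighbors)      # avoids appending the same index twice
        k = 0
        while k < len(neighbors):
            j = neighbors[k]
            if not visited[j]:
                visited[j] = True
                more = region_query(points, j, eps)
                if len(more) >= min_pts:   # j is a core point, so pull in its neighbors
                    for m in more:
                        if m not in queued:
                            queued.add(m)
                            neighbors.append(m)
            if labels[j] is None or labels[j] == NOISE:
                labels[j] = cluster
            k += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The brute-force region_query makes the whole thing quadratic in the number of points, which is fine for getting a feel for the algorithm. A spatial index would be the usual replacement on larger data.
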
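Reading this DBSCAN code today, I was struck by how closely it matches the approach I took years ago for topic mining over a tag network, with only a few minor differences. There is nothing hard about it; it is really quite simple.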
When I get back, I should digest the code in this article properly, write it out by hand myself, and pick up some C++ along the way. http://www.cnblogs.com/weixliu/archive/2012/12/08/2808815.html