Pros and cons of hierarchical clustering on large datasets


Well, hierarchical clustering doesn't make that much sense for large datasets. In my opinion, it's mostly a textbook example. The problem with hierarchical clustering is that it doesn't really build sensible clusters. It builds a dendrogram, but with 14000 objects the dendrogram becomes pretty much unusable. And very few implementations of hierarchical clustering have non-trivial methods to extract sensible clusters from the dendrogram. Plus, in the general case, hierarchical clustering has complexity O(n^3), which makes it scale really badly to large datasets.
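To make the O(n^3) cost concrete, here is a minimal sketch of naive agglomerative clustering in plain Python. It is an illustration under assumptions, not how any particular library does it: the names `naive_single_linkage` and `cut` are made up for this example, and single linkage is used only because its update rule is the shortest to write down. Each of the n-1 merges rescans all pairs of remaining clusters, which is where the cubic running time comes from, and the full distance table already costs O(n^2) memory.

```python
import math

def naive_single_linkage(points):
    """Return the merge list (the dendrogram) as (height, left, right) tuples."""
    n = len(points)
    # Full pairwise distance table: O(n^2) time and memory.
    dist = [[math.dist(p, q) for q in points] for p in points]
    active = list(range(n))               # ids of clusters still alive
    members = {i: [i] for i in range(n)}  # cluster id -> point indices
    merges = []
    while len(active) > 1:
        # O(n^2) scan for the closest pair, repeated n-1 times: O(n^3) total.
        best = None
        for ai in range(len(active)):
            for bi in range(ai + 1, len(active)):
                a, b = active[ai], active[bi]
                if best is None or dist[a][b] < best[0]:
                    best = (dist[a][b], a, b)
        height, a, b = best
        merges.append((height, sorted(members[a]), sorted(members[b])))
        # Merge b into a; single linkage keeps the smaller of the two distances.
        for c in active:
            if c != a and c != b:
                dist[a][c] = dist[c][a] = min(dist[a][c], dist[b][c])
        members[a].extend(members.pop(b))
        active.remove(b)
    return merges

def cut(merges, n, height):
    """Flat cluster labels obtained by cutting the dendrogram at `height`."""
    labels = list(range(n))
    for h, left, right in merges:
        if h > height:
            break  # single-linkage merge heights are non-decreasing
        for i in right:
            labels[i] = labels[left[0]]
    return labels

points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
merges = naive_single_linkage(points)
labels = cut(merges, len(points), height=2.0)
print(labels)  # -> [0, 0, 2, 2, 4]
```

Even with the dendrogram in hand, someone still has to pick the cut height, and with thousands of objects there is no obvious way to eyeball it.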

DBSCAN technically does not need a distance matrix. In fact, when you use a distance matrix, it will be slow, as computing the distance matrix already is O(n^2). And even then, you can save the O(n^2) memory cost for DBSCAN by computing the distances on the fly, at the cost of computing each distance twice. DBSCAN visits each point once, so there is next to no benefit from using a distance matrix except the symmetry gain. And technically, you could do some neat caching tricks to even reduce that, since DBSCAN only needs to know which objects are below the epsilon threshold. When epsilon is chosen reasonably, managing the neighbor sets on the fly will use significantly less memory than O(n^2), at the same CPU cost as computing the distance matrix.
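The idea of running DBSCAN without a distance matrix can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions, not a tuned implementation: the function names `region_query` and `dbscan`, the Euclidean metric, and the linear scan are all choices made for this example. Each point gets exactly one neighborhood query, so every pairwise distance is computed twice, and only the labels plus the current neighbor lists are held in memory.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], computed on the fly."""
    p = points[i]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(k for k in j_neighbors
                             if labels[k] is None or labels[k] == NOISE)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # -> [0, 0, 0, 1, 1, 1, -1]
```

The linear scan in `region_query` keeps the total at O(n^2) distance computations, the same CPU cost as building the matrix, while the memory stays near O(n) as long as epsilon keeps the neighbor lists small.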

Any really good implementation of DBSCAN (it is spelled all uppercas


Original source: https://www.cnblogs.com/harveyaot/p/3408225.html
