  • [Scientific Computing] Implementing Hierarchical Clustering

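    For the theory behind hierarchical clustering, please look it up yourself; this post is a simple implementation based on my own understanding of it.

    Let's look at the data first: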

    Name Calories Sodium Alcohol Price
    Budweiser 144.00 19.00 4.70 .43
    Schlitz 181.00 19.00 4.90 .43
    Ionenbrau 157.00 15.00 4.90 .48
    Kronensourc 170.00 7.00 5.20 .73
    Heineken 152.00 11.00 5.00 .77
    Old-milnaukee 145.00 23.00 4.60 .26
    Aucsberger 175.00 24.00 5.50 .40
    Strchs-bohemi 149.00 27.00 4.70 .42
    Miller-lite 99.00 10.00 4.30 .43
    Sudeiser-lich 113.00 6.00 3.70 .44
    Coors 140.00 16.00 4.60 .44
    Coorslicht 102.00 15.00 4.10 .46
    Michelos-lich 135.00 11.00 4.20 .50
    Secrs 150.00 19.00 4.70 .76
    Kkirin 149.00 6.00 5.00 .79
    Pabst-extra-l 68.00 15.00 2.30 .36
    Hamms 136.00 19.00 4.40 .43
    Heilemans-old 144.00 24.00 4.90 .43
    Olympia-gold- 72.00 6.00 2.90 .46
    Schlite-light 97.00 7.00 4.20 .47

    The program is as follows:

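    import numpy as np
    import pandas as pd

    # Whitespace-separated table; the first column holds the beer names.
    data = pd.read_csv('./bear.txt', sep=r'\s+')
    X = data.iloc[:, 1:].to_numpy(dtype=float)
    # One single-element list per beer; the lists are merged as clusters merge.
    names = [[name] for name in data.iloc[:, 0]]

    def cluster_step(X, names):
        """Merge the two closest clusters once."""
        n = len(X)
        dis = np.empty([n, n])
        for i in range(n):
            for j in range(n):
                dis[i][j] = np.sqrt(np.sum(np.square(X[i] - X[j])))
                if i == j:
                    dis[i][j] = np.inf  # a point is never its own nearest neighbour
        # Turn the flat argmin index back into (row, column). The matrix is
        # symmetric, so the first minimum found always has x < y.
        x, y = divmod(int(np.argmin(dis)), n)
        X[x] = (X[x] + X[y]) / 2     # the midpoint becomes the merged position
        X = np.delete(X, y, axis=0)  # drop the absorbed row
        names[x].extend(names[y])    # keep the name lists in step with X
        del names[y]
        return x, y, X, names, dis

    def cluster(X, num, names):
        """Keep merging until only num clusters remain."""
        classes = len(X)
        while classes > num:
            _x, _y, X, names, _dis = cluster_step(X, names)
            # Log the merged indices, their distance, the full distance
            # matrix and the current grouping.
            with open('./result.txt', 'a') as f:
                f.write('\n' + str(_x))
                f.write('\n' + str(_y))
                f.write('\n' + str(_dis[_x, _y]))
                f.write('\n' + str(_dis))
                f.write('\n' + str(names))
            classes -= 1
        return names

    if __name__ == '__main__':
        names = cluster(X, 4, names)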

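    The rule is that after each merge the midpoint is taken as the cluster's position (each step merges two positions and uses their mean as the new position). The distance used is Euclidean distance.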
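    To make the rule concrete, here is a minimal sketch of a single merge on three made-up 2-D points. It is an illustration under assumptions, not the program above: `closest_pair` is a hypothetical helper, the points are invented, and the distance matrix is built with NumPy broadcasting rather than a double loop.

```python
import numpy as np

def closest_pair(X):
    """Return (i, j, distance) for the two closest rows of X, with i < j."""
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences, shape (n, n, d)
    dis = np.sqrt((diff ** 2).sum(axis=-1))   # Euclidean distance matrix
    np.fill_diagonal(dis, np.inf)             # a point is not its own neighbour
    i, j = divmod(int(np.argmin(dis)), len(X))
    return i, j, dis[i, j]

# Hypothetical points, chosen only for illustration.
pts = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]])
i, j, d = closest_pair(pts)
merged = (pts[i] + pts[j]) / 2                # the midpoint replaces the pair
print(i, j, d, merged)
```

    Here rows 0 and 1 are the closest pair (distance 5), and they merge to the position (1.5, 2.0). Note that the midpoint is unweighted: when a cluster that already holds three beers merges with a single beer, the new position is still halfway between the two positions, not the mean of all four original points.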

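    In practice, because every merge leaves one fewer node for the next iteration, the indices stop lining up with the original 20 points. This gave me a headache for a while, until I thought of merging the beer names cluster by cluster in every iteration, so that there is no need to use indices to recover the beers at the end. I thought that was fairly neat. Since this is hard to picture from a description alone, here is the intermediate output: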

    [['Budweiser'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Old-milnaukee', 'Heilemans-old'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-'], ['Schlite-light']]

    [['Budweiser'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Old-milnaukee', 'Heilemans-old'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-']]

    [['Budweiser', 'Old-milnaukee', 'Heilemans-old'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-']]

    [['Budweiser', 'Old-milnaukee', 'Heilemans-old'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors', 'Hamms'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Olympia-gold-']]

    ... ... ...

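    In each iteration the outer list gets one element shorter, and one sub-list grows by absorbing the names of the cluster merged into it.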
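    The bookkeeping can be tried in isolation, without any distance computation. In this sketch `merge_names` is a hypothetical helper (it is not part of the program above), and the labels and index pairs are made up rather than derived from the beer data.

```python
def merge_names(names, x, y):
    """Fold cluster y into cluster x, then drop y, mirroring the row deleted from X."""
    names[x].extend(names[y])
    del names[y]
    return names

# Made-up labels; each starts as its own single-element cluster.
names = [['A'], ['B'], ['C'], ['D']]
merge_names(names, 1, 3)   # B absorbs D
print(names)               # [['A'], ['B', 'D'], ['C']]
merge_names(names, 0, 1)   # A absorbs the two-element cluster
print(names)               # [['A', 'B', 'D'], ['C']]
```

    The second call adds two names at once, which is why a sub-list can grow by more than one in a single step, as in the third line of the intermediate output.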

     

    Let's look at the final output:

    names
    Out[1]:
    [['Budweiser',
    'Old-milnaukee',
    'Heilemans-old',
    'Secrs',
    'Strchs-bohemi',
    'Ionenbrau',
    'Heineken',
    'Kkirin',
    'Coors',
    'Hamms',
    'Michelos-lich'],
    ['Schlitz', 'Aucsberger', 'Kronensourc'],
    ['Miller-lite', 'Schlite-light', 'Coorslicht', 'Sudeiser-lich'],
    ['Pabst-extra-l', 'Olympia-gold-']]

  • Original article: https://www.cnblogs.com/hellcat/p/7612303.html