  • Python for Data Science

    Chapter 4 - Clustering Models

    Segment 2 - Hierarchical methods

    Hierarchical Clustering

    Hierarchical clustering methods predict subgroups within data by finding the distance between each data point and its nearest neighbors, and then linking the closest neighbors together. Repeating this linking step builds a hierarchy of progressively larger clusters.

    The algorithm uses the distances it calculates, under the chosen distance metric, to predict subgroups.
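
    To make the linking step concrete, here is a minimal sketch on a made-up set of five points (the data and the choice of single linkage are assumptions for illustration, not part of the course material). Each row of the matrix returned by scipy's `linkage` records one merge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: two tight pairs and one distant point
points = np.array([[0.0, 0.0],
                   [0.0, 1.0],
                   [5.0, 5.0],
                   [5.0, 6.5],
                   [20.0, 20.0]])

# Each row of Z describes one merge:
# [index of cluster a, index of cluster b, merge distance, size of new cluster]
Z = linkage(points, method='single', metric='euclidean')

for a, b, dist, size in Z:
    print(f"merge {int(a)} + {int(b)} at distance {dist:.2f} -> size {int(size)}")
```

    The closest pair is linked first, and the last row always describes the merge that joins every point into a single cluster.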

    To guess the number of subgroups in a dataset, first look at a dendrogram visualization of the clustering results.

    Hierarchical Clustering Dendrogram

    Dendrogram: a tree graph that's useful for visually displaying taxonomies, lineages, and relatedness
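
    Once a cut height has been read off a dendrogram, it can be turned into flat cluster labels. The sketch below uses scipy's `fcluster` on synthetic data; the three blobs and the cut height of 5.0 are assumptions chosen for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical data: three well-separated blobs of 10 points each
blobs = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 2))
                   for c in (0.0, 5.0, 10.0)])

Z = linkage(blobs, method='ward')

# Cut the tree at an assumed height: merges above this distance are undone
cut_height = 5.0
labels = fcluster(Z, t=cut_height, criterion='distance')

n_clusters = len(np.unique(labels))
print("clusters found:", n_clusters)
```

    Moving the cut height down splits the data into more clusters, and moving it up merges them.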

    Hierarchical Clustering Use Cases

    • Hospital Resource Management
    • Customer Segmentation
    • Business Process Management
    • Social Network Analysis

    Hierarchical Clustering Parameters

    Distance Metrics

    • Euclidean
    • Manhattan
    • Cosine
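
    As a quick comparison of the three metrics, the sketch below computes each one for a single made-up pair of points with scipy's `pdist`. Note that scipy calls the Manhattan metric 'cityblock', and that its 'cosine' value is a distance (1 minus the cosine similarity).

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical pair of points
pair = np.array([[1.0, 2.0],
                 [4.0, 6.0]])

# pdist returns one value per pair of rows; there is a single pair here
euclidean = pdist(pair, metric='euclidean')[0]  # straight-line distance
manhattan = pdist(pair, metric='cityblock')[0]  # sum of absolute differences
cosine = pdist(pair, metric='cosine')[0]        # 1 - cosine similarity

print(euclidean, manhattan, cosine)
```

    The two points lie in almost the same direction from the origin, so the cosine distance is close to zero even though the Euclidean and Manhattan distances are not.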

    Linkage Parameters

    • Ward
    • Complete
    • Average

    Parameter selection method: use trial and error
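
    One way to organize the trial and error is to loop over the candidate linkage methods and score each one. The sketch below uses the cophenetic correlation coefficient as the score, which is an assumption on my part rather than the course's stated criterion, and random data as a stand-in for a real dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(42)
data = rng.normal(size=(30, 4))  # hypothetical stand-in for real features

# Pairwise distances between the original observations
distances = pdist(data, metric='euclidean')

scores = {}
for method in ('ward', 'complete', 'average'):
    Z = linkage(data, method=method)
    # cophenet returns (correlation, cophenetic distances); keep the correlation
    coph_corr, _ = cophenet(Z, distances)
    scores[method] = coph_corr

for method, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:>8}: {score:.3f}")
```

    A higher value means the tree preserves the original pairwise distances more faithfully, but it is only one signal and does not replace checking the clusters against the problem at hand.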

    Setting up for clustering analysis

    import numpy as np
    import pandas as pd
    
    import matplotlib.pyplot as plt
    from pylab import rcParams
    import seaborn as sb
    
    import sklearn
    import sklearn.metrics as sm
    
    from sklearn.cluster import AgglomerativeClustering
    
    import scipy
    from scipy.cluster.hierarchy import dendrogram, linkage
    from scipy.cluster.hierarchy import fcluster
    from scipy.cluster.hierarchy import cophenet
    from scipy.spatial.distance import pdist
    
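    np.set_printoptions(precision=4, suppress=True)
    %matplotlib inline
    # On matplotlib 3.6+, this style is named 'seaborn-v0_8-whitegrid'
    plt.style.use('seaborn-whitegrid')
    # Set the default figure size after the style, which would otherwise override it
    rcParams['figure.figsize'] = 10, 3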
    
    address = '~/Data/mtcars.csv'
    
    cars = pd.read_csv(address)
    cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
    
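    # Features used for clustering
    X = cars[['mpg','disp','hp','wt']].values
    
    # Reference labels: column 9 is 'am' (transmission: 0 = automatic, 1 = manual)
    y = cars.iloc[:, 9].values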
    

    Using scipy to generate dendrograms

    Z = linkage(X, 'ward')
    
    dendrogram(Z, truncate_mode='lastp', p=12, leaf_rotation=45., leaf_font_size=15, show_contracted=True)
    
    plt.title('Truncated Hierarchical Clustering Diagram')
    plt.xlabel('Cluster Size')
    plt.ylabel('Distance')
    
    plt.axhline(y=500)
    plt.axhline(y=100)
    plt.show()
    

    [Figure: truncated hierarchical clustering dendrogram, with horizontal lines at distances 500 and 100]

    Generating hierarchical clusters

    k = 2
    
    # Note: scikit-learn 1.2 renamed `affinity` to `metric`; on newer versions pass metric=... instead
    Hclustering = AgglomerativeClustering(n_clusters=k, affinity='euclidean', linkage='ward')
    Hclustering.fit(X)
    
    sm.accuracy_score(y, Hclustering.labels_)
    
    0.78125
    
    Hclustering = AgglomerativeClustering(n_clusters=k, affinity='euclidean', linkage='average')
    Hclustering.fit(X)
    
    sm.accuracy_score(y, Hclustering.labels_)
    
    0.78125
    
    Hclustering = AgglomerativeClustering(n_clusters=k, affinity='manhattan', linkage='average')
    Hclustering.fit(X)
    
    sm.accuracy_score(y, Hclustering.labels_)
    
    0.71875
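
    One caveat when reading these scores: cluster labels are arbitrary integers, so cluster 0 need not correspond to class 0, and `accuracy_score` can report a low value for a clustering that is merely labeled the other way round. The sketch below, on synthetic two-group data (an assumption, not the mtcars data), shows two ways around this: checking both label assignments, or using a permutation-invariant score such as the adjusted Rand index.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import accuracy_score, adjusted_rand_score

rng = np.random.default_rng(1)

# Hypothetical data: two well-separated groups with known binary labels
group0 = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
group1 = rng.normal(loc=6.0, scale=0.5, size=(20, 2))
data = np.vstack([group0, group1])
truth = np.array([0] * 20 + [1] * 20)

pred = AgglomerativeClustering(n_clusters=2, linkage='ward').fit_predict(data)

# Raw accuracy depends on which cluster happened to be called 0
raw_acc = accuracy_score(truth, pred)

# For two clusters, try both label assignments and keep the better one
aligned_acc = max(raw_acc, accuracy_score(truth, 1 - pred))

# The adjusted Rand index ignores label names entirely
ari = adjusted_rand_score(truth, pred)

print(raw_acc, aligned_acc, ari)
```

    With more than two clusters, the flip trick no longer applies, and a permutation-invariant score is the simpler choice.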
  • Original article: https://www.cnblogs.com/keepmoving1113/p/14320060.html