  • Python for Data Science

    Chapter 5 - Dimensionality Reduction Methods

    Segment 2 - Principal component analysis (PCA)

    Singular Value Decomposition

    A linear algebra method that decomposes a matrix into three resultant matrices in order to reduce information redundancy and noise

    SVD is most commonly used for principal component analysis.

    The Anatomy of SVD

    A = u * S * v

    • A = Original matrix
    • u = Left orthogonal matrix: holds important, nonredundant information about observations
    • v = Right orthogonal matrix: holds important, nonredundant information on features
    • S = Diagonal matrix: holds the singular values, which indicate how much of the data's variation each component captures
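As a minimal sketch of the three factors, the snippet below decomposes a small matrix with NumPy and multiplies the factors back together in the order u, S, v. The matrix values are arbitrary and chosen only for illustration. Note that `np.linalg.svd` returns the singular values as a 1-D array and the right-hand factor already transposed, so it is named `vt` here.

```python
import numpy as np

# Small example matrix (values are arbitrary, for illustration only)
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# full_matrices=False returns the compact ("economy") decomposition
u, s, vt = np.linalg.svd(A, full_matrices=False)

# s is a 1-D array of singular values; np.diag builds the diagonal matrix S
S = np.diag(s)

# Multiplying the three factors back together recovers A
A_rebuilt = u @ S @ vt

print(u.shape, S.shape, vt.shape)
print(np.allclose(A, A_rebuilt))
```

The singular values in `s` come back sorted from largest to smallest, which is what makes it possible to drop the trailing ones when compressing.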

    Principal Component

    Uncorrelated features that embody a dataset's important information (its "variance") with the redundancy, noise, and outliers stripped out
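To illustrate the "uncorrelated" part of this definition, here is a small sketch that compares correlations between the original iris features with correlations between the principal component scores. The variable names (`scores`, `off_diag`, and so on) are made up for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data

# Correlation matrix of the original features
orig_corr = np.corrcoef(X, rowvar=False)

# Project the data onto its principal components
scores = PCA().fit_transform(X)
pc_corr = np.corrcoef(scores, rowvar=False)

# Largest off-diagonal correlation (in absolute value) before and after
off_diag = ~np.eye(X.shape[1], dtype=bool)
print(np.abs(orig_corr[off_diag]).max())
print(np.abs(pc_corr[off_diag]).max())
```

The first number should be large, since some iris measurements move together, while the second should be zero up to floating-point error.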

    PCA Use Cases

    • Fraud Detection
    • Speech Recognition
    • Spam Detection
    • Image Recognition

    Using Factors and Components

    • Both factors and components represent what is left of a dataset after information redundancy and noise are stripped out
    • Use them as input variables for machine learning algorithms to generate predictions from these compressed representations of your data
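One possible way to feed components into a model is sketched below. The choice of two components, logistic regression as the classifier, and 5-fold cross-validation are all assumptions made for this example; none of them come from the course material.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Keep two components, then classify on the compressed representation
model = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))

# 5-fold cross-validated accuracy
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Wrapping both steps in a pipeline means PCA is refit on each training fold, so the held-out fold never influences the components.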
    import numpy as np
    import pandas as pd
    
    import matplotlib.pyplot as plt
    import seaborn as sb
    from matplotlib import rcParams
    
    from sklearn import datasets
    from sklearn import decomposition
    from sklearn.decomposition import PCA
    
    # Render plots inline in the notebook and set a default figure size
    %matplotlib inline
    rcParams['figure.figsize'] = 5, 4
    sb.set_style('whitegrid')
    

    PCA on the iris dataset

    iris = datasets.load_iris()
    X = iris.data
    variable_names = iris.feature_names
    
    X[0:10,]
    
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2],
           [5.4, 3.9, 1.7, 0.4],
           [4.6, 3.4, 1.4, 0.3],
           [5. , 3.4, 1.5, 0.2],
           [4.4, 2.9, 1.4, 0.2],
           [4.9, 3.1, 1.5, 0.1]])
    
    pca = decomposition.PCA()
    iris_pca = pca.fit_transform(X)
    
    pca.explained_variance_ratio_
    
    array([0.92461872, 0.05306648, 0.01710261, 0.00521218])
    
    pca.explained_variance_ratio_.sum()
    
    1.0
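The explained variance ratios can also guide how many components to keep. The sketch below assumes a cutoff of 95% of the variance, which is a common but arbitrary choice and not something the course specifies.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data
pca = PCA().fit(X)

# Running total of the variance explained by the first k components
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative share reaches the threshold (0.95 is an assumed cutoff)
threshold = 0.95
k = int(np.argmax(cumulative >= threshold)) + 1
print(cumulative)
print(k)

# A float in (0, 1) for n_components asks scikit-learn to make the same selection
pca_95 = PCA(n_components=threshold).fit(X)
print(pca_95.n_components_)
```

For iris, the first component alone explains about 92% of the variance, so two components are enough to pass the 95% mark.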
    
    comps = pd.DataFrame(pca.components_, columns=variable_names)
    comps
    
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
    0           0.361387         -0.084523           0.856671          0.358289
    1           0.656589          0.730161          -0.173373         -0.075481
    2          -0.582030          0.597911           0.076236          0.545831
    3          -0.315487          0.319723           0.479839         -0.753657
    
    sb.heatmap(comps, cmap="Blues", annot=True)
    
    <matplotlib.axes._subplots.AxesSubplot at 0x7f436793d240>
    

    [Figure: heatmap of the principal component loadings for the four iris features]
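To connect the PCA results back to SVD, the following sketch runs `np.linalg.svd` on the mean-centered iris data and compares it with what scikit-learn's `PCA` reports. It assumes the comparison only needs to hold up to the sign of each component, since the sign of a singular vector is arbitrary.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data

# PCA centers each feature before decomposing, so do the same here
X_centered = X - X.mean(axis=0)
u, s, vt = np.linalg.svd(X_centered, full_matrices=False)

pca = PCA().fit(X)

# Rows of vt match the principal components up to sign
same_components = np.allclose(np.abs(vt), np.abs(pca.components_), atol=1e-6)
print(same_components)

# Squared singular values divided by (n - 1) give the explained variance
explained_variance = s**2 / (X.shape[0] - 1)
same_variance = np.allclose(explained_variance, pca.explained_variance_)
print(same_variance)
```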

  • Original article: https://www.cnblogs.com/keepmoving1113/p/14321139.html