  • Python for Data Science

    Chapter 5 - Dimensionality Reduction Methods

    Segment 1 - Explanatory factor analysis

    Factor Analysis

    A method that explores a data set to find root causes that explain why the data behaves a certain way

    Factors (or latent variables): variables that are meaningful but are inferred rather than directly observed

    Factor Analysis Assumptions

    • Features are metric
    • Features are continuous or ordinal
    • The features in your dataset are correlated with r > 0.3
    • You have more than 100 observations and more than 5 observations per feature
    • The sample is homogeneous
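    Some of these assumptions can be checked directly before fitting anything. The following is a minimal sketch for the Iris data used later in these notes. It treats the correlation threshold as applying to the absolute value of r, which is an assumption: the list above only says r > 0.3. The names df, pair_corr, and obs_per_feature are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Sample-size checks: total observations and observations per feature
n_obs, n_features = df.shape
obs_per_feature = n_obs / n_features
print("observations:", n_obs)
print("observations per feature:", obs_per_feature)

# Pearson correlation between every pair of features (upper triangle only,
# so each pair is counted once and the diagonal is skipped)
corr = df.corr().to_numpy()
rows, cols = np.triu_indices(n_features, k=1)
for i, j in zip(rows, cols):
    print(f"{df.columns[i]} vs {df.columns[j]}: r = {corr[i, j]:.2f}")

# Assumption: compare |r| rather than r against the 0.3 threshold
pair_corr = np.abs(corr[rows, cols])
print("pairs with |r| > 0.3:", int((pair_corr > 0.3).sum()), "of", len(pair_corr))
```

    Homogeneity of the sample and the measurement level of the features are judgment calls about how the data was collected, so they are not covered by this check.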

    The Iris Dataset

    Iris flowers (labels):

    • Setosa
    • Versicolour
    • Virginica

    Attributes (predictive features):

    • Sepal length
    • Sepal width
    • Petal length
    • Petal width
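    The labels and attributes listed above can be read straight from the dataset object. This is a short sketch using scikit-learn's bundled copy of the data. Note that the label spellings in that copy may differ slightly from the list above.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Shape of the feature matrix: (number of flowers, number of attributes)
print(iris.data.shape)

# Species labels as stored in the dataset
print(list(iris.target_names))

# Attribute names, including their units
print(iris.feature_names)
```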

    Factor Loading

    • ~ -1 or 1 = The factor has a strong influence on the variable
    • ~ 0 = The factor has a weak influence on the variable
    • > 1 = The factors are highly correlated

    # Data handling
    import pandas as pd
    import numpy as np
    
    # Factor analysis estimator and the bundled sample datasets
    import sklearn
    from sklearn.decomposition import FactorAnalysis
    
    from sklearn import datasets
    

    Factor analysis on the Iris dataset

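    # Load the Iris dataset bundled with scikit-learn
    iris = datasets.load_iris()
    
    # Feature matrix (150 samples x 4 features) and the feature names
    X = iris.data
    variable_names = iris.feature_names
    
    # Preview the first ten rows
    X[0:10, :]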
    
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2],
           [5.4, 3.9, 1.7, 0.4],
           [4.6, 3.4, 1.4, 0.3],
           [5. , 3.4, 1.5, 0.2],
           [4.4, 2.9, 1.4, 0.2],
           [4.9, 3.1, 1.5, 0.1]])
    
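    # Fit with default settings; with n_components left unset,
    # one component is estimated per feature (four here)
    factor = FactorAnalysis().fit(X)
    
    # One row per factor, one column per original feature
    DF = pd.DataFrame(factor.components_, columns=variable_names)
    print(DF)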
    
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
    0           0.706989         -0.158005           1.654236           0.70085
    1           0.115161          0.159635          -0.044321          -0.01403
    2          -0.000000          0.000000           0.000000           0.00000
    3          -0.000000          0.000000           0.000000          -0.00000
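    The loadings above come from raw measurements in centimetres, so their scale depends on each feature's variance. As a minimal sketch, and not part of the original walkthrough, the features can be standardized first so that the loadings are easier to read against the -1 to 1 guide in the Factor Loading section. The choice of n_components=2 (suggested by the two non-zero rows above) and the 0.7 cutoff for a "strong" loading are assumptions made for illustration.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data
variable_names = iris.feature_names

# Assumption: rescale every feature to zero mean and unit variance before fitting
X_std = StandardScaler().fit_transform(X)

# Assumption: keep two factors, matching the two non-zero rows in the output above
factor = FactorAnalysis(n_components=2).fit(X_std)

# One row per factor, one column per original feature
loadings = pd.DataFrame(
    factor.components_,
    columns=variable_names,
    index=["factor 0", "factor 1"],
)
print(loadings.round(3))

# Hypothetical cutoff: treat |loading| >= 0.7 as a strong influence
strong = loadings.abs() >= 0.7
print(strong)
```

    Whether standardizing is appropriate depends on the analysis. It is shown here only to make the size of the loadings comparable across features.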
  • Original article: https://www.cnblogs.com/keepmoving1113/p/14321001.html