  • Python for Data Science

    Chapter 5 - Dimensionality Reduction Methods

    Segment 1 - Exploratory factor analysis

    Factor Analysis

    A method that explores a dataset to find the underlying causes (factors) that explain why the observed variables behave, and correlate, the way they do
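    Written out, the linear factor model (the model scikit-learn's FactorAnalysis estimates) describes one observation x of p features in terms of k latent factors z, a p-by-k loading matrix W, a mean vector μ and a diagonal matrix Ψ of per-feature noise variances:

```latex
\mathbf{x} = \boldsymbol{\mu} + W\mathbf{z} + \boldsymbol{\varepsilon},
\qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, I_k),
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \Psi),
\qquad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)

\operatorname{Cov}(\mathbf{x}) = W W^{\top} + \Psi
```

    The entries of W are the factor loadings discussed below; scikit-learn stores them transposed, as the k-by-p array components_.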

    Factors (or latent variables): meaningful variables that cannot be observed directly and must instead be inferred from the observed features
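    To make "inferred, not directly observable" concrete, the sketch below simulates one hidden factor, generates four observed features from it, and lets FactorAnalysis recover the factor from the observed features alone. The loadings (0.9 to 0.6), the noise scale 0.3 and the sample size are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_obs = 2000

# Hidden factor: we generate it, but the model never gets to see it
z = rng.normal(size=(n_obs, 1))
# Illustrative loadings of four observed features on the single hidden factor
true_loadings = np.array([[0.9, 0.8, 0.7, 0.6]])
noise = rng.normal(scale=0.3, size=(n_obs, 4))
X = z @ true_loadings + noise  # only X is "observed"

fa = FactorAnalysis(n_components=1).fit(X)
z_hat = fa.transform(X)  # inferred factor scores for each observation

# A factor's sign is arbitrary, so compare the absolute correlation
recovered = abs(np.corrcoef(z[:, 0], z_hat[:, 0])[0, 1])
print(f"|corr(true factor, inferred factor)| = {recovered:.3f}")
print("estimated loadings:", np.round(fa.components_, 2))
```

    Because the recovered factor may be the mirror image of the true one, the comparison uses the absolute value of the correlation, and the estimated loadings may all come out with flipped signs.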

    Factor Analysis Assumptions

    • Features are metric
    • Features are continuous or ordinal
    • Features are correlated with one another (r > 0.3)
    • You have > 100 observations and > 5 observations per feature
    • The sample is homogeneous
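    The sample-size and correlation rules of thumb above can be checked mechanically. A minimal sketch, assuming the correlation rule means |r| > 0.3 between feature pairs; check_fa_assumptions is a hypothetical helper, and the "metric", "ordinal" and "homogeneous" criteria still need human judgment:

```python
from itertools import combinations

import numpy as np
from sklearn import datasets


def check_fa_assumptions(X, min_r=0.3, min_n=100, min_ratio=5):
    """Hypothetical helper: apply the rule-of-thumb thresholds listed above.

    These are heuristics, not statistical tests.
    """
    n_obs, n_features = X.shape
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature Pearson r
    # Feature pairs whose absolute correlation exceeds min_r
    strong_pairs = [
        (i, j)
        for i, j in combinations(range(n_features), 2)
        if abs(corr[i, j]) > min_r
    ]
    return {
        "n_obs": n_obs,
        "n_features": n_features,
        "enough_obs": n_obs > min_n,
        "enough_per_feature": n_obs / n_features > min_ratio,
        "strong_pairs": strong_pairs,
    }


iris = datasets.load_iris()
report = check_fa_assumptions(iris.data)
print(report)
```

    On Iris, 5 of the 6 feature pairs pass the 0.3 threshold; only sepal length and sepal width (r ≈ -0.12) are weakly correlated.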

    The Iris Dataset

    Iris flowers (labels):

    • Setosa
    • Versicolour
    • Virginica

    Attributes (predictive features):

    • Sepal length
    • Sepal width
    • Petal length
    • Petal width
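    scikit-learn ships this dataset, so the labels and features listed above can be inspected directly. Note that scikit-learn spells the second species "versicolor":

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

print(iris.feature_names)        # the four predictive features
print(list(iris.target_names))   # the three species labels
print(np.bincount(iris.target))  # number of samples per species
print(iris.data.shape)           # (observations, features)
```

    Each of the three species contributes 50 flowers, for 150 observations of 4 features.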

    Factor Loading

    • ~ -1 or 1 = Factor has a strong influence on the variable
    • ~0 = Factor has little influence on the variable
    • > 1 = The factors are highly correlated with each other

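    Loadings are easiest to read against this scale when every feature is standardized first, so that no feature dominates just because of its units. A minimal sketch, assuming StandardScaler for the standardization, two factors, and a hypothetical cutoff of 0.5 for "strong" (the bullets above give no exact number); describe_loading is an illustrative helper, not a library function:

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler


def describe_loading(value, strong=0.5):
    """Illustrative helper; the 0.5 cutoff is an assumption, not a standard."""
    if abs(value) > 1:
        return "above 1: factors may be highly correlated"
    if abs(value) >= strong:
        return "strong"
    return "weak"


iris = datasets.load_iris()
# Standardize so each feature has unit variance; loadings then sit roughly on a -1..1 scale
X_std = StandardScaler().fit_transform(iris.data)
fa = FactorAnalysis(n_components=2, random_state=0).fit(X_std)

for factor_idx, row in enumerate(fa.components_):
    for name, value in zip(iris.feature_names, row):
        print(f"factor {factor_idx}  {name:18s} {value:+.3f}  {describe_loading(value)}")
```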
    import pandas as pd
    import numpy as np
    
    import sklearn
    from sklearn.decomposition import FactorAnalysis
    
    from sklearn import datasets
    

    Factor analysis on the Iris dataset

    iris = datasets.load_iris()
    
    X = iris.data
    variable_names = iris.feature_names
    
    X[0:10,]
    
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2],
           [5.4, 3.9, 1.7, 0.4],
           [4.6, 3.4, 1.4, 0.3],
           [5. , 3.4, 1.5, 0.2],
           [4.4, 2.9, 1.4, 0.2],
           [4.9, 3.1, 1.5, 0.1]])
    
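    # With the default n_components=None, one factor is fitted per feature;
    # factors that add nothing beyond the noise model can come out as rows of zeros
    factor = FactorAnalysis().fit(X)
    
    # Loadings: one row per factor, one column per feature
    DF = pd.DataFrame(factor.components_, columns=variable_names)
    print(DF)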
    
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
    0           0.706989         -0.158005           1.654236           0.70085
    1           0.115161          0.159635          -0.044321          -0.01403
    2          -0.000000          0.000000           0.000000           0.00000
    3          -0.000000          0.000000           0.000000          -0.00000
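    Only the first two rows of this output are non-zero, so a natural follow-up is to ask for two factors directly. A sketch, assuming scikit-learn 0.24 or later (the release that added the rotation parameter); varimax rotation is an optional choice that tends to make each feature load mainly on one factor:

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis

iris = datasets.load_iris()
X = iris.data

# Two factors; rotation="varimax" requires scikit-learn >= 0.24
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(X)

# Loadings: one row per factor, one column per feature
loadings = pd.DataFrame(fa.components_, columns=iris.feature_names,
                        index=["factor 0", "factor 1"])
print(loadings)

# Noise variance: the part of each feature's variance the factors leave unexplained
noise = pd.Series(fa.noise_variance_, index=iris.feature_names)
print(noise)

# Factor scores: each flower's estimated position on the two latent factors
scores = fa.transform(X)
print(scores[:5])
```

    The factor scores can then replace the four original features in later analysis, which is what makes factor analysis a dimensionality reduction method.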
  • Original article: https://www.cnblogs.com/keepmoving1113/p/14321001.html