zoukankan      html  css  js  c++  java
  • Python for Data Science

    Chapter 5 - Dimensionality Reduction Methods

    Segment 1 - Explanatory factor analysis

    Factor Analysis

    A method that explores a data set in order to find root causes which explain why data is acting a certain way

    Factors(or latent variables): variables that are quite meaningful but that are inferred and not directly observable

    Factor Analysis Assumptions

    • Features are metric
    • Feature are continuous or ordinal
    • There is r > 0.3 correlation between the features in your dataset
    • You have > 100 observations and > 5 observations per feature
    • Sample is homogenous

    The Iris Dataset

    Iris flowers(labels):

    • Setosa
    • Versicolour
    • Virginica

    Attributes (predictive features):

    • Sepal length
    • Sepal length
    • Petal length
    • Petal width

    Factor Loading

    • ~ -1 or 1 = Factor has a strong influence on the variable
    • ~0 = Factor weakly influences on the variable
    • '>1 = That means these are highly correlated factors
    import pandas as pd
    import numpy as np
    
    import sklearn
    from sklearn.decomposition import FactorAnalysis
    
    from sklearn import datasets
    

    Factor analysis on iris dataset

    iris = datasets.load_iris()
    
    X = iris.data
    variable_names = iris.feature_names
    
    X[0:10,]
    
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2],
           [5.4, 3.9, 1.7, 0.4],
           [4.6, 3.4, 1.4, 0.3],
           [5. , 3.4, 1.5, 0.2],
           [4.4, 2.9, 1.4, 0.2],
           [4.9, 3.1, 1.5, 0.1]])
    
    factor = FactorAnalysis().fit(X)
    
    DF = pd.DataFrame(factor.components_, columns=variable_names)
    print(DF)
    
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
    0           0.706989         -0.158005           1.654236           0.70085
    1           0.115161          0.159635          -0.044321          -0.01403
    2          -0.000000          0.000000           0.000000           0.00000
    3          -0.000000          0.000000           0.000000          -0.00000
  • 相关阅读:
    【彩彩只能变身队】(迟到的)团队介绍
    【彩彩只能变身队】用户需求分析(二)—— 调查结果
    【彩彩只能变身队】用户需求分析(一)—— 调查问卷
    C语言I博客作业04
    C语言I博客作业06
    c语言1作业07
    C语言I博客作业03
    C语言I博客作业02
    C语言I博客作业05
    【OpenGL编程指南】之投影和视口变换
  • 原文地址:https://www.cnblogs.com/keepmoving1113/p/14321001.html
Copyright © 2011-2022 走看看