  • Python for Data Science

    Chapter 5 - Dimensionality Reduction Methods

    Segment 1 - Exploratory factor analysis

    Factor Analysis

    A method that explores a data set to find the underlying root causes (factors) that explain why the data behave the way they do

    Factors (or latent variables): meaningful variables that cannot be observed directly and must instead be inferred from the observed data
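    To make the idea of a latent variable concrete, here is a small simulation. It is a sketch with assumed numbers, not something from the original post. Two hidden factors `F` generate six observed features through an assumed loading matrix `W` plus noise. `FactorAnalysis` then has to estimate a two-factor structure from the observed features alone. The loading values and the 0.3 noise scale are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_samples = 500

# Two latent factors that are never observed directly (assumed standard normal)
F = rng.standard_normal((n_samples, 2))

# Assumed loadings: features 0-2 depend on factor 0, features 3-5 on factor 1
W = np.array([
    [0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.9, 0.8, 0.7],
])

# Observed data = factors times loadings, plus independent noise per feature
X_obs = F @ W + 0.3 * rng.standard_normal((n_samples, 6))

# Fit on the observed features only; the model never sees F
fa = FactorAnalysis(n_components=2, random_state=0).fit(X_obs)
print(np.round(fa.components_, 2))      # estimated loadings (2 x 6), identified only up to sign/rotation
print(np.round(fa.noise_variance_, 2))  # estimated unique (noise) variance of each feature
```

    Factor loadings are only identified up to a rotation, so comparing `components_` to `W` entry by entry can be misleading. The rotation-invariant product `components_.T @ components_` is the fairer comparison with `W.T @ W`.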

    Factor Analysis Assumptions

    • Features are metric
    • Features are continuous or ordinal
    • Features are correlated with one another at r > 0.3
    • You have more than 100 observations and more than 5 observations per feature
    • The sample is homogeneous
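    Three of the rules of thumb listed above can be screened numerically: the observation count, the observations per feature, and the correlation threshold. "Metric" and "homogeneous" still need domain judgement. The helper `check_fa_assumptions` below is a hypothetical sketch. It reads "r > 0.3" as "each feature has |r| > 0.3 with at least one other feature", which is one possible interpretation.

```python
import numpy as np
from sklearn import datasets

def check_fa_assumptions(X, r_min=0.3, min_obs=100, min_ratio=5):
    """Hypothetical helper: rough screen of the factor analysis rules of thumb."""
    X = np.asarray(X, dtype=float)
    n_obs, n_features = X.shape
    corr = np.corrcoef(X, rowvar=False)          # feature-by-feature Pearson r
    off_diag = ~np.eye(n_features, dtype=bool)   # ignore each feature's r = 1 with itself
    # A feature passes if |r| > r_min with at least one *other* feature
    has_partner = (np.abs(corr) > r_min) & off_diag
    return {
        "enough_observations": n_obs > min_obs,
        "enough_per_feature": n_obs / n_features > min_ratio,
        "features_without_partner": [int(j) for j in np.where(~has_partner.any(axis=0))[0]],
    }

# Usage on the iris data used later in this segment
report = check_fa_assumptions(datasets.load_iris().data)
print(report)
```

    On iris (150 rows, 4 features) the count checks pass. Every feature also has at least one |r| > 0.3 partner. Sepal width qualifies only through its negative correlation with the petal measurements.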

    The Iris Dataset

    Iris flowers (labels):

    • Setosa
    • Versicolour
    • Virginica

    Attributes (predictive features):

    • Sepal length
    • Sepal width
    • Petal length
    • Petal width
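    The labels and attributes listed above map directly onto the object returned by scikit-learn's `load_iris`. The sketch below prints them and the feature correlation matrix, which ties back to the r > 0.3 assumption. Note that scikit-learn spells the second species "versicolor".

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

print(iris.target_names)         # the three species labels
print(iris.feature_names)        # the four measured attributes, in centimetres
print(iris.data.shape)           # (150, 4): 150 flowers, 4 features
print(np.bincount(iris.target))  # number of flowers per species

# Pairwise Pearson correlations between the four features
corr = np.corrcoef(iris.data, rowvar=False)
print(np.round(corr, 2))
```

    The correlation matrix should show petal length and petal width as strongly correlated (about 0.96). Sepal length and sepal width are only weakly related (about -0.12).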

    Factor Loading

    • ~ -1 or 1 = the factor has a strong influence on the variable
    • ~ 0 = the factor has a weak influence on the variable
    • > 1 = these are highly correlated factors

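    In scikit-learn's `FactorAnalysis`, the loadings are stored in `components_`, with one row per factor and one column per variable. The model's implied covariance is the shared part `W.T @ W` plus each variable's unique noise variance. The sketch below checks that identity against `get_covariance()` and tags each loading using the rule of thumb above. Several choices here are assumptions, not the original post's method:

    • standardizing the data first (the post fits raw centimetres)
    • using two factors
    • the hypothetical 0.7 / 0.3 cut-offs in `describe_loading`

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Assumption: standardize so every variable has unit variance and loadings are comparable
X_std = StandardScaler().fit_transform(datasets.load_iris().data)
fa = FactorAnalysis(n_components=2).fit(X_std)   # 2 factors is an assumed choice

W = fa.components_          # loadings, shape (n_factors, n_features)
psi = fa.noise_variance_    # unique (noise) variance per feature

# Implied covariance: shared part from the loadings plus per-feature noise on the diagonal
implied = W.T @ W + np.diag(psi)
print(np.allclose(implied, fa.get_covariance()))  # True

def describe_loading(value, strong=0.7, weak=0.3):
    """Hypothetical rule-of-thumb label for a single loading (thresholds are assumptions)."""
    magnitude = abs(value)
    if magnitude >= strong:
        return "strong"
    if magnitude <= weak:
        return "weak"
    return "moderate"

print([[describe_loading(v) for v in row] for row in W])
```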
    import pandas as pd
    import numpy as np
    
    from sklearn.decomposition import FactorAnalysis
    
    from sklearn import datasets
    

    Factor analysis on the iris dataset

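    # Load the iris data: 150 flowers, 4 measurements each
    iris = datasets.load_iris()
    
    X = iris.data                        # feature matrix, shape (150, 4)
    variable_names = iris.feature_names  # column names for the loadings table
    
    # Inspect the first 10 rows
    X[:10]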
    
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2],
           [5.4, 3.9, 1.7, 0.4],
           [4.6, 3.4, 1.4, 0.3],
           [5. , 3.4, 1.5, 0.2],
           [4.4, 2.9, 1.4, 0.2],
           [4.9, 3.1, 1.5, 0.1]])
    
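    # Fit a factor analysis model; by default n_components equals the number of features (4)
    factor = FactorAnalysis().fit(X)
    
    # Loadings: one row per factor, one column per original feature
    DF = pd.DataFrame(factor.components_, columns=variable_names)
    print(DF)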
    
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
    0           0.706989         -0.158005           1.654236           0.70085
    1           0.115161          0.159635          -0.044321          -0.01403
    2          -0.000000          0.000000           0.000000           0.00000
    3          -0.000000          0.000000           0.000000          -0.00000
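    In the output above, only the first two rows carry non-zero loadings. That suggests two factors are enough for these four features. A natural follow-up is to refit with `n_components=2` and label the rows. This is sketched here rather than taken from the original post, and the "Factor 1" / "Factor 2" index names are illustrative.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis

iris = datasets.load_iris()

# Keep only the two factors that carried non-zero loadings in the default fit
factor2 = FactorAnalysis(n_components=2).fit(iris.data)

loadings = pd.DataFrame(
    factor2.components_,
    columns=iris.feature_names,
    index=["Factor 1", "Factor 2"],
)
print(loadings)

# Factor scores: each flower's estimated position on the two latent factors
scores = factor2.transform(iris.data)
print(scores.shape)  # (150, 2)
```

    Recent scikit-learn versions (0.24 and later) also accept `rotation="varimax"` in the `FactorAnalysis` constructor. This can make the two factors easier to interpret.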
  • Original post: https://www.cnblogs.com/keepmoving1113/p/14321001.html