
Python for Data Science

Chapter 5 - Dimensionality Reduction Methods

Segment 1 - Explanatory factor analysis

Factor Analysis

A method that explores a data set to find the root causes that explain why the data behaves the way it does

Factors (or latent variables): meaningful variables that are inferred rather than directly observed
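
As a reading aid, the standard factor analysis model can be written as below. The symbols are not from the original notes: x is the vector of observed features, z the (smaller) vector of latent factors, W the matrix of factor loadings, \mu the feature means, and \Psi a diagonal matrix of per-feature noise variances.

```latex
x = \mu + W z + \varepsilon, \qquad
z \sim \mathcal{N}(0, I), \qquad
\varepsilon \sim \mathcal{N}(0, \Psi), \quad \Psi \text{ diagonal}
```

In scikit-learn, the fitted `components_` array has shape (n_components, n_features), so each row holds the loadings of one factor.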

Factor Analysis Assumptions

Features are metric
Features are continuous or ordinal
There is a correlation of r > 0.3 between the features in your dataset
You have more than 100 observations and more than 5 observations per feature
The sample is homogeneous
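
A rough sketch of how the two numeric assumptions above could be checked on the iris data. Reading "r > 0.3" as "every feature has an absolute Pearson correlation above 0.3 with at least one other feature" is an assumption of this sketch, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

n_obs, n_features = df.shape

# Sample-size rule from the list: > 100 observations and > 5 per feature
enough_observations = n_obs > 100 and n_obs / n_features > 5

# Pearson correlation matrix; mask the diagonal so a feature's
# correlation with itself (always 1) is ignored
corr = df.corr()
off_diagonal = corr.abs().where(~np.eye(n_features, dtype=bool))

# Strongest absolute correlation each feature has with any other feature
max_abs_corr = off_diagonal.max()

print("observations:", n_obs, "features:", n_features)
print("sample size rule satisfied:", enough_observations)
print(max_abs_corr.round(3))
print("every feature has |r| > 0.3 with another:", bool((max_abs_corr > 0.3).all()))
```

The metric, continuous/ordinal, and homogeneity assumptions depend on how the data was collected and cannot be verified from the numbers alone.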

The Iris Dataset

Iris flowers (labels):

Setosa
Versicolour
Virginica

Attributes (predictive features):

Sepal length
Sepal width
Petal length
Petal width

Factor Loading

~ -1 or 1 = the factor has a strong influence on the variable
~ 0 = the factor has a weak influence on the variable
> 1 = the factors are highly correlated
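
The rules of thumb above can be turned into a small helper. This is only a sketch: the function name `describe_loading` and the cutoffs 0.7 and 0.3 are illustrative assumptions, not values given in the notes.

```python
def describe_loading(loading, strong=0.7, weak=0.3):
    """Label a factor loading by its absolute size.

    The `strong` and `weak` cutoffs are hypothetical defaults chosen
    for illustration only.
    """
    size = abs(loading)
    if size > 1:
        return "above 1"
    if size >= strong:
        return "strong"
    if size <= weak:
        return "weak"
    return "moderate"


for value in (0.95, -0.88, 0.45, 0.02, 1.65):
    print(f"{value:+.2f} -> {describe_loading(value)}")
```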

import pandas as pd
import numpy as np

import sklearn
from sklearn.decomposition import FactorAnalysis

from sklearn import datasets

Factor analysis on the iris dataset

# Load the iris data: 150 flowers, 4 measurements each
iris = datasets.load_iris()

X = iris.data
variable_names = iris.feature_names

# Preview the first 10 rows
X[0:10]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

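# Fit with default settings; n_components=None keeps one factor per feature
factor = FactorAnalysis().fit(X)

# Rows are factors, columns are the original features
DF = pd.DataFrame(factor.components_, columns=variable_names)
print(DF)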

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0           0.706989         -0.158005           1.654236           0.70085
1           0.115161          0.159635          -0.044321          -0.01403
2          -0.000000          0.000000           0.000000           0.00000
3          -0.000000          0.000000           0.000000          -0.00000
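
The loadings above are fitted on the raw measurements, so their size depends on each feature's units and spread, and two of the four factors come out as all zeros. One possible variation, not part of the original walkthrough, is to standardize the features first and ask for two factors only. Both `StandardScaler` and `n_components=2` are choices made for this sketch.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# Put every feature on the same scale (mean 0, unit variance)
X_scaled = StandardScaler().fit_transform(iris.data)

# Keep two factors; the count is an assumption for illustration
fa = FactorAnalysis(n_components=2, random_state=0).fit(X_scaled)

# Rows are factors, columns are the original features
loadings = pd.DataFrame(
    fa.components_,
    columns=iris.feature_names,
    index=["factor_1", "factor_2"],
)
print(loadings.round(3))

# Per-feature noise variance: the part of each feature
# that the factors do not account for
uniqueness = pd.Series(fa.noise_variance_, index=iris.feature_names)
print(uniqueness.round(3))
```

On standardized data the loadings are easier to compare against the -1, 0, and 1 reference points listed under Factor Loading.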


Original source: https://www.cnblogs.com/keepmoving1113/p/14321001.html