Exploratory factor analysis | EFA | Genomic Structural Equation Modelling | SEM

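Don't ask too many "why" questions at first. Just learn, and in the end it will all come together: read a text a hundred times and its meaning reveals itself.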

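TOC

What is EFA, and what class of problems does this statistical method address?

How does EFA work, roughly?

How does EFA differ from CFA and PCA?

How should we understand the following use of EFA in genetics?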

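What is EFA, and what class of problems does this statistical method address?

EFA belongs to the family of factor analysis (FA) methods. FA is divided into EFA (exploratory factor analysis) and CFA (confirmatory factor analysis).

Its use is similar to that of PCA: it condenses many abstract, unwieldy indicators into a small number of representative factors. Where PCA extracts principal components, EFA extracts common factors.

Factor analysis is also known as a latent variable model.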

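How does EFA work, roughly?

Assume that every variable is made up of two parts: a common factor and a unique factor.

With n observed variables, the number of common factors is smaller than n, which amounts to a dimension reduction, while the number of unique factors equals n.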

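Assumptions:
- All unique factors are uncorrelated with one another
- All unique factors are also uncorrelated with all common factors

Suppose there are three variables X1, X2, and X3 with pairwise correlations p1, p2, and p3, and assume the three variables share one common factor F, with factor loadings λ1, λ2, and λ3. Each correlation can then be written in terms of the factor loadings; for example, if p1 is the correlation between X1 and X2, then p1 = λ1 * λ2. For standardized variables, a factor loading is the correlation between the variable and the common factor, and its square is the share of that variable's variance contributed by the common factor.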
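The relation between correlations and loadings can be sketched in a few lines of Python. The loading values below are hypothetical, and the sketch assumes standardized variables, a single common factor, and positive loadings. Under those assumptions, three pairwise correlations are enough to solve back for the three loadings.

```python
import math

# Hypothetical standardized loadings on one common factor F (not from any dataset).
loadings = [0.8, 0.7, 0.6]

def implied_correlation(lam_i, lam_j):
    """Correlation between two variables implied by a one-factor model."""
    return lam_i * lam_j

# Pairwise correlations implied by the model.
p12 = implied_correlation(loadings[0], loadings[1])  # X1 and X2
p13 = implied_correlation(loadings[0], loadings[2])  # X1 and X3
p23 = implied_correlation(loadings[1], loadings[2])  # X2 and X3

def recover_loadings(p12, p13, p23):
    """Solve p12 = l1*l2, p13 = l1*l3, p23 = l2*l3 for positive loadings."""
    lam1 = math.sqrt(p12 * p13 / p23)
    lam2 = math.sqrt(p12 * p23 / p13)
    lam3 = math.sqrt(p13 * p23 / p12)
    return [lam1, lam2, lam3]

recovered = recover_loadings(p12, p13, p23)

# Squared loading = share of each variable's variance due to the common factor.
communalities = [lam ** 2 for lam in recovered]

print(recovered)
print(communalities)
```

With more than three variables the system is overdetermined, which is why real EFA software estimates the loadings by fitting rather than by solving exactly.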

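Finally, each original variable can be expressed as a linear combination of the common factors and its unique factor, with the factor loadings as the coefficients. This resembles PCA, except that the PCs are replaced by common factors plus unique factors.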
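As a rough illustration of this linear combination, the sketch below simulates three standardized variables from one common factor plus independent unique factors, then checks that the sample correlation between X1 and X2 lands near λ1 * λ2. The loadings, sample size, and random seed are all assumptions chosen for the demonstration, and the unique part is scaled so that each variable has variance close to 1.

```python
import math
import random

random.seed(0)

LOADINGS = [0.8, 0.7, 0.6]  # hypothetical loadings
N = 20000                   # hypothetical sample size

def simulate(loadings, n):
    """Draw n observations of X_i = lam_i * F + sqrt(1 - lam_i^2) * U_i."""
    rows = []
    for _ in range(n):
        f = random.gauss(0.0, 1.0)  # common factor, shared by all variables
        row = []
        for lam in loadings:
            u = random.gauss(0.0, 1.0)  # unique factor, independent of F and of the others
            row.append(lam * f + math.sqrt(1.0 - lam ** 2) * u)
        rows.append(row)
    return rows

def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

data = simulate(LOADINGS, N)
x1, x2, x3 = (list(col) for col in zip(*data))

r12 = correlation(x1, x2)
expected_r12 = LOADINGS[0] * LOADINGS[1]
print(r12, expected_r12)
```

The sample value will not match the product exactly; it only approaches it as the sample grows.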

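To stress this once more: make sure you understand "variance contribution" thoroughly. It is a basic tool behind many statistical methods, including heritability, ANOVA, R2, and analysis of covariance.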
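One way to make "variance contribution" concrete is the one-way ANOVA decomposition, where the total sum of squares splits into a between-group part and a within-group part. The sketch below uses made-up group data purely for illustration; the ratio of between-group to total sum of squares is the share of variation attributed to group membership.

```python
# Hypothetical measurements in three groups (illustrative values only).
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0],
    "c": [5.0, 6.0, 7.0],
}

def mean(xs):
    return sum(xs) / len(xs)

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Total variation around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# Variation of the group means around the grand mean, weighted by group size.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# Variation of each observation around its own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

# Share of total variation explained by group membership.
explained_share = ss_between / ss_total

print(ss_total, ss_between, ss_within, explained_share)
```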

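A quick review of the basics:

Take the vector (1, 2, 3) as an example.

Expectation: the sum of all values divided by their count, here 2. It describes the central tendency of the sample.

Variance: the mean of the squared deviations from the mean. Dividing by n gives 2/3 here; dividing by n - 1 (the sample variance) gives 1. It describes the dispersion of the sample, and the division removes the effect of the number of observations.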
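These numbers can be checked with Python's standard `statistics` module, which provides both conventions: `pvariance` divides by n and `variance` divides by n - 1.

```python
import statistics

values = [1, 2, 3]

mean = statistics.mean(values)                      # sum / n
population_variance = statistics.pvariance(values)  # squared deviations / n
sample_variance = statistics.variance(values)       # squared deviations / (n - 1)

print(mean, population_variance, sample_variance)
```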

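How does EFA differ from CFA and PCA?

CFA applies when you already have a set of candidate factors and need to verify whether they fit the sample; it is a theory-driven analysis. EFA, by contrast, does not specify the factor structure in advance and lets the data suggest it.

How should we understand the following use of EFA in genetics?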

Genomic Relationships, Novel Loci, and Pleiotropic Mechanisms across Eight Psychiatric Disorders

We modeled the genome-wide joint architecture of the eight neuropsychiatric disorders using an exploratory factor analysis (EFA) (Gorsuch, 1988), followed by genomic structural equation modeling (SEM) (Grotzinger et al., 2019) (STAR Methods; Figure 1C). EFA identified three correlated factors, which together explained 51% of the genetic variation in the eight neuropsychiatric disorders (Table S2.2). The first factor consisted primarily of disorders characterized by compulsive/perfectionistic behaviors, specifically AN, OCD, and, more weakly, TS. The second factor was characterized by mood and psychotic disorders (MD, BIP, and SCZ), and the third factor by three early-onset neurodevelopmental disorders (ASD, ADHD, TS) as well as MD. Similar to our EFA results, hierarchical clustering analyses also identified three sub-groups among the eight disorders (Data S1.1). Based on extensive follow-up analyses, this genetic correlational structure does not appear to be biased by sample overlap or sample size differences among the eight disorders (Data S1.2-1.4).

This brings up another concept that has to be understood: Genomic Structural Equation Modelling (Genomic SEM).

To be continued.

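References:

Exploratory factor analysis - MBA智库

What beginners need to know about exploratory factor analysis

Lecture 7: Exploratory factor analysis - 良心课件

How To Calculate and Understand Analysis of Variance (ANOVA) F Test. - [This channel is highly recommended.] A step-by-step breakdown of analysis of variance. It is best to work through the derivation yourself to see why the total sum of squares equals the within-group sum of squares plus the between-group sum of squares.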

Original article: https://www.cnblogs.com/leezx/p/13356878.html