
    MATLAB Example: PCA Dimensionality Reduction

    Author: 凯鲁嘎吉 - 博客园 http://www.cnblogs.com/kailugaji/

    1. The iris dataset

    5.1,3.5,1.4,0.2,1
    4.9,3.0,1.4,0.2,1
    4.7,3.2,1.3,0.2,1
    4.6,3.1,1.5,0.2,1
    5.0,3.6,1.4,0.2,1
    5.4,3.9,1.7,0.4,1
    4.6,3.4,1.4,0.3,1
    5.0,3.4,1.5,0.2,1
    4.4,2.9,1.4,0.2,1
    4.9,3.1,1.5,0.1,1
    5.4,3.7,1.5,0.2,1
    4.8,3.4,1.6,0.2,1
    4.8,3.0,1.4,0.1,1
    4.3,3.0,1.1,0.1,1
    5.8,4.0,1.2,0.2,1
    5.7,4.4,1.5,0.4,1
    5.4,3.9,1.3,0.4,1
    5.1,3.5,1.4,0.3,1
    5.7,3.8,1.7,0.3,1
    5.1,3.8,1.5,0.3,1
    5.4,3.4,1.7,0.2,1
    5.1,3.7,1.5,0.4,1
    4.6,3.6,1.0,0.2,1
    5.1,3.3,1.7,0.5,1
    4.8,3.4,1.9,0.2,1
    5.0,3.0,1.6,0.2,1
    5.0,3.4,1.6,0.4,1
    5.2,3.5,1.5,0.2,1
    5.2,3.4,1.4,0.2,1
    4.7,3.2,1.6,0.2,1
    4.8,3.1,1.6,0.2,1
    5.4,3.4,1.5,0.4,1
    5.2,4.1,1.5,0.1,1
    5.5,4.2,1.4,0.2,1
    4.9,3.1,1.5,0.1,1
    5.0,3.2,1.2,0.2,1
    5.5,3.5,1.3,0.2,1
    4.9,3.1,1.5,0.1,1
    4.4,3.0,1.3,0.2,1
    5.1,3.4,1.5,0.2,1
    5.0,3.5,1.3,0.3,1
    4.5,2.3,1.3,0.3,1
    4.4,3.2,1.3,0.2,1
    5.0,3.5,1.6,0.6,1
    5.1,3.8,1.9,0.4,1
    4.8,3.0,1.4,0.3,1
    5.1,3.8,1.6,0.2,1
    4.6,3.2,1.4,0.2,1
    5.3,3.7,1.5,0.2,1
    5.0,3.3,1.4,0.2,1
    7.0,3.2,4.7,1.4,2
    6.4,3.2,4.5,1.5,2
    6.9,3.1,4.9,1.5,2
    5.5,2.3,4.0,1.3,2
    6.5,2.8,4.6,1.5,2
    5.7,2.8,4.5,1.3,2
    6.3,3.3,4.7,1.6,2
    4.9,2.4,3.3,1.0,2
    6.6,2.9,4.6,1.3,2
    5.2,2.7,3.9,1.4,2
    5.0,2.0,3.5,1.0,2
    5.9,3.0,4.2,1.5,2
    6.0,2.2,4.0,1.0,2
    6.1,2.9,4.7,1.4,2
    5.6,2.9,3.6,1.3,2
    6.7,3.1,4.4,1.4,2
    5.6,3.0,4.5,1.5,2
    5.8,2.7,4.1,1.0,2
    6.2,2.2,4.5,1.5,2
    5.6,2.5,3.9,1.1,2
    5.9,3.2,4.8,1.8,2
    6.1,2.8,4.0,1.3,2
    6.3,2.5,4.9,1.5,2
    6.1,2.8,4.7,1.2,2
    6.4,2.9,4.3,1.3,2
    6.6,3.0,4.4,1.4,2
    6.8,2.8,4.8,1.4,2
    6.7,3.0,5.0,1.7,2
    6.0,2.9,4.5,1.5,2
    5.7,2.6,3.5,1.0,2
    5.5,2.4,3.8,1.1,2
    5.5,2.4,3.7,1.0,2
    5.8,2.7,3.9,1.2,2
    6.0,2.7,5.1,1.6,2
    5.4,3.0,4.5,1.5,2
    6.0,3.4,4.5,1.6,2
    6.7,3.1,4.7,1.5,2
    6.3,2.3,4.4,1.3,2
    5.6,3.0,4.1,1.3,2
    5.5,2.5,4.0,1.3,2
    5.5,2.6,4.4,1.2,2
    6.1,3.0,4.6,1.4,2
    5.8,2.6,4.0,1.2,2
    5.0,2.3,3.3,1.0,2
    5.6,2.7,4.2,1.3,2
    5.7,3.0,4.2,1.2,2
    5.7,2.9,4.2,1.3,2
    6.2,2.9,4.3,1.3,2
    5.1,2.5,3.0,1.1,2
    5.7,2.8,4.1,1.3,2
    6.3,3.3,6.0,2.5,3
    5.8,2.7,5.1,1.9,3
    7.1,3.0,5.9,2.1,3
    6.3,2.9,5.6,1.8,3
    6.5,3.0,5.8,2.2,3
    7.6,3.0,6.6,2.1,3
    4.9,2.5,4.5,1.7,3
    7.3,2.9,6.3,1.8,3
    6.7,2.5,5.8,1.8,3
    7.2,3.6,6.1,2.5,3
    6.5,3.2,5.1,2.0,3
    6.4,2.7,5.3,1.9,3
    6.8,3.0,5.5,2.1,3
    5.7,2.5,5.0,2.0,3
    5.8,2.8,5.1,2.4,3
    6.4,3.2,5.3,2.3,3
    6.5,3.0,5.5,1.8,3
    7.7,3.8,6.7,2.2,3
    7.7,2.6,6.9,2.3,3
    6.0,2.2,5.0,1.5,3
    6.9,3.2,5.7,2.3,3
    5.6,2.8,4.9,2.0,3
    7.7,2.8,6.7,2.0,3
    6.3,2.7,4.9,1.8,3
    6.7,3.3,5.7,2.1,3
    7.2,3.2,6.0,1.8,3
    6.2,2.8,4.8,1.8,3
    6.1,3.0,4.9,1.8,3
    6.4,2.8,5.6,2.1,3
    7.2,3.0,5.8,1.6,3
    7.4,2.8,6.1,1.9,3
    7.9,3.8,6.4,2.0,3
    6.4,2.8,5.6,2.2,3
    6.3,2.8,5.1,1.5,3
    6.1,2.6,5.6,1.4,3
    7.7,3.0,6.1,2.3,3
    6.3,3.4,5.6,2.4,3
    6.4,3.1,5.5,1.8,3
    6.0,3.0,4.8,1.8,3
    6.9,3.1,5.4,2.1,3
    6.7,3.1,5.6,2.4,3
    6.9,3.1,5.1,2.3,3
    5.8,2.7,5.1,1.9,3
    6.8,3.2,5.9,2.3,3
    6.7,3.3,5.7,2.5,3
    6.7,3.0,5.2,2.3,3
    6.3,2.5,5.0,1.9,3
    6.5,3.0,5.2,2.0,3
    6.2,3.4,5.4,2.3,3
    5.9,3.0,5.1,1.8,3
    

    2. MATLAB code

    function [COEFF,SCORE,latent,tsquared,explained,mu,data_PCA]=pca_demo()
    x=load('iris.data');
    [~,d]=size(x);
    k=d-1; % keep the first k principal components
    x=zscore(x(:,1:d-1)); % standardize the features (drop the label column)
    [COEFF,SCORE,latent,tsquared,explained,mu]=pca(x);
    % PCA procedure:
    % 1) Collect the sample data X: rows are samples, columns are features.
    % 2) Center the data: S = X with each column's mean subtracted.
    % 3) Compute the covariance matrix C = cov(S).
    % 4) Eigendecompose the covariance matrix: [P,Lambda] = eig(C).
    % 5) Done.
    % Meaning of the pca() input and outputs:
    % 1. The input X is an n-by-p matrix: each row is one observation, each column one attribute (feature).
    % 2. COEFF is the p-by-p matrix whose columns are the principal component vectors, often called the eigenvectors (of the covariance matrix), sorted by descending eigenvalue. To keep only the first k principal components, take COEFF(:,1:k).
    % 3. SCORE is the projection of the data onto the principal components. Note: it is the projection of the *centered* data, i.e. SCORE = x0*COEFF, where x0 is X with each column's mean removed (centering is per feature, not per sample); reconstruction therefore requires adding the means back.
    % 4. latent is a column vector of the eigenvalues, in descending order.
    % 5. tsquared is Hotelling's T-squared statistic for each observation in X.
    % 6. explained is the percentage of total variance explained by each principal component.
    % 7. mu is the estimated mean of each variable in X.
    x = bsxfun(@minus,x,mean(x,1)); % center (a no-op after zscore, kept for clarity)
    data_PCA = x*COEFF(:,1:k);
    latent1 = 100*latent/sum(latent); % rescale latent to sum to 100, so each entry is a contribution percentage
    pareto(latent1); % Pareto chart; pareto only plots the first 95% of the cumulative distribution, so some entries may not appear
    xlabel('Principal Component');
    ylabel('Variance Explained (%)');
    % the line in the plot shows the cumulative variance explained
    print(gcf,'-dpng','Iris PCA.png');
    iris_pac = data_PCA(:,1:2);
    save iris_pca iris_pac
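The procedure listed in the comments (center, covariance, eigendecomposition, projection) can also be checked outside MATLAB. Below is a minimal pure-Python sketch of the same steps on a tiny made-up 2-D dataset (the data points and the function name `pca_2d` are illustrative, not from the post); for a 2x2 covariance matrix the eigenvalues have a closed form, so no linear-algebra library is needed.

```python
# Minimal 2-D PCA sketch: center the data, form the covariance matrix,
# eigendecompose it (closed form for 2x2), and project onto the top component.
import math

def pca_2d(points):
    n = len(points)
    # 1) center each column (subtract the column mean)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # 2) covariance matrix entries (normalized by n, as in pca_data.m below)
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # 3) eigenvalues of [[sxx, sxy], [sxy, syy]] via the quadratic formula
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + disc, tr / 2 - disc   # lam1 >= lam2
    # 4) a unit eigenvector for the leading eigenvalue
    if abs(sxy) > 1e-12:
        vx, vy = lam1 - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    v = (vx / norm, vy / norm)
    # 5) scores: projection of the centered data onto the leading component
    scores = [x * v[0] + y * v[1] for x, y in centered]
    return lam1, lam2, scores

lam1, lam2, scores = pca_2d([(1, 1), (2, 2.1), (3, 2.9), (4, 4.2)])
print(lam1 / (lam1 + lam2))  # fraction of variance captured by PC1
```

Since the sample points lie nearly on a line, the leading eigenvalue dominates; the printed ratio plays the role of explained(1)/100 in the MATLAB code.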

    3. Results

    iris_pca: the first two principal components

    -2.25698063306803	0.504015404227653
    -2.07945911889541	-0.653216393612590
    -2.36004408158421	-0.317413944570283
    -2.29650366000389	-0.573446612971233
    -2.38080158645275	0.672514410791076
    -2.06362347633724	1.51347826673567
    -2.43754533573242	0.0743137171331950
    -2.22638326740708	0.246787171742162
    -2.33413809644009	-1.09148977019584
    -2.18136796941948	-0.447131117450110
    -2.15626287481026	1.06702095645556
    -2.31960685513084	0.158057945820095
    -2.21665671559727	-0.706750478104682
    -2.63090249246321	-0.935149145374822
    -2.18497164997156	1.88366804891533
    -2.24394778052703	2.71328133141014
    -2.19539570001472	1.50869601039751
    -2.18286635818774	0.512587093716441
    -1.88775015418968	1.42633236069007
    -2.33213619695782	1.15416686250116
    -1.90816386828207	0.429027879924458
    -2.19728429051438	0.949277150423224
    -2.76490709741649	0.487882574439700
    -1.81433337754274	0.106394361814184
    -2.22077768737273	0.161644638073716
    -1.95048968523510	-0.605862870440206
    -2.04521166172712	0.265126114804279
    -2.16095425532709	0.550173363315497
    -2.13315967968331	0.335516397664229
    -2.26121491382610	-0.313827252316662
    -2.13739396044139	-0.482326258880086
    -1.82582143036022	0.443780130732953
    -2.59949431958629	1.82237008322707
    -2.42981076672382	2.17809479520796
    -2.18136796941948	-0.447131117450110
    -2.20373717203888	-0.183722323644913
    -2.03759040170113	0.682669420156327
    -2.18136796941948	-0.447131117450110
    -2.42781878392261	-0.879223932713649
    -2.16329994558551	0.291749566745466
    -2.27889273592867	0.466429134628597
    -1.86545776627869	-2.31991965918865
    -2.54929404704891	-0.452301129580194
    -1.95772074352968	0.495730895348582
    -2.12624969840005	1.16752080832811
    -2.06842816583668	-0.689607099127106
    -2.37330741591874	1.14679073709691
    -2.39018434748641	-0.361180775489047
    -2.21934619663183	1.02205856145225
    -2.19858869176329	0.0321302060908945
    1.10030752013391	0.860230593245533
    0.730035752246062	0.596636784545418
    1.23796221659453	0.612769614333371
    0.395980710562889	-1.75229858398514
    1.06901265623960	-0.211050862633647
    0.383174475987114	-0.589088965722193
    0.746215185580377	0.776098608766709
    -0.496201068006129	-1.84269556949638
    0.923129796737431	0.0302295549588077
    0.00495143780650871	-1.02596403732389
    -0.124281108093219	-2.64918765259090
    0.437265238506424	-0.0586846858581760
    0.549792126592992	-1.76666307900171
    0.714770518429262	-0.184815166484382
    -0.0371339806719297	-0.431350035919633
    0.872966018474250	0.508295314415273
    0.346844440799832	-0.189985178614466
    0.152880381053472	-0.788085297090142
    1.21124542423444	-1.62790202112846
    0.156417163578196	-1.29875232891050
    0.735791135537219	0.401126570248885
    0.470792483676532	-0.415217206131680
    1.22388807504403	-0.937773165086814
    0.627279600231826	-0.415419947028686
    0.698133985336190	-0.0632819273014206
    0.870620328215835	0.249871517845242
    1.25003445866275	-0.0823442389434431
    1.35370481019450	0.327722365822153
    0.659915359649250	-0.223597000167979
    -0.0471236447211597	-1.05368247816741
    0.121128417400412	-1.55837168956507
    0.0140710866007487	-1.56813894313840
    0.235222818975321	-0.773333046281646
    1.05316323317206	-0.634774729305402
    0.220677797156699	-0.279909968621073
    0.430341476713787	0.852281697154445
    1.04590946111265	0.520453696157683
    1.03241950881290	-1.38781716762055
    0.0668436673617666	-0.211910813930204
    0.274505447436587	-1.32537578085168
    0.271425764670620	-1.11570381243558
    0.621089830946741	0.0274506709978046
    0.328903506457842	-0.985598883763833
    -0.372380114621411	-2.01119457605980
    0.281999617970590	-0.851099454545845
    0.0887557702224096	-0.174324544331148
    0.223607676665854	-0.379214256409087
    0.571967341693057	-0.153206717308028
    -0.455486948803962	-1.53432438068788
    0.251402252309636	-0.593871222060355
    1.84150338645482	0.868786147264828
    1.14933941416981	-0.698984450845645
    2.19898270027627	0.552618780551384
    1.43388176486790	-0.0498435417617587
    1.86165398830779	0.290220535935809
    2.74500070081969	0.785799704159685
    0.357177895625210	-1.55488557249365
    2.29531637451915	0.408149356863061
    1.99505169024551	-0.721448439846371
    2.25998344407884	1.91502747107928
    1.36134878398531	0.691631011499905
    1.59372545693795	-0.426818952656741
    1.87796051113409	0.412949339203311
    1.24890257443547	-1.16349352357816
    1.45917315700813	-0.442664601834978
    1.58649439864337	0.674774813132046
    1.46636772102851	0.252347085727036
    2.42924030093571	2.54822056527013
    3.29809226641255	-0.00235343587272177
    1.24979406018816	-1.71184899071237
    2.03368323142868	0.904369044486726
    0.970663302005081	-0.569267277965818
    2.88838806680663	0.396463170625287
    1.32475563655861	-0.485135293486995
    1.69855040646181	1.01076227706927
    1.95119099025002	0.999984474306318
    1.16799162725452	-0.317831851008113
    1.01637609822602	0.0653241212065782
    1.78004554289349	-0.192627479858818
    1.85855159177699	0.553527164026207
    2.42736549094542	0.245830911619345
    2.30834922706014	2.61741528404554
    1.85415981777379	-0.184055790370030
    1.10756129219332	-0.294997832217552
    1.19347091639304	-0.814439294423699
    2.79159729280499	0.841927657717863
    1.57487925633390	1.06889360300461
    1.34254676764379	0.420846092290459
    0.920349720485088	0.0191661621187343
    1.84736314547313	0.670177571688802
    2.00942543830962	0.608358978317639
    1.89676252747561	0.683734258412757
    1.14933941416981	-0.698984450845645
    2.03648602144585	0.861797777652503
    1.99500750598298	1.04504903502442
    1.86427657131500	0.381543630923962
    1.55328823048458	-0.902290843047121
    1.51576710303099	0.265903772450991
    1.37179554779330	1.01296839034343
    0.956095566421630	-0.0222095406309480
    

    Cumulative variance explained (Pareto chart)

    As the chart shows, the first two principal components already account for about 95% of the total variance, so these two components can approximately represent the whole dataset.
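The cumulative percentages behind this claim follow directly from latent (latent1 in the code above). A small sketch, using approximate eigenvalues of the standardized iris data (2.918, 0.914, 0.147, 0.021 — illustrative round figures, not values computed in this post):

```python
# Cumulative "variance explained" from a latent vector (eigenvalues sorted
# in descending order), mirroring latent1 = 100*latent/sum(latent) in MATLAB.
latent = [2.918, 0.914, 0.147, 0.021]  # approximate standardized-iris eigenvalues

total = sum(latent)
explained = [100.0 * v / total for v in latent]  # per-component percentage
cumulative = []
running = 0.0
for pct in explained:
    running += pct
    cumulative.append(running)

print(cumulative)  # the second entry is the two-component cumulative share
```

With these values the second cumulative entry lands just under 96%, consistent with the ~95% contribution read off the Pareto chart.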

    4. pca_data.m

    The function normlization.m used below is given in the earlier post MATLAB实例:聚类初始化方法与数据归一化方法 (clustering initialization and data normalization methods).

    function data=pca_data(data, choose)
    % PCA dimensionality reduction, retaining 90% of the variance
    data = normlization(data, choose); % normalize
    score = 0.90; % fraction of variance to retain
    [num,dim] = size(data);
    xbar = mean(data,1);
    means = bsxfun(@minus, data, xbar); % center the data
    C = means'*means/num; % covariance matrix (renamed to avoid shadowing the built-in cov)
    [V,D] = eig(C);
    eigval = diag(D);
    [~,idx] = sort(eigval,'descend');
    eigval = eigval(idx);
    V = V(:,idx); % eigenvectors are the COLUMNS of V, so reorder columns, not rows
    p = 0;
    for i=1:dim
       perc = sum(eigval(1:i))/sum(eigval);
       if perc > score
           p = i;
           break;
       end
    end
    E = V(:,1:p); % first p principal component vectors
    data = means*E; % project the centered data onto them
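The for-loop that picks p — the smallest number of components whose cumulative eigenvalue share exceeds score — can be isolated and checked on its own. A pure-Python sketch of just that selection step (the function name choose_p is my own, not from the post):

```python
# Select the smallest p such that the top-p eigenvalues cover more than
# `score` of the total variance -- mirrors the for-loop in pca_data.m.
def choose_p(eigval, score=0.90):
    """eigval: eigenvalues sorted in descending order."""
    total = sum(eigval)
    running = 0.0
    for i, v in enumerate(eigval, start=1):
        running += v
        if running / total > score:
            return i
    return len(eigval)  # fallback: keep every component

print(choose_p([2.918, 0.914, 0.147, 0.021]))  # prints 2: two components pass 90%
```

With the approximate standardized-iris eigenvalues shown, one component covers about 73% and two cover about 96%, so p = 2 is returned — matching the "keep the first two principal components" outcome above.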

    References:

    Junhao Hua. Distributed Variational Bayesian Algorithms. Github, 2017.

    MATLAB实例:PCA(主成成分分析)详解

  • Original post: https://www.cnblogs.com/kailugaji/p/11594507.html