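From a probabilistic point of view, discriminant analysis uses sample data whose classes are already known to assign classes to new, unclassified data.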
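As in the table below, suppose we have t sample observations, each with a value for each of n quantitative features, and the class of every sample observation is known. From this information we want to determine the class (class) of each of s unclassified observations.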
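One standard way to make the probabilistic view precise, stated here under the usual textbook assumptions rather than taken from the original text: each class i has a multivariate normal feature distribution f_i with mean μ_i, all classes share one covariance matrix Σ, and class i has prior probability p_i. The Bayes rule assigns x to the class with the largest p_i f_i(x). Expanding the quadratic form and dropping the terms that are the same for every class gives the linear discriminant function d_i(x), so maximizing d_i(x) gives the same decision (with equal priors the ln p_i term also drops out). In practice μ_i and Σ are replaced by the group means and the pooled within-group covariance of the training data.

```latex
\begin{aligned}
\hat{k}(\mathbf{x}) &= \arg\max_{i}\; p_i\, f_i(\mathbf{x}),\\
f_i(\mathbf{x}) &= \frac{1}{(2\pi)^{n/2}\,\lvert\Sigma\rvert^{1/2}}
  \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\Big),\\
\ln\!\big(p_i f_i(\mathbf{x})\big) &= \underbrace{\mathbf{x}^{\top}\Sigma^{-1}\boldsymbol{\mu}_i
  - \tfrac{1}{2}\boldsymbol{\mu}_i^{\top}\Sigma^{-1}\boldsymbol{\mu}_i + \ln p_i}_{d_i(\mathbf{x})}
  \;\underbrace{-\,\tfrac{1}{2}\mathbf{x}^{\top}\Sigma^{-1}\mathbf{x} - \tfrac{n}{2}\ln(2\pi) - \tfrac{1}{2}\ln\lvert\Sigma\rvert}_{\text{same for every class}}
\end{aligned}
```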
MATLAB's Statistics Toolbox provides the discriminant analysis function classify:
[class,err] = classify(sample,training,group, type)
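Here sample contains the unclassified observations (one per row), training contains the sample data whose classes are known, and group gives the class of each row of training; sample and training must have the same number of columns. The output class holds the predicted class of each row of sample, err is an estimate of the misclassification rate, and type selects the discriminant method. The default is 'linear' (linear discriminant analysis); type can also be 'quadratic' or 'mahalanobis' (Mahalanobis distance), as well as 'diaglinear' and 'diagquadratic'.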
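The MATLAB documentation describes 'linear' as fitting a multivariate normal density to each group with a pooled estimate of covariance. The following is a minimal from-scratch sketch of that rule, not the toolbox's internal code. It assumes equal prior probabilities for the groups (classify's default), uses a small made-up two-feature data set instead of real data, and the names mu, Sigma, scores and predicted are chosen only for illustration. It avoids toolbox functions so it should run in base MATLAB or Octave.

```matlab
% Toy training data (made up): two well-separated groups, 2 features each
training = [1 2; 2 1; 2 3; 3 2;      % group 0
            6 5; 7 6; 7 4; 8 5];     % group 1
group    = [0; 0; 0; 0; 1; 1; 1; 1];
sample   = [2 2; 7 5; 4 3];          % rows to classify

labels = unique(group);              % distinct group labels (sorted column)
k = numel(labels);
[n, d] = size(training);

mu = zeros(k, d);                    % group means, one row per group
S  = zeros(d, d);                    % accumulated within-group scatter
for i = 1:k
    Xi = training(group == labels(i), :);
    mu(i, :) = mean(Xi, 1);
    Xc = Xi - repmat(mu(i, :), size(Xi, 1), 1);   % center the group
    S = S + Xc' * Xc;
end
Sigma = S / (n - k);                 % pooled covariance estimate

% Linear discriminant score d_i(x) = x*inv(Sigma)*mu_i' - 0.5*mu_i*inv(Sigma)*mu_i'
% (the ln(prior) term is omitted because equal priors make it identical for all groups)
W = Sigma \ mu';                     % d-by-k; column i is inv(Sigma)*mu_i'
scores = sample * W - 0.5 * repmat(sum(mu' .* W, 1), size(sample, 1), 1);

[~, idx] = max(scores, [], 2);       % group with the largest score per row
predicted = labels(idx)
```

For 'quadratic', each group would use its own covariance matrix Σ_i instead of the pooled Sigma, so the per-group terms -0.5*log(det(Σ_i)) and -0.5*(x-μ_i)*inv(Σ_i)*(x-μ_i)' no longer cancel and the decision boundary becomes quadratic.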
Example: classify three tumors as benign or malignant from five measured features.
Code:
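% Training data: 8 tumors with known class, one row per tumor, 5 features per row
training = [13.54, 14.36,  87.46,  566.3, 0.09779;
            13.08, 15.71,  85.63,  520,   0.1075;
            9.504, 12.44,  60.34,  273.9, 0.1024;
            17.99, 10.38, 122.8,  1001,   0.1184;
            20.57, 17.77, 132.9,  1326,   0.08474;
            19.69, 21.25, 130,    1203,   0.1096;
            11.42, 20.38,  77.58,  386.1, 0.1425;
            20.29, 14.34, 135.1,  1297,   0.1003];
% Unclassified data: 3 tumors with the same 5 features
sample = [16.6, 28.08, 108.3,  858.1, 0.08455;
          20.6, 29.33, 140.1, 1265,   0.1178;
          7.76, 24.54,  47.92, 181,   0.05263];
% Class of each training row: 0 = benign tumor, 1 = malignant tumor
group = [zeros(3,1); ones(5,1)];
% Discriminant analysis with the default type 'linear'
[class,err] = classify(sample,training,group)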
Output:
class =
     0
     0
     0
err =
     0
So all three unclassified samples are assigned to class 0, i.e. benign tumors.
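As described in the MATLAB documentation for classify, err is the apparent error rate: the share of training rows that are misclassified when the fitted rule is applied back to training itself, weighted by the prior probabilities of the groups. The err = 0 in the output therefore means all 8 training tumors were reclassified correctly; because the rule is evaluated on the same 8 rows it was fitted to, this estimate tends to be optimistic. The sketch below shows only the prior weighting, assuming equal priors (classify's default) and using a made-up, hypothetical vector predicted of reclassified training labels.

```matlab
% True classes of the 8 training rows (as in the example above)
group     = [0; 0; 0; 1; 1; 1; 1; 1];
% Hypothetical labels a classifier might assign to those same rows
predicted = [0; 0; 1; 1; 1; 1; 1; 0];

labels = unique(group);
k = numel(labels);
prior = ones(k, 1) / k;              % equal priors (assumed default)
rate  = zeros(k, 1);
for i = 1:k
    inGroup = (group == labels(i));
    rate(i) = mean(predicted(inGroup) ~= labels(i));   % per-group error rate
end
err = sum(prior .* rate)             % prior-weighted apparent error rate
```

Here group 0 has 1 error in 3 rows and group 1 has 1 error in 5 rows, so the weighted rate is 0.5*(1/3) + 0.5*(1/5) = 4/15, slightly higher than the unweighted 2/8.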