一、Fisher算法
二、蠓的分类问题:
两种蠓Af和Apf已由生物学家根据它们的触角和翼长加以区分(Af是能传播花粉的益虫,Apf是会传播疾病的害虫),两个矩阵中分别给出了6只Apf 和9只Af蠓的触角长(对应于矩阵的第1列)和翼长(对应于矩阵的第2列)的数据(See next slide)。根据触角长和翼长这两个特征来识别一个样本是Af还是Apf是重要的。
(1)试给出该问题的Fisher分类器;
(2)有三个待识别的模式样本,它们分别是(1.24,1.80),(1.28,1.84),( 1.40,2.04),试问这三个样本属于哪一种蠓。
数据集:
APF = np.array([
[1.14,1.78],[1.18,1.96],
[1.20,1.86],[1.26,2.00],
[1.30,2.00],[1.28,1.96]
])
AF = np.array([
[1.24,1.72],[1.36,1.74],
[1.38,1.64],[1.38,1.82],
[1.38,1.90],[1.40,1.70],
[1.48,1.82],[1.54,2.08],
[1.56,1.78]
])
三、代码实现:
Fisher算法关键在于求出权向量W_和阈值W*,然后求出待测数据的映射y_test,最后与W*阈值作比较。
import numpy as np
APF = np.array([
[1.14,1.78],[1.18,1.96],
[1.20,1.86],[1.26,2.00],
[1.30,2.00],[1.28,1.96]
])
AF = np.array([
[1.24,1.72],[1.36,1.74],
[1.38,1.64],[1.38,1.82],
[1.38,1.90],[1.40,1.70],
[1.48,1.82],[1.54,2.08],
[1.56,1.78]
])
#获取样本均值
def getAvg(x):
return np.mean(x, axis=0)
#求两类样本类内离散度矩阵Si
def getSi(x, x_mean):
x_mean = x_mean.reshape(x.shape[1],1)
Si = np.zeros((x.shape[1],x.shape[1]))
for xi in x:
temp_xi = xi.copy().reshape(x.shape[1],1)
temp = (temp_xi-x_mean)
Si = Si + np.dot(temp, temp.T)
return Si
# 求权向量W_
def getW(x1_mean,x2_mean,Sw):
return np.dot(np.linalg.inv(Sw),(x1_mean-x2_mean))
# 获取分类阈值w0和权向量W_
def get_w0(x1, x2):
x1_mean = getAvg(x1)
x2_mean = getAvg(x2)
S1 = getSi(APF, x1_mean)
S2 = getSi(AF, x2_mean)
Sw = S1+S2
W_ = getW(x1_mean,x2_mean,Sw)
#获取投影点
y1 = np.dot(x1, W_)
y2 = np.dot(x2, W_)
#求各类样本均值yi_mean
y1_mean = np.mean(y1)
y2_mean = np.mean(y2)
#选取分类阈值w0
w0 = (y1_mean + y2_mean) / 2
return w0, W_
def Fisher(x1, x1_label, x2, x2_label, x_test):
w0, W_ = get_w0(x1,x2)
y_test = np.dot(x_test, W_)
if y_test > w0:
print('测试样本属于', x1_label)
elif y_test <w0:
print('测试样本属于',x2_label)
else:
print('测试样本可能属于%s,也可嫩属于%s'%x1_label%x2_label)
x_tests = np.array([
[1.24,1.80],[1.28,1.84],[1.40,2.04]
])
i = 1
for x_test in x_tests:
print('第%d个'%i,end='')
i += 1
Fisher(APF,'蠓APF',AF,'蠓AF',x_test)
预测结果如下:
第1个测试样本属于 蠓APF
第2个测试样本属于 蠓APF
第3个测试样本属于 蠓APF