For the original English version, see http://www.deeplearning.net/tutorial/rbm.html
RBM Implementation
We construct an RBM class whose parameters (chiefly W, hbias, vbias, and theano_rng) can either be initialized internally or passed in through the constructor. This design makes it possible to use the RBM as a building block of a deep network, in which case the weight matrix W and the bias hbias can be shared with the parameters of the corresponding sigmoidal layer of an MLP. The code is as follows:
```python
class RBM(object):
    def __init__(self, input=None, n_visible=784, n_hidden=500,
                 W=None, hbias=None, vbias=None,
                 numpy_rng=None, theano_rng=None):
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        if numpy_rng is None:
            numpy_rng = numpy.random.RandomState(1234)

        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        if W is None:
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_visible + n_hidden)),
                    high=4 * numpy.sqrt(6. / (n_visible + n_hidden)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if hbias is None:
            hbias = theano.shared(
                value=numpy.zeros(n_hidden, dtype=theano.config.floatX),
                name='hbias',
                borrow=True
            )

        if vbias is None:
            vbias = theano.shared(
                value=numpy.zeros(n_visible, dtype=theano.config.floatX),
                name='vbias',
                borrow=True
            )

        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input
        if not input:
            self.input = T.matrix('input')

        self.W = W
        self.hbias = hbias
        self.vbias = vbias
        self.theano_rng = theano_rng
        self.params = [self.W, self.hbias, self.vbias]
```
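For completeness, the class above assumes the imports below; as a minimal usage sketch (the names x and rbm are illustrative, not from the tutorial), a standalone RBM on MNIST-sized input can then be built like this:

```python
import numpy
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

x = T.matrix('x')  # a minibatch of flattened images, one row per example
rbm = RBM(input=x, n_visible=28 * 28, n_hidden=500)
```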
Next, we define functions that construct the symbolic graph corresponding to Equations (7) and (8), reproduced below.
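For reference, Equations (7) and (8) of the original tutorial are the factorized conditionals of a binary RBM, where $c$ is the hidden bias (hbias), $b$ the visible bias (vbias), and $W_i$ (respectively $W'_j$) the $i$-th row (respectively $j$-th column) of the weight matrix:

$$P(h_i = 1 \mid v) = \mathrm{sigm}(c_i + W_i v) \tag{7}$$

$$P(v_j = 1 \mid h) = \mathrm{sigm}(b_j + W'_j h) \tag{8}$$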
The code is as follows:
```python
def propup(self, vis):
    '''Propagate the visible units' activations up to the hidden units.

    Note that we also return the pre-sigmoid activation; this symbolic
    variable is needed when a more numerically stable computational
    graph is required (see the discussion below).
    '''
    pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
    return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

def propdown(self, hid):
    '''Propagate the hidden units' activations down to the visible units.'''
    pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
    return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

def sample_h_given_v(self, v0_sample):
    '''Infer the state of the hidden units given the visible units.'''
    # compute the activation of the hidden units given a sample of
    # the visible units
    pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
    # get a sample of the hidden units given their activation
    h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                         n=1, p=h1_mean,
                                         dtype=theano.config.floatX)
    return [pre_sigmoid_h1, h1_mean, h1_sample]

def sample_v_given_h(self, h0_sample):
    '''Infer the state of the visible units given the hidden units.'''
    pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
    v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                         n=1, p=v1_mean,
                                         dtype=theano.config.floatX)
    return [pre_sigmoid_v1, v1_mean, v1_sample]
```
We can then use these functions to define the symbolic graph for a Gibbs sampling step. We define two functions:
- gibbs_vhv performs a step of Gibbs sampling starting from the visible units. As we shall see, this is useful for sampling from the RBM.
- gibbs_hvh performs a step of Gibbs sampling starting from the hidden units. This function is useful for performing CD and PCD updates.
The code is as follows:
```python
def gibbs_hvh(self, h0_sample):
    '''One step of Gibbs sampling, starting from the hidden state.'''
    pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
    pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
    return [pre_sigmoid_v1, v1_mean, v1_sample,
            pre_sigmoid_h1, h1_mean, h1_sample]

def gibbs_vhv(self, v0_sample):
    '''One step of Gibbs sampling, starting from the visible state.'''
    pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
    pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
    return [pre_sigmoid_h1, h1_mean, h1_sample,
            pre_sigmoid_v1, v1_mean, v1_sample]
```
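As a small usage sketch (reusing the rbm instance from above), gibbs_vhv can be compiled directly into a function that performs one Gibbs step on a batch of visible configurations; the updates to theano_rng's random state are attached to the graph automatically:

```python
v = T.matrix('v')
# gibbs_vhv returns [pre_sigmoid_h1, h1_mean, h1_sample,
#                    pre_sigmoid_v1, v1_mean, v1_sample]
chain_outputs = rbm.gibbs_vhv(v)
one_gibbs_step = theano.function([v], chain_outputs[4:])  # v1_mean, v1_sample
```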
Note that these functions also return the pre-sigmoid activations. To understand why, we need to know a bit about how Theano works. Whenever you compile a Theano function, the computational graph given as input is optimized for speed and stability; this is done by replacing parts of the graph with other subgraphs. One such optimization rewrites terms of the form log(sigmoid(x)) in terms of softplus. This optimization is needed for the cross-entropy cost, because the sigmoid of numbers larger than about 30 saturates to exactly 1 (and to 0 for numbers smaller than about -30), which forces Theano to compute log(0) and yields -inf or NaN as the cost. Unfortunately, the rewrite cannot be applied when the sigmoid is hidden inside a scan op. Therefore, the easiest and most efficient way is to also return the pre-sigmoid activation as an output of scan, and to apply both the log and the sigmoid outside scan, such that Theano can catch and optimize the expression.
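A quick numerical illustration of the instability (plain NumPy, not part of the tutorial): in float32 the sigmoid saturates to exactly 1.0 well before its argument reaches 40, so the naive cross-entropy term log(1 - sigmoid(x)) evaluates log(0), while the algebraically equivalent softplus form stays finite:

```python
import numpy as np

x = np.float32(40.0)
sig = np.float32(1.0) / (np.float32(1.0) + np.exp(-x))  # rounds to exactly 1.0
print(np.log(np.float32(1.0) - sig))      # -inf: log(0) from the saturated sigmoid
# algebraically, log(1 - sigmoid(x)) == -log(1 + exp(x)) == -softplus(x)
print(-np.logaddexp(np.float32(0.0), x))  # -40.0: the rewritten form is finite
```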
The class also has a function that computes the free energy of the model, which is needed when computing the gradients of the parameters. We then add a get_cost_updates method that generates the symbolic gradients for a CD-k or PCD-k update.
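For a binary RBM the free energy has the closed form below (with visible bias $b$ and hidden bias $c$, matching vbias and hbias in the code):

$$\mathcal{F}(v) = -b^{\top}v - \sum_i \log\left(1 + e^{c_i + W_i v}\right)$$

The code is as follows: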
```python
def free_energy(self, v_sample):
    '''Compute the free energy F(v).'''
    wx_b = T.dot(v_sample, self.W) + self.hbias
    vbias_term = T.dot(v_sample, self.vbias)
    hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)
    return -hidden_term - vbias_term

def get_cost_updates(self, lr=0.1, persistent=None, k=1):
    '''Implements one step of CD-k or PCD-k.

    :param lr: learning rate used to train the RBM
    :param persistent: None for CD. For PCD, a shared variable
        containing the old state of the Gibbs chain, of shape
        (batch size, number of hidden units).
    :param k: number of Gibbs steps to perform

    Returns a proxy for the cost and an updates dictionary. The
    dictionary contains the update rules for the weights and biases,
    and also, in the case of PCD, an update for the shared variable
    that stores the persistent chain.
    '''
    # compute the positive phase
    pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input)

    # decide how to initialize the chain:
    # for CD, use the newly generated hidden sample;
    # for PCD, initialize from the old state of the chain
    if persistent is None:
        chain_start = ph_sample
    else:
        chain_start = persistent

    # perform the negative phase:
    # to implement CD-k/PCD-k we need to scan over the function that
    # implements one Gibbs step k times; scan returns the entire
    # Gibbs chain
    (
        [
            pre_sigmoid_nvs,
            nv_means,
            nv_samples,
            pre_sigmoid_nhs,
            nh_means,
            nh_samples
        ],
        updates
    ) = theano.scan(
        self.gibbs_hvh,
        # the None entries are placeholders, indicating that
        # chain_start is the initial state of the 6th output
        outputs_info=[None, None, None, None, None, chain_start],
        n_steps=k,
        name="gibbs_hvh"
    )

    # we only need the visible sample at the end of the chain
    chain_end = nv_samples[-1]

    cost = T.mean(self.free_energy(self.input)) - T.mean(
        self.free_energy(chain_end))
    # if we used T.grad directly, it would try to backpropagate
    # through the Gibbs chain, which is not what we want; we
    # therefore mark chain_end as a constant via consider_constant
    gparams = T.grad(cost, self.params, consider_constant=[chain_end])

    # construct the update dictionary
    for gparam, param in zip(gparams, self.params):
        # make sure the learning rate has the right dtype
        updates[param] = param - gparam * T.cast(
            lr, dtype=theano.config.floatX)

    if persistent:
        # note that this works only if persistent is a shared variable
        updates[persistent] = nh_samples[-1]
        # pseudo-likelihood is a better proxy for PCD
        monitoring_cost = self.get_pseudo_likelihood_cost(updates)
    else:
        # reconstruction cross-entropy is a better proxy for CD
        monitoring_cost = self.get_reconstruction_cost(
            updates, pre_sigmoid_nvs[-1])

    return monitoring_cost, updates
```
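As a usage sketch of get_cost_updates (train_set_x and batch_size are assumed to exist, following the tutorial's training script; x and rbm are as above), a PCD-15 training step can be compiled like this:

```python
index = T.lscalar('index')  # index of a minibatch

# shared variable holding the persistent state of the Gibbs chain
persistent_chain = theano.shared(
    numpy.zeros((batch_size, rbm.n_hidden), dtype=theano.config.floatX),
    borrow=True
)

cost, updates = rbm.get_cost_updates(lr=0.1, persistent=persistent_chain, k=15)

train_rbm = theano.function(
    [index],
    cost,
    updates=updates,
    givens={x: train_set_x[index * batch_size: (index + 1) * batch_size]},
    name='train_rbm'
)
```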
Tracking Progress
RBMs are tricky to train. Because of the partition function Z, we cannot estimate the log-likelihood during training, so we have no direct metric for choosing the optimal hyperparameters.
Several options are available:
Inspection of Negative Samples
The negative samples obtained during training can be visualized. As training progresses, we know that the model defined by the RBM gets closer to the true underlying distribution, so the negative samples should look more and more like samples from the training set; hyperparameter settings that are obviously bad can be discarded this way, as the sketch below shows.
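One way to obtain displayable negative samples under PCD (persistent_chain, which holds hidden states, is the shared variable compiled above) is to project the current chain state back to the visible layer and plot the mean activations:

```python
h = T.matrix('h')
_, v_mean, _ = rbm.sample_v_given_h(h)
hidden_to_visible = theano.function([h], v_mean)

negative_samples = hidden_to_visible(persistent_chain.get_value())
# each row of negative_samples can be reshaped to the input image
# shape (e.g. 28x28 for MNIST) and displayed as a gray-scale image
```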
Visual Inspection of Filters
The filters learned by the model can also be visualized. Each filter is one column of the weight matrix W; reshaping it to the shape of the input images and plotting it as a gray-scale image shows what input pattern each hidden unit responds to.
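A minimal sketch of such a visualization (assuming matplotlib and 28x28 MNIST inputs; the tutorial itself uses its tile_raster_images helper from utils.py instead), plotting the first 100 filters as gray-scale images:

```python
import matplotlib.pyplot as plt

W = rbm.W.get_value(borrow=True)  # shape (n_visible, n_hidden)
fig, axes = plt.subplots(10, 10, figsize=(10, 10))
for i, ax in enumerate(axes.ravel()):
    ax.imshow(W[:, i].reshape(28, 28), cmap='gray')  # one filter per hidden unit
    ax.set_axis_off()
fig.savefig('filters.png')
```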