  • Deep Learning Tutorial (translation): RBM (Part 2)

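    For the English original, see http://www.deeplearning.net/tutorial/rbm.html

    Implementing the RBM

    We define an RBM class whose parameters (mainly W, hbias, vbias and theano_rng) can either be initialized by the constructor or passed in as arguments. This is useful when the RBM serves as a building block of a deep network, in which case the weight matrix W and the hidden bias are shared with the corresponding sigmoidal layer of an MLP. The code is as follows: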

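    import numpy
    import theano
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    class RBM(object):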
        def __init__(self,
                     input=None,
                     n_visible=784,
                     n_hidden=500,
                     W=None,
                     hbias=None,
                     vbias=None,
                     numpy_rng=None,
                     theano_rng=None
                     ):
            self.n_visible = n_visible
            self.n_hidden = n_hidden
            if numpy_rng is None:
                numpy_rng = numpy.random.RandomState(1234)
            if theano_rng is None:
                theano_rng = RandomStreams(numpy_rng.randint(2**30))
            if W is None:
                initial_W = numpy.asarray(
                    numpy_rng.uniform(
                        low=-4 * numpy.sqrt(6. / (n_visible + n_hidden)),
                        high=4 * numpy.sqrt(6. /(n_visible + n_hidden)),
                        size=(n_visible, n_hidden)
                    ),
                    dtype= theano.config.floatX
                )
                W = theano.shared(value=initial_W, name='W',borrow=True)
            if hbias is None:
                hbias = theano.shared(
                    value=numpy.zeros(
                        n_hidden,
                        dtype=theano.config.floatX
                    ),
                    name='hbias',
                    borrow=True
                )
            if vbias is None:
                vbias = theano.shared(
                    value=numpy.zeros(
                        n_visible,
                        dtype=theano.config.floatX
                    ),
                    name='vbias',
                    borrow=True
                )
            # initialize input layer for standalone RBM or layer0 of DBN
            self.input = input
            # test against None explicitly rather than relying on the
            # truth value of a symbolic variable
            if input is None:
                self.input = T.matrix('input')
    
            self.W = W
            self.hbias = hbias
            self.vbias = vbias
            self.theano_rng = theano_rng
    
            self.params = [self.W, self.hbias, self.vbias]
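
    As a rough check on the initialization above, the following NumPy-only sketch reproduces the default parameter shapes and the uniform range used for W. It is an illustration rather than part of the tutorial code: the helper name init_rbm_params is hypothetical, and plain float64 arrays stand in for Theano shared variables.

```python
import numpy as np

def init_rbm_params(n_visible=784, n_hidden=500, seed=1234):
    """Hypothetical helper mirroring the constructor's default initialization."""
    rng = np.random.RandomState(seed)
    # W is drawn uniformly from [-bound, bound]
    bound = 4 * np.sqrt(6.0 / (n_visible + n_hidden))
    W = rng.uniform(low=-bound, high=bound, size=(n_visible, n_hidden))
    # both bias vectors start at zero
    hbias = np.zeros(n_hidden)
    vbias = np.zeros(n_visible)
    return W, hbias, vbias

W, hbias, vbias = init_rbm_params()
bound = 4 * np.sqrt(6.0 / (784 + 500))
print(W.shape, hbias.shape, vbias.shape)
print(bool(np.abs(W).max() <= bound))
```

    Note that W has one row per visible unit and one column per hidden unit, which is why the downward pass has to use its transpose.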

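    The next step is to define the functions that construct the symbolic graph corresponding to Eqs. (7) and (8).

    The code is as follows: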

        def propup(self, vis):
            '''
            Propagate the visible units' activation up to the hidden units.
            Note that the pre-sigmoid activation is returned as well; this
            symbolic variable is needed when a more numerically stable
            computational graph is required.
            '''
            pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
            return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

        def propdown(self, hid):
            # Propagate the hidden units' activation down to the visible units.
            # W has shape (n_visible, n_hidden), so it is transposed here.
            pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
            return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

        def sample_h_given_v(self, v0_sample):
            # Infer the state of the hidden units given the visible units.
            # First compute the hidden activation for the given visible sample.
            pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
            # Then draw a hidden sample from that activation.
            h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                                 n=1, p=h1_mean,
                                                 dtype=theano.config.floatX)
            return [pre_sigmoid_h1, h1_mean, h1_sample]

        def sample_v_given_h(self, h0_sample):
            # Infer the state of the visible units given the hidden units.
            pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
            v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                                 n=1, p=v1_mean,
                                                 dtype=theano.config.floatX)
            return [pre_sigmoid_v1, v1_mean, v1_sample]
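
    To make the data flow concrete, here is a minimal NumPy sketch of the same four operations on a toy RBM. It assumes binary units and tiny layer sizes chosen only for illustration; the free functions and sample_bernoulli are stand-ins for the Theano methods and theano_rng.binomial, not the tutorial's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def propup(vis, W, hbias):
    # visible -> hidden: pre-activation and its sigmoid
    pre = vis @ W + hbias
    return pre, sigmoid(pre)

def propdown(hid, W, vbias):
    # hidden -> visible: note the transpose of W
    pre = hid @ W.T + vbias
    return pre, sigmoid(pre)

def sample_bernoulli(mean, rng):
    # one draw per unit with P(unit = 1) = mean
    return (rng.uniform(size=mean.shape) < mean).astype(float)

rng = np.random.RandomState(0)
n_visible, n_hidden, batch = 6, 4, 3  # toy sizes, assumed for the example
W = rng.uniform(-0.5, 0.5, size=(n_visible, n_hidden))
hbias = np.zeros(n_hidden)
vbias = np.zeros(n_visible)

v0_sample = sample_bernoulli(np.full((batch, n_visible), 0.5), rng)
pre_sigmoid_h1, h1_mean = propup(v0_sample, W, hbias)
h1_sample = sample_bernoulli(h1_mean, rng)
pre_sigmoid_v1, v1_mean = propdown(h1_sample, W, vbias)
v1_sample = sample_bernoulli(v1_mean, rng)

print(h1_mean.shape, v1_mean.shape)
```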

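    These functions can now be used to define the symbolic graph for a Gibbs sampling step. We define two functions:

    • gibbs_vhv performs one step of Gibbs sampling starting from the visible units. As we will see, this is useful for sampling from the RBM.
    • gibbs_hvh performs one step of Gibbs sampling starting from the hidden units. It is useful for performing CD and PCD updates.

    The code is as follows: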

        def gibbs_hvh(self, h0_sample):
            pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
            pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
            return [pre_sigmoid_v1, v1_mean, v1_sample,
                    pre_sigmoid_h1,h1_mean, h1_sample]
    
        def gibbs_vhv(self, v0_sample):
            pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
            pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
            return [pre_sigmoid_h1, h1_mean, h1_sample,
                    pre_sigmoid_v1, v1_mean, v1_sample]
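
    The two Gibbs steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions: the helpers take the parameters explicitly instead of living on the RBM class, a Python loop replaces theano.scan, and run_chain is a hypothetical name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, hbias, rng):
    mean = sigmoid(v @ W + hbias)
    return mean, (rng.uniform(size=mean.shape) < mean).astype(float)

def sample_v_given_h(h, W, vbias, rng):
    mean = sigmoid(h @ W.T + vbias)
    return mean, (rng.uniform(size=mean.shape) < mean).astype(float)

def gibbs_vhv(v0, W, hbias, vbias, rng):
    # visible -> hidden -> visible
    h1_mean, h1 = sample_h_given_v(v0, W, hbias, rng)
    v1_mean, v1 = sample_v_given_h(h1, W, vbias, rng)
    return h1_mean, h1, v1_mean, v1

def gibbs_hvh(h0, W, hbias, vbias, rng):
    # hidden -> visible -> hidden
    v1_mean, v1 = sample_v_given_h(h0, W, vbias, rng)
    h1_mean, h1 = sample_h_given_v(v1, W, hbias, rng)
    return v1_mean, v1, h1_mean, h1

def run_chain(v0, k, W, hbias, vbias, rng):
    # apply gibbs_vhv k times (k >= 1) and return the final visible state
    v = v0
    for _ in range(k):
        _, _, v_mean, v = gibbs_vhv(v, W, hbias, vbias, rng)
    return v_mean, v

rng = np.random.RandomState(0)
n_visible, n_hidden, batch = 6, 4, 3  # toy sizes, assumed for the example
W = rng.uniform(-0.5, 0.5, size=(n_visible, n_hidden))
hbias = np.zeros(n_hidden)
vbias = np.zeros(n_visible)
v0 = (rng.uniform(size=(batch, n_visible)) < 0.5).astype(float)

v_mean, v_k = run_chain(v0, k=5, W=W, hbias=hbias, vbias=vbias, rng=rng)
_, _, _, h_next = gibbs_hvh(np.zeros((batch, n_hidden)), W, hbias, vbias, rng)
print(v_k.shape, h_next.shape)
```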
    

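    Note that we also return the pre-sigmoid activation. To see why, one needs to know a little about how Theano works. Whenever a Theano function is compiled, the computational graph passed as input is optimized for speed and stability by replacing some subgraphs with others. One such optimization rewrites terms of the form log(sigmoid(x)) in terms of softplus. The cross-entropy needs this rewrite: the sigmoid of a large positive number saturates to 1 and that of a large negative number to 0, so a naive log can yield -inf or NaN as the cost. Here, however, the sigmoid is applied inside the scan op while the log is applied outside, so Theano sees log(scan(...)) rather than log(sigmoid(...)) and does not apply the optimization. Therefore the easiest and most efficient way is to also return the pre-sigmoid activation as an output of scan, and to apply both the log and the sigmoid outside scan so that Theano can catch and optimize the expression.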
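
    The reason for keeping the pre-sigmoid activation can be seen numerically. The sketch below, in plain NumPy rather than Theano, compares a naive log(sigmoid(x)) with the equivalent softplus form -log(1 + exp(-x)); the function names are illustrative only.

```python
import numpy as np

def log_sigmoid_naive(x):
    # sigmoid first, then log: underflows to log(0) for very negative x
    with np.errstate(over="ignore", divide="ignore"):
        return np.log(1.0 / (1.0 + np.exp(-x)))

def log_sigmoid_stable(x):
    # log(sigmoid(x)) = -softplus(-x) = -log(1 + exp(-x)),
    # evaluated with logaddexp so that exp never overflows
    return -np.logaddexp(0.0, -x)

x = np.array([-800.0, -5.0, 0.0, 5.0])
naive = log_sigmoid_naive(x)
stable = log_sigmoid_stable(x)
print(naive)
print(stable)
```

    For x = -800 the naive version returns -inf while the stable version returns a finite value close to -800. The stable form can only be used when the pre-sigmoid value x is still available.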

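    The class also has a function that computes the free energy of the model, which is needed for computing the gradient of the parameters. We then add a get_cost_updates method, which generates the symbolic gradients for CD-k and PCD-k updates. The code is as follows: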

        def free_energy(self, v_sample):
            # free energy F(v) = -vbias.v - sum_j log(1 + exp((v W + hbias)_j))
            wx_b = T.dot(v_sample, self.W) + self.hbias
            vbias_term = T.dot(v_sample, self.vbias)
            hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)
            return -hidden_term - vbias_term
    
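        def get_cost_updates(self, lr=0.1, persistent=None, k=1):
            '''
            Implement one step of CD-k or PCD-k.
            :param lr: learning rate
            :param persistent: None for CD. For PCD, a shared variable
             containing the old state of the Gibbs chain, of size
             (batch size, number of hidden units)
            :param k: number of Gibbs steps
            :return: a proxy for the cost and the updates dictionary. The
             dictionary contains the update rules for the weights and
             biases, and also an update of the shared variable that stores
             the persistent chain, if PCD is used
            '''
            # compute the positive phase
            pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input)
            # decide how to initialize the chain:
            # for CD, use the newly generated hidden sample;
            # for PCD, initialize from the old state of the chain
            if persistent is None:
                chain_start = ph_sample
            else:
                chain_start = persistent

            # perform the negative phase
            # to implement CD-k/PCD-k we scan over the function that
            # implements one Gibbs step, k times
            # the scan returns the entire Gibbs chain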
            (
                [
                    pre_sigmoid_nvs,
                    nv_means,
                    nv_samples,
                    pre_sigmoid_nhs,
                    nh_means,
                    nh_samples
                ],
                updates
            ) = theano.scan(
                self.gibbs_hvh,
                # the None entries are placeholders: chain_start is the
                # initial state of the sixth output only
                outputs_info=[None, None, None, None, None, chain_start],
                n_steps=k,
                name="gibbs_hvh"
            )
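            # if T.grad were applied directly, it would go through the Gibbs
            # chain to get the gradients, which is not what we want because it
            # would corrupt them; we therefore declare chain_end to be a
            # constant via consider_constant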
            chain_end = nv_samples[-1]
            cost = T.mean(self.free_energy(self.input)) - T.mean(self.free_energy(chain_end))
            gparams = T.grad(cost, self.params, consider_constant=[chain_end])
    
            for gparam, param in zip(gparams, self.params):
                updates[param] = param - gparam * T.cast(lr, dtype=theano.config.floatX)
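            # get_pseudo_likelihood_cost and get_reconstruction_cost are
            # defined in the original tutorial and are not shown here
            if persistent is not None:
                # note: this only works if persistent is a shared variable
                updates[persistent] = nh_samples[-1]
                # pseudo-likelihood is a better proxy for PCD
                monitoring_cost = self.get_pseudo_likelihood_cost(updates)
            else:
                # reconstruction cross-entropy is a better proxy for CD
                monitoring_cost = self.get_reconstruction_cost(updates, pre_sigmoid_nvs[-1])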
    
            return monitoring_cost, updates
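
    For readers who want to see the arithmetic behind get_cost_updates without Theano, the following NumPy sketch computes the free energy, checks it against a brute-force sum over all hidden states of a tiny RBM, and performs one CD-k update using hand-derived gradients of the same cost. It is a simplified illustration: it covers plain CD only (no persistent chain), all function names are hypothetical, and the gradients are written out by hand instead of being obtained from T.grad.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(mean, rng):
    # binary draw per unit with P(unit = 1) = mean
    return (rng.uniform(size=mean.shape) < mean).astype(float)

def free_energy(v, W, hbias, vbias):
    # F(v) = -v.vbias - sum_j log(1 + exp((v W + hbias)_j))
    wx_b = v @ W + hbias
    return -np.logaddexp(0.0, wx_b).sum(axis=1) - v @ vbias

def free_energy_bruteforce(v, W, hbias, vbias):
    # F(v) = -log sum_h exp(-E(v, h)), with E(v, h) = -v.vbias - h.hbias - v W h
    states = [np.array(s) for s in itertools.product([0.0, 1.0], repeat=len(hbias))]
    out = []
    for row in v:
        neg_e = [row @ vbias + h @ hbias + row @ W @ h for h in states]
        out.append(-np.log(np.sum(np.exp(neg_e))))
    return np.array(out)

def cd_k_step(v_data, W, hbias, vbias, rng, lr=0.1, k=1):
    """One CD-k update (k >= 1); returns the new (W, hbias, vbias)."""
    # positive phase
    ph_mean = sigmoid(v_data @ W + hbias)
    h = sample(ph_mean, rng)
    # negative phase: k Gibbs steps starting from the hidden sample
    for _ in range(k):
        v_neg = sample(sigmoid(h @ W.T + vbias), rng)
        nh_mean = sigmoid(v_neg @ W + hbias)
        h = sample(nh_mean, rng)
    n = v_data.shape[0]
    # gradients of cost = mean(F(v_data)) - mean(F(v_neg)), v_neg held constant
    gW = -(v_data.T @ ph_mean - v_neg.T @ nh_mean) / n
    gh = -(ph_mean - nh_mean).mean(axis=0)
    gv = -(v_data - v_neg).mean(axis=0)
    return W - lr * gW, hbias - lr * gh, vbias - lr * gv

rng = np.random.RandomState(0)
n_visible, n_hidden, batch = 5, 3, 4  # toy sizes, assumed for the example
W = rng.uniform(-1, 1, size=(n_visible, n_hidden))
hbias = rng.uniform(-1, 1, size=n_hidden)
vbias = rng.uniform(-1, 1, size=n_visible)
v_data = sample(np.full((batch, n_visible), 0.5), rng)

fe = free_energy(v_data, W, hbias, vbias)
fe_check = free_energy_bruteforce(v_data, W, hbias, vbias)
print(np.allclose(fe, fe_check))

W_new, hbias_new, vbias_new = cd_k_step(v_data, W, hbias, vbias, rng, lr=0.1, k=1)
print(W_new.shape, hbias_new.shape, vbias_new.shape)
```

    Treating v_neg as a fixed array when writing the gradients plays the role that consider_constant=[chain_end] plays in the Theano version.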

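    Tracking progress

    RBMs are particularly tricky to train. Because of the partition function Z, we cannot estimate the log-likelihood during training, so we have no direct useful metric for choosing the optimal hyperparameters.

    Several options are available:

    Inspecting negative samples

    Negative samples obtained during training can be visualized. As training progresses, we know that the model becomes closer to the true underlying distribution, so negative samples should look like samples from the training set. Obviously bad hyperparameters can be discarded in this way.

    Visualizing the filters

    The filters learned by the model can be visualized.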
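
    One simple way to do this, sketched here in NumPy, is to reshape each column of W into an image and tile the images into a grid. The sketch assumes the default n_visible=784 corresponds to 28x28 input images; tile_filters is a hypothetical helper, and random weights stand in for a trained model.

```python
import numpy as np

def tile_filters(W, img_shape, grid_shape, spacing=1):
    """Arrange columns of W (one filter per hidden unit) into a 2-D grid image."""
    n_rows, n_cols = grid_shape
    h, w = img_shape
    out = np.zeros((n_rows * (h + spacing) - spacing,
                    n_cols * (w + spacing) - spacing))
    for idx in range(min(n_rows * n_cols, W.shape[1])):
        filt = W[:, idx].reshape(img_shape)
        # rescale each filter to [0, 1] so that contrast is comparable
        filt = (filt - filt.min()) / (filt.max() - filt.min() + 1e-8)
        r, c = divmod(idx, n_cols)
        top, left = r * (h + spacing), c * (w + spacing)
        out[top:top + h, left:left + w] = filt
    return out

rng = np.random.RandomState(0)
W = rng.uniform(-1, 1, size=(784, 500))  # stand-in for trained weights
image = tile_filters(W, img_shape=(28, 28), grid_shape=(10, 10))
print(image.shape)  # (289, 289)
```

    The resulting array can be passed to any image viewer or plotting library to inspect the first 100 filters.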

    The original site has gone offline, so the translation stops here. Sorry.

  • Original post: https://www.cnblogs.com/liwei33/p/5646342.html