Deep Learning with TensorFlow IBM Cognitive Class ML0120EN Module 5 - Autoencoders
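Recognizing Handwritten Digits with a DBN
Traditional multilayer perceptrons and neural networks share a problem: backpropagation can get stuck in a local minimum. When the error surface contains several valleys, gradient descent may settle in one that is not the deepest. Below you will see how a DBN addresses this problem.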
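To make the local-minimum problem concrete, here is a minimal sketch that runs plain gradient descent on a toy one-dimensional "error surface" with two valleys. The function `f`, the learning rate, and the starting points are all made up for illustration and are not part of the original notebook.

```python
def f(x):
    # Toy error surface with a deep valley near x = -1.30
    # and a shallower valley near x = 1.13
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Analytic derivative of f
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    # Plain gradient descent starting from x0
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

x_right = gradient_descent(2.0)   # starts on the right slope
x_left = gradient_descent(-2.0)   # starts on the left slope

print("start  2.0 -> x = %.4f, f(x) = %.4f" % (x_right, f(x_right)))
print("start -2.0 -> x = %.4f, f(x) = %.4f" % (x_left, f(x_left)))
```

Both runs converge, but only the one that starts on the left reaches the deeper valley. Where the search starts decides where it ends, which is why a good initialization matters.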
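Deep Belief Networks
A deep belief network (DBN) addresses the local-minimum problem with an additional pretraining step. Pretraining is done before backpropagation, so the network starts with an error rate that is not far from the optimum, in other words, close to a good solution. Backpropagation then gradually reduces the error rate from there. A DBN has two main parts. The first is a stack of Restricted Boltzmann Machines (RBMs), used to pretrain the network. The second is a feed-forward backpropagation network, which fine-tunes the result of the RBM stack.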
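The pretraining stage can be sketched in plain NumPy as greedy layer-wise training: each RBM is trained on the output of the layer before it. This is only an illustration under stated assumptions. The helper names `train_rbm` and `pretrain_stack`, the toy data, the layer sizes, and the choice of one-step contrastive divergence (CD-1) as the update rule are all assumptions made for this sketch and are not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Train one RBM with one-step contrastive divergence (assumed update rule)."""
    n_samples, n_visible = data.shape
    w = np.zeros((n_visible, n_hidden))
    hb = np.zeros(n_hidden)
    vb = np.zeros(n_visible)
    for _ in range(epochs):
        # Positive phase: hidden probabilities and a binary sample of them
        h_prob = sigmoid(data @ w + hb)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: reconstruct the visible layer, then the hidden layer
        v_recon = sigmoid(h_sample @ w.T + vb)
        h_recon = sigmoid(v_recon @ w + hb)
        # Move the parameters toward the data statistics and away from
        # the reconstruction statistics
        w += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        hb += lr * (h_prob - h_recon).mean(axis=0)
        vb += lr * (data - v_recon).mean(axis=0)
    return w, hb

def pretrain_stack(data, layer_sizes):
    """Greedy layer-wise pretraining: each RBM sees the previous layer's output."""
    params = []
    layer_input = data
    for n_hidden in layer_sizes:
        w, hb = train_rbm(layer_input, n_hidden)
        params.append((w, hb))
        # The hidden activations become the input of the next RBM
        layer_input = sigmoid(layer_input @ w + hb)
    return params

# Hypothetical toy data: 20 binary vectors with 12 features each
toy_data = (rng.random((20, 12)) > 0.5).astype(float)
stack = pretrain_stack(toy_data, [8, 4])
print([w.shape for w, _ in stack])
```

In a full DBN, the weights and hidden biases collected in `stack` would then initialize the feed-forward network that backpropagation fine-tunes.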
1. Load the libraries needed for the deep belief network
# urllib is used to download the utils file from deeplearning.net
import urllib.request
response = urllib.request.urlopen('http://deeplearning.net/tutorial/code/utils.py')
content = response.read().decode('utf-8')
with open('utils.py', 'w') as target:
    target.write(content)
# Import the math function for calculations
import math
# Tensorflow library. Used to implement machine learning models
import tensorflow as tf
# Numpy contains helpful functions for efficient mathematical calculations
import numpy as np
# Image library for image manipulation
from PIL import Image
# import Image
# Utils file
from utils import tile_raster_images
2. Build the RBM layers
For details on RBMs, see https://blog.csdn.net/sinat_28371057/article/details/115795086
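As a quick refresher, the two operations an RBM relies on are computing the hidden-unit probabilities, sigmoid(vW + hb), and turning those probabilities into binary states. The NumPy sketch below uses made-up values for `v`, `w`, and `hb`. It also checks that the `relu(sign(probs - uniform))` trick is the same as comparing each probability against uniform noise.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical values: one visible vector with 3 units, 2 hidden units
v = np.array([[1.0, 0.0, 1.0]])
w = np.array([[0.5, -0.2],
              [0.1, 0.4],
              [-0.3, 0.8]])
hb = np.array([0.0, 0.1])

# Probability that each hidden unit is on, given the visible vector
h_prob = sigmoid(v @ w + hb)

# Draw binary states: a unit turns on when its probability exceeds the noise
u = rng.random(h_prob.shape)
h_sample = np.maximum(np.sign(h_prob - u), 0.0)  # relu(sign(p - u))
h_direct = (h_prob > u).astype(float)            # same result, written directly

print(h_prob)
print(h_sample)
```

The two sampling expressions always agree: `sign` gives +1 where the probability exceeds the noise and -1 or 0 otherwise, and the ReLU maps everything that is not +1 to 0.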
To apply a DBN in TensorFlow, we create an RBM class below.
class RBM(object):

    def __init__(self, input_size, output_size):
        # Defining the hyperparameters
        self._input_size = input_size  # Size of input
        self._output_size = output_size  # Size of output
        self.epochs = 5  # Amount of training iterations
        self.learning_rate = 1.0  # The step used in gradient descent
        self.batchsize = 100  # The size of how much data will be used for training per sub iteration

        # Initializing weights and biases as matrices full of zeroes
        self.w = np.zeros([input_size, output_size], np.float32)  # Creates and initializes the weights with 0
        self.hb = np.zeros([output_size], np.float32)  # Creates and initializes the hidden biases with 0
        self.vb = np.zeros([input_size], np.float32)  # Creates and initializes the visible biases with 0

    # Fits the result from the weighted visible layer plus the bias into a sigmoid curve
    def prob_h_given_v(self, visible, w, hb):
        # Sigmoid
        return tf.nn.sigmoid(tf.matmul(visible, w) + hb)

    # Fits the result from the weighted hidden layer plus the bias into a sigmoid curve
    def prob_v_given_h(self, hidden, w, vb):
        return tf.nn.sigmoid(tf.matmul(hidden, tf.transpose(w)) + vb)

    # Generate the sample probability
    def sample_prob(self, probs):
        return tf.nn.relu(tf.sign(probs - tf.random_uniform(tf.shape(probs))))

    # Training method for the model
    def train(self, X):
        # Create the placeholders for our parameters
        _w = tf.placeholder("float", [self._input_size, self._output_size])
        _hb = tf.placeholder("float", [self._output_size])
        _vb = tf.placeholder("float", [self._input_size])

        prv_w = np.zeros([self._input_size, self._output_size], np.float32)  # Creates and initializes the weights with 0
        prv_hb = np.zeros([self._output_size], np.float32)  # Creates and initializes the hidden biases with 0
        prv_vb = np.zeros([self._input_size], np.float32)  # Creates and initializes the visible biases with 0

        cur_w = np.zeros([self._input_size, self._output_size], np.float32)
        cur_hb = np.zeros([self._output_size], np.float32)
        cur_vb = np.zeros([self._input_size], np.float32