    Sentiment Analysis: Using Convolutional Neural Networks

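    Earlier chapters explored how to process two-dimensional image data with two-dimensional convolutional neural networks. In the previous language models and text classification tasks, we treated text data as a one-dimensional time series and, naturally, used recurrent neural networks to process it. In fact, we can also treat text as a one-dimensional image, so that one-dimensional convolutional neural networks can capture the associations between adjacent words. As shown in Fig. 1, this section describes a groundbreaking approach to applying convolutional neural networks to sentiment analysis: textCNN [Kim, 2014].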

     Fig. 1. This section feeds pretrained GloVe to a CNN-based architecture for sentiment analysis.


    from d2l import mxnet as d2l

    from mxnet import gluon, init, np, npx

    from mxnet.gluon import nn


    batch_size = 64

    train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)

    1. One-Dimensional Convolutional Layer



    Fig. 2. One-dimensional cross-correlation operation. The shaded parts are the first output element as well as the input and kernel array elements used in its calculation: 0×1+1×2=2.


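    def corr1d(X, K):
        # Slide the kernel K over X; each output element is the sum of the
        # elementwise product between K and a window of X of the same width
        w = K.shape[0]
        Y = np.zeros((X.shape[0] - w + 1))
        for i in range(Y.shape[0]):
            Y[i] = (X[i: i + w] * K).sum()
        return Y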


    X, K = np.array([0, 1, 2, 3, 4, 5, 6]), np.array([1, 2])

    corr1d(X, K)

    array([ 2., 5., 8., 11., 14., 17.])
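As a sanity check on the loop above, the sketch below compares it with NumPy's built-in `np.correlate` in `'valid'` mode, which also keeps only the positions where the kernel fully overlaps the input. It assumes plain NumPy rather than `mxnet.np` so that it runs without MXNet; the names `manual` and `builtin` are illustrative only.

```python
import numpy as np  # assumption: plain NumPy instead of mxnet.np


def corr1d(X, K):
    """One-dimensional cross-correlation, same logic as the function above."""
    w = K.shape[0]
    Y = np.zeros(X.shape[0] - w + 1)
    for i in range(Y.shape[0]):
        Y[i] = (X[i: i + w] * K).sum()
    return Y


X = np.array([0, 1, 2, 3, 4, 5, 6])
K = np.array([1, 2])

manual = corr1d(X, K)
# mode='valid' returns only outputs where K lies entirely inside X,
# so the result has len(X) - len(K) + 1 elements
builtin = np.correlate(X, K, mode='valid')

print(manual)
print(builtin)
print(np.allclose(manual, builtin))
```

Both results have 7 − 2 + 1 = 6 elements, which is the output width formula used throughout the rest of this section.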


    Fig. 3. One-dimensional cross-correlation operation with three input channels. The shaded parts are the first output element as well as the input and kernel array elements used in its calculation: 0×1+1×2+1×3+2×4+2×(−1)+3×(−3)=2.


    def corr1d_multi_in(X, K):
        # Traverse the 0th dimension (the channel dimension) of X and K
        # together, compute the one-dimensional cross-correlation of each
        # channel, and sum the per-channel results
        return sum(corr1d(x, k) for x, k in zip(X, K))

    X = np.array([[0, 1, 2, 3, 4, 5, 6],
                  [1, 2, 3, 4, 5, 6, 7],
                  [2, 3, 4, 5, 6, 7, 8]])
    K = np.array([[1, 2], [3, 4], [-1, -3]])

    corr1d_multi_in(X, K)

    array([ 2., 8., 14., 20., 26., 32.])


    Fig. 4. Two-dimensional cross-correlation operation with a single input channel. The highlighted parts are the first output element and the input and kernel array elements used in its calculation: 2×(−1)+3×(−3)+1×3+2×4+0×1+1×2=2.
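The equivalence illustrated in Fig. 4 can be checked numerically. The sketch below stacks the three channels as the rows of one two-dimensional input and uses a kernel whose height equals the number of channels, so the kernel can only slide along the width. It assumes plain NumPy rather than `mxnet.np`, and `corr2d` is a helper written here for illustration, not a function defined in this section.

```python
import numpy as np  # assumption: plain NumPy instead of mxnet.np


def corr1d(X, K):
    w = K.shape[0]
    Y = np.zeros(X.shape[0] - w + 1)
    for i in range(Y.shape[0]):
        Y[i] = (X[i: i + w] * K).sum()
    return Y


def corr1d_multi_in(X, K):
    # Sum the per-channel one-dimensional cross-correlations
    return sum(corr1d(x, k) for x, k in zip(X, K))


def corr2d(X, K):
    """Two-dimensional cross-correlation with a single input channel."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y


X = np.array([[0, 1, 2, 3, 4, 5, 6],
              [1, 2, 3, 4, 5, 6, 7],
              [2, 3, 4, 5, 6, 7, 8]])
K = np.array([[1, 2], [3, 4], [-1, -3]])

out_1d = corr1d_multi_in(X, K)  # shape (6,)
out_2d = corr2d(X, K)           # shape (1, 6): kernel height == input height

print(np.allclose(out_1d, out_2d[0]))
```

Because the kernel is as tall as the input, the two-dimensional output has a single row, and that row matches the multi-channel one-dimensional result.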


    2. Max-Over-Time Pooling Layer

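    Similarly, there is a one-dimensional pooling layer. The max-over-time pooling layer used in textCNN actually corresponds to a one-dimensional global max pooling layer. Assuming the input contains multiple channels, and each channel consists of values at different time steps, the output for each channel is the largest value over all time steps in that channel. Therefore, the input of the max-over-time pooling layer can have a different number of time steps on each channel.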
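A minimal sketch of max-over-time pooling follows, assuming plain NumPy and made-up channel values. The helper name `max_over_time` is hypothetical. Each channel is reduced to a single number, so the channels do not need to share a length.

```python
import numpy as np  # assumption: plain NumPy instead of mxnet.np


def max_over_time(channels):
    """Return the largest value of each channel, whatever its length."""
    return np.array([np.max(c) for c in channels])


# Three channels with 4, 2, and 5 time steps (illustrative values)
channels = [
    np.array([0.1, 0.7, 0.3, 0.2]),
    np.array([0.5, 0.4]),
    np.array([0.0, 0.2, 0.9, 0.1, 0.6]),
]

pooled = max_over_time(channels)
print(pooled)  # one value per channel: [0.7 0.5 0.9]
```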

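    To improve computing performance, we often combine sequence examples of different lengths into a minibatch and make the lengths of all examples in the batch consistent by appending special characters (such as 0) to the end of the shorter examples. Naturally, the added special characters have no intrinsic meaning. Because the main purpose of the max-over-time pooling layer is to capture the most important features along the time dimension, it usually allows the model to be unaffected by the manually added characters.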
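The effect of padding on max-over-time pooling can be sketched in an idealized setting. The example below assumes plain NumPy, made-up feature values, and non-negative features (as after a ReLU activation), so that appended zeros can never exceed a real value. In the actual model the padding is applied to tokens before the embedding and convolution, so this is only an illustration of the idea, and `pad_to_length` is a hypothetical helper.

```python
import numpy as np  # assumption: plain NumPy instead of mxnet.np


def pad_to_length(seq, length, pad_value=0.0):
    """Append pad_value to the end of seq until it has the given length."""
    seq = np.asarray(seq, dtype=float)
    return np.concatenate([seq, np.full(length - len(seq), pad_value)])


# Non-negative features of three examples with different lengths
features = [[0.2, 0.9, 0.4], [0.5, 0.1], [0.3, 0.3, 0.8, 0.6]]

max_len = max(len(f) for f in features)
batch = np.stack([pad_to_length(f, max_len) for f in features])  # (3, 4)

pooled_padded = batch.max(axis=1)                        # pool the padded batch
pooled_unpadded = np.array([max(f) for f in features])   # pool each example

print(np.array_equal(pooled_padded, pooled_unpadded))
```

Under the non-negativity assumption, the pooled values of the padded minibatch equal those of the original, unpadded examples.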

    3. The TextCNN Model






     Fig. 5. TextCNN design.

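    Fig. 5 gives an example to illustrate textCNN. The input here is a sentence with 11 words, each represented by a 6-dimensional word vector. Therefore, the input sequence has a width of 11 and 6 input channels. We assume there are two one-dimensional convolution kernels with widths of 2 and 4, and with 4 and 5 output channels, respectively. After the one-dimensional convolution, the four output channels of the first kernel have a width of 11−2+1=10, while the five output channels of the second kernel have a width of 11−4+1=8. Even though the channels have different widths, we can still perform max-over-time pooling on each channel and concatenate the pooling outputs of the 9 channels into a 9-dimensional vector. Finally, a fully connected layer transforms the 9-dimensional vector into a 2-dimensional output: the positive and negative sentiment predictions.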
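The shapes in the Fig. 5 example can be traced with a small sketch. It assumes plain NumPy and random weights, and it omits biases, the ReLU activation, and dropout, so it only checks the shape arithmetic and is not the model trained in this section. The helper `conv1d` and all variable names are illustrative.

```python
import numpy as np  # assumption: plain NumPy instead of mxnet.np

rng = np.random.default_rng(0)


def conv1d(X, W):
    """X: (in_channels, width); W: (out_channels, in_channels, kernel_width)."""
    out_channels, _, k = W.shape
    out_width = X.shape[1] - k + 1
    Y = np.zeros((out_channels, out_width))
    for o in range(out_channels):
        for t in range(out_width):
            # Sum over all input channels and the k positions of the window
            Y[o, t] = (X[:, t:t + k] * W[o]).sum()
    return Y


X = rng.normal(size=(6, 11))      # 6 input channels, 11 words
W_a = rng.normal(size=(4, 6, 2))  # 4 output channels, kernel width 2
W_b = rng.normal(size=(5, 6, 4))  # 5 output channels, kernel width 4

feat_a = conv1d(X, W_a)           # (4, 10) since 11 - 2 + 1 = 10
feat_b = conv1d(X, W_b)           # (5, 8)  since 11 - 4 + 1 = 8

# Max-over-time pooling per channel, then concatenation: 4 + 5 = 9 values
encoding = np.concatenate([feat_a.max(axis=1), feat_b.max(axis=1)])

W_out = rng.normal(size=(2, 9))   # fully connected layer without bias
logits = W_out @ encoding         # (2,): positive and negative scores

print(feat_a.shape, feat_b.shape, encoding.shape, logits.shape)
```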


    class TextCNN(nn.Block):
        def __init__(self, vocab_size, embed_size, kernel_sizes, num_channels,
                     **kwargs):
            super(TextCNN, self).__init__(**kwargs)
            self.embedding = nn.Embedding(vocab_size, embed_size)
            # This second embedding layer does not participate in training
            self.constant_embedding = nn.Embedding(vocab_size, embed_size)
            self.dropout = nn.Dropout(0.5)
            self.decoder = nn.Dense(2)
            # The max-over-time pooling layer has no weight, so it can share an
            # instance
            self.pool = nn.GlobalMaxPool1D()
            # Create multiple one-dimensional convolutional layers
            self.convs = nn.Sequential()
            for c, k in zip(num_channels, kernel_sizes):
                self.convs.add(nn.Conv1D(c, k, activation='relu'))

        def forward(self, inputs):
            # Concatenate the outputs of the two embedding layers, each of shape
            # (batch size, number of words, word vector dimension), along the
            # word vector dimension
            embeddings = np.concatenate((
                self.embedding(inputs), self.constant_embedding(inputs)), axis=2)
            # Conv1D expects the channel dimension before the time dimension, so
            # move the word vector dimension (the channel dimension) to axis 1
            embeddings = embeddings.transpose(0, 2, 1)
            # For each one-dimensional convolutional layer, max-over-time pooling
            # yields an ndarray of shape (batch size, channel size, 1). Use
            # np.squeeze to remove the last dimension, then concatenate on the
            # channel dimension
            encoding = np.concatenate([
                np.squeeze(self.pool(conv(embeddings)), axis=-1)
                for conv in self.convs], axis=1)
            # After applying dropout, use a fully connected layer to obtain the
            # output
            outputs = self.decoder(self.dropout(encoding))
            return outputs


    embed_size, kernel_sizes, nums_channels = 100, [3, 4, 5], [100, 100, 100]

    ctx = d2l.try_all_gpus()

    net = TextCNN(len(vocab), embed_size, kernel_sizes, nums_channels)

    net.initialize(init.Xavier(), ctx=ctx)

    3.1. Load Pre-trained Word Vectors


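    glove_embedding = d2l.TokenEmbedding('glove.6b.100d')
    embeds = glove_embedding[vocab.idx_to_token]
    # Initialize both embedding layers with the pretrained GloVe vectors
    net.embedding.weight.set_data(embeds)
    net.constant_embedding.weight.set_data(embeds)
    # Freeze the constant embedding so that it is not updated during training
    net.constant_embedding.collect_params().setattr('grad_req', 'null')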

    3.2. Train and Evaluate the Model


    lr, num_epochs = 0.001, 5

    trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})

    loss = gluon.loss.SoftmaxCrossEntropyLoss()

    d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, ctx)

    loss 0.094, train acc 0.968, test acc 0.866

    3834.5 examples/sec on [gpu(0), gpu(1)]


    d2l.predict_sentiment(net, vocab, 'this movie is so great')


    d2l.predict_sentiment(net, vocab, 'this movie is so bad')


    4. Summary

    · We can use one-dimensional convolution to process and analyze sequence data such as text.

    · A one-dimensional cross-correlation operation with multiple input channels can be regarded as a two-dimensional cross-correlation operation with a single input channel.

    · The input of the max-over-time pooling layer can have different numbers of timesteps on each channel.

    · TextCNN mainly uses a one-dimensional convolutional layer and max-over-time pooling layer.

