Chinese Sentiment Recognition 4

Combining LSTM and CNN
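Convolutional neural networks are very effective at learning spatial structure in input data. The IMDB review data does have a one-dimensional spatial structure in the word sequence of each review, so a CNN can pick out invariant features that signal negative sentiment. The spatial features learned by the CNN can then be learned as a sequence by an LSTM layer. After the word-embedding layer, add a one-dimensional CNN and a max-pooling layer, and feed the pooled features to the LSTM. The convolutional layer uses 32 filters with a kernel size of 3, and the pooling layer uses the standard pool size of 2, which halves the length of the feature map.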
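To make the shape bookkeeping concrete, here is a minimal sketch in plain Python (no Keras required) of how the sequence length changes through the convolution and pooling layers. The helper names `conv1d_same_length` and `max_pool1d_length` are hypothetical. The formulas assume a convolution with `padding='same'` and stride 1, as in the `Conv1D` call of the script below, and a pooling layer with `'valid'` padding whose stride equals its pool size.

```python
import math


def conv1d_same_length(length, stride=1):
    # With padding='same', the output length is ceil(length / stride),
    # independent of the kernel size.
    return math.ceil(length / stride)


def max_pool1d_length(length, pool_size=2, stride=None):
    # With 'valid' padding: floor((length - pool_size) / stride) + 1.
    # The stride is assumed to default to the pool size.
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1


seq_len = 500  # max_words in the script
after_conv = conv1d_same_length(seq_len)
after_pool = max_pool1d_length(after_conv, pool_size=2)
print(after_conv, after_pool)
```

Under these assumptions the convolution leaves the length at 500 and the pooling layer reduces it to 250, which matches the output shapes in the model summary.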
- Parameters of each layer
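As a rough check on the per-layer parameter counts, the sketch below recomputes them by hand in plain Python. The function names are hypothetical helpers, not Keras APIs. The LSTM size of 100 units is taken from the model summary output, and the LSTM formula assumes the standard four-gate cell with one bias vector per gate.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim


def conv1d_params(kernel_size, in_channels, filters):
    # One kernel of shape (kernel_size, in_channels) per filter, plus a bias each.
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim + units + 1) * units


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


counts = {
    "embedding": embedding_params(5000, 32),
    "conv1d": conv1d_params(3, 32, 32),
    "lstm": lstm_params(32, 100),
    "dense": dense_params(100, 1),
}
total = sum(counts.values())
print(counts, total)
```

The max-pooling layer has no trainable weights, so it does not appear in the dictionary.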
Sequence classification: IMDB review classification with LSTM
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
from keras.datasets import imdb
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Conv1D, MaxPooling1D
seed = 7
top_words = 5000
max_words = 500
out_dimension = 32
batch_size = 128
epochs = 2
dropout_rate = 0.2
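def build_model():
    model = Sequential()
    model.add(Embedding(top_words, out_dimension, input_length=max_words))
    # 1D convolution: 32 filters, kernel size 3; 'same' padding keeps the length at 500
    model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
    # Max pooling with pool size 2 halves the sequence length
    model.add(MaxPooling1D(pool_size=2))
    # LSTM with 100 units, matching the model summary
    model.add(LSTM(units=100))
    model.add(Dense(units=1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Print the model summary
    model.summary()
    return model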
# Load the data
(x_train, y_train), (x_validation, y_validation) = imdb.load_data(num_words=top_words)
x_train = sequence.pad_sequences(x_train, maxlen=max_words)
x_validation = sequence.pad_sequences(x_validation, maxlen=max_words)
# Build and train the model
model = build_model()
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)
scores = model.evaluate(x_validation, y_validation, verbose=2)
print('Accuracy: %.2f%%' % (scores[1] * 100))
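Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_4 (Embedding)      (None, 500, 32)           160000
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 500, 32)           3104
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 250, 32)           0
_________________________________________________________________
lstm_7 (LSTM)                (None, 100)               53200
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 101
=================================================================
Total params: 216,405
Trainable params: 216,405
Non-trainable params: 0
_________________________________________________________________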
M:\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/2
- 285s - loss: 0.5007 - accuracy: 0.7304
Epoch 2/2
- 285s - loss: 0.2494 - accuracy: 0.9010
Accuracy: 87.25%
| Model | Accuracy |
| --- | --- |
| LSTM | 85.58% |
| CNN+LSTM | 87.25% |
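The script relies on `sequence.pad_sequences` to bring every review to exactly `max_words` tokens. The following is a minimal pure-Python sketch of what that call does with its default settings (pad at the front with 0 and truncate from the front). The helper `pad_pre` is a hypothetical stand-in for illustration, not the Keras implementation, and it assumes `maxlen` is a positive integer.

```python
def pad_pre(seq, maxlen, value=0):
    # Pre-truncation: keep only the last `maxlen` tokens.
    kept = list(seq)[-maxlen:]
    # Pre-padding: fill the front with `value` up to `maxlen`.
    return [value] * (maxlen - len(kept)) + kept


short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6]
padded_short = pad_pre(short_review, maxlen=5)
padded_long = pad_pre(long_review, maxlen=4)
print(padded_short, padded_long)
```

Padding at the front means the real tokens sit at the end of each sequence, so they are the last inputs the LSTM sees before it emits its final state.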