  • Chinese Sentiment Recognition 3

    Sequence Classification: IMDB Movie Review Classification

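    Sequence classification is the task of predicting a class label for an input sequence over space or time. The main difficulties are that sequences vary in length, that the input symbols come from a very large vocabulary, and that the model may need to learn the context of, or the dependencies between, symbols in the input sequence. This chapter shows how to use an LSTM to solve a sequence classification problem.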
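The variable-length problem is usually handled by forcing every sequence to one fixed length before it reaches the model. The sketch below is a plain NumPy illustration of that idea, assuming left padding with zeros and keeping the last tokens of sequences that are too long (this mirrors the default `'pre'` behaviour of Keras `pad_sequences`). The helper name `pad_to_length` and the toy reviews are hypothetical and are not part of the original post.

```python
import numpy as np

def pad_to_length(seqs, maxlen, value=0):
    """Left-pad short sequences with `value`; keep the last `maxlen` tokens of long ones."""
    out = np.full((len(seqs), maxlen), value, dtype=np.int64)
    for i, seq in enumerate(seqs):
        trimmed = list(seq)[-maxlen:]          # drop tokens from the front if too long
        if trimmed:
            out[i, -len(trimmed):] = trimmed   # right-align, so padding sits on the left
    return out

# Two toy "reviews" already encoded as word indices (hypothetical values)
reviews = [[11, 12], [21, 22, 23, 24, 25, 26]]
padded = pad_to_length(reviews, maxlen=4)
print(padded)
```

After this step every review is a row of the same width, so a batch can be stored as a single integer matrix.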

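    Problem Description

    The IMDB dataset is used to study the sequence classification problem: an LSTM analyzes each movie review to determine the opinion it expresses about the film.

    Simple LSTM

    Embedding layer + LSTM + output layer

    Key Issues

    Code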

    '''
    Sequence classification: IMDB movie review classification with an LSTM
    '''
    import os
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    from keras.datasets import imdb
    import numpy as np
    from keras.preprocessing import sequence
    from keras.models import Sequential
    from keras.layers import Embedding
    from keras.layers import LSTM
    from keras.layers import Dense
    
    seed = 7
    top_words = 5000
    max_words = 500
    out_dimension = 32
    batch_size = 128
    epochs = 2
    
    def build_model():
        model = Sequential()
        model.add(Embedding(top_words, out_dimension, input_length=max_words))
        model.add(LSTM(units=100))
        model.add(Dense(units=1, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])
        # Print a summary of the model
        model.summary()
        return model
    
    np.random.seed(seed=seed)
    # Load the data, keeping only the top_words most frequent words
    (x_train, y_train), (x_validation, y_validation) = imdb.load_data(num_words=top_words)
    # Pad or truncate every review to exactly max_words tokens
    x_train = sequence.pad_sequences(x_train, maxlen=max_words)
    x_validation = sequence.pad_sequences(x_validation, maxlen=max_words)
    
    
    # Build and train the model
    model = build_model()
    
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)
    
    scores = model.evaluate(x_validation, y_validation, verbose=2)
    print('Accuracy: %.2f%%' % (scores[1] * 100))
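Because the model is compiled with `metrics=['accuracy']`, `model.evaluate` returns the loss in `scores[0]` and the accuracy in `scores[1]`. For a single sigmoid output, that accuracy amounts to thresholding the predicted probability at 0.5 and comparing with the true label. The sketch below illustrates this with NumPy; the function name `binary_accuracy` and the sample values are made up for the example and do not come from the IMDB run.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Hypothetical labels and sigmoid outputs for four reviews
y_true = [1, 0, 1, 1]
y_prob = [0.91, 0.20, 0.35, 0.77]
acc = binary_accuracy(y_true, y_prob)
print('Accuracy: %.2f%%' % (acc * 100))
```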

    Results

    Model: "sequential_6"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    embedding_3 (Embedding)      (None, 500, 32)           160000    
    _________________________________________________________________
    lstm_6 (LSTM)                (None, 100)               53200     
    _________________________________________________________________
    dense_3 (Dense)              (None, 1)                 101       
    =================================================================
    Total params: 213,301
    Trainable params: 213,301
    Non-trainable params: 0
    _________________________________________________________________
    M:\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    Epoch 1/2
     - 468s - loss: 0.5198 - accuracy: 0.7326
    Epoch 2/2
     - 412s - loss: 0.2807 - accuracy: 0.8871
    Accuracy: 85.58%
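The parameter counts in the model summary can be checked by hand. The sketch below uses the standard formulas for an embedding layer, an LSTM layer (four gates, each with input weights, recurrent weights, and a bias), and a dense layer. The helper function names are hypothetical; the sizes are the ones set in the script (`top_words = 5000`, `out_dimension = 32`, 100 LSTM units, 1 output unit).

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units

emb = embedding_params(5000, 32)
lstm = lstm_params(32, 100)
dense = dense_params(100, 1)
total = emb + lstm + dense
print(emb, lstm, dense, total)  # 160000 53200 101 213301
```

These values match the Param # column and the total of 213,301 reported in the summary.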
  • Original article: https://www.cnblogs.com/Howbin/p/12604437.html