  • NLP (19): Bidirectional LSTM Sentiment Classification Model

    Original link: http://www.one2know.cn/nlp19/

    • Use the IMDB sentiment data to compare the CNN and RNN approaches; preprocessing is the same as in the previous section
    from __future__ import print_function
    import numpy as np
    import pandas as pd
    from keras.preprocessing import sequence
    from keras.models import Sequential
    from keras.layers import Dense,Dropout,Embedding,LSTM,Bidirectional
    from keras.datasets import imdb
    from sklearn.metrics import accuracy_score,classification_report
    
    # Limit the vocabulary to the most frequent words (maximum number of features)
    max_features = 15000
    max_len = 300
    batch_size = 64
    
    # Load the data
    (x_train,y_train),(x_test,y_test) = imdb.load_data(num_words=max_features)
    print(len(x_train),'train observations')
    print(len(x_test),'test observations')
    

    Output:

    Using TensorFlow backend.
    25000 train observations
    25000 test observations
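
Each loaded review is a list of word indices, and the lists have different lengths, so the full code pads them to `max_len` with `sequence.pad_sequences`. The following is a minimal pure-Python sketch of what that call does to a single sequence under its default settings (`padding='pre'`, `truncating='pre'`, pad value 0). The helper name `pad_to_length` and the sample index lists are hypothetical and only for illustration; this is not the Keras implementation.

```python
def pad_to_length(seq, maxlen, value=0):
    """Hypothetical helper: mimic pad_sequences defaults for one sequence (maxlen > 0)."""
    # truncating='pre': when the sequence is too long, keep only the last maxlen items
    truncated = seq[-maxlen:]
    # padding='pre': when it is too short, fill with `value` on the left
    return [value] * (maxlen - len(truncated)) + truncated


# Made-up word-index lists standing in for two reviews
short_review = [1, 14, 22]
long_review = [1, 14, 22, 16, 43, 530, 973]

print(pad_to_length(short_review, 5))  # [0, 0, 1, 14, 22]
print(pad_to_length(long_review, 5))   # [22, 16, 43, 530, 973]
```

With `max_len = 300`, every review ends up as a row of exactly 300 integers, which is why the padded arrays have shape `(25000, 300)`.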
    
    • How to implement it
      1. Preprocessing
      2. Building and validating the LSTM model
      3. Evaluating the model
    • Code
    from __future__ import print_function
    import numpy as np
    import pandas as pd
    from keras.preprocessing import sequence
    from keras.models import Sequential
    from keras.layers import Dense,Dropout,Embedding,LSTM,Bidirectional
    from keras.datasets import imdb
    from sklearn.metrics import accuracy_score,classification_report
    
    # Limit the vocabulary to the most frequent words (maximum number of features)
    max_features = 15000
    max_len = 300
    batch_size = 64
    
    # Load the data
    (x_train,y_train),(x_test,y_test) = imdb.load_data(num_words=max_features)
    # print(len(x_train),'train observations')
    # print(len(x_test),'test observations')
    
    # Pad (or truncate) every sequence to one fixed length so the data can be processed efficiently in batches
    x_train_2 = sequence.pad_sequences(x_train,maxlen=max_len)
    x_test_2 = sequence.pad_sequences(x_test,maxlen=max_len)
    print('x_train_2.shape:',x_train_2.shape)
    print('x_test_2.shape:',x_test_2.shape)
    y_train = np.array(y_train)
    y_test = np.array(y_test)
    
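    # Keras => bidirectional LSTM model
    # A bidirectional LSTM has forward and backward connections, so each word in a sentence can use context from both its left and its right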
    model = Sequential()
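    model.add(Embedding(max_features,128,input_length=max_len)) # embedding layer: maps each word index to a 128-dimensional vector
    model.add(Bidirectional(LSTM(64))) # bidirectional LSTM layer (64 units per direction)
    model.add(Dropout(0.5)) # dropout
    model.add(Dense(1,activation='sigmoid')) # dense layer: one sigmoid output for sentiment class 0 or 1
    model.compile('adam','binary_crossentropy',metrics=['accuracy']) # binary cross-entropy loss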
    print(model.summary())
    
    model.fit(x_train_2,y_train,batch_size=batch_size,epochs=4,validation_split=0.2)
    
    # Predictions and results
    # Note: predict_classes exists in the Keras version used here; it was removed in later
    # releases, where (model.predict(x) > 0.5).astype('int32') is the equivalent for a sigmoid output.
    y_train_predclass = model.predict_classes(x_train_2,batch_size=1000)
    y_test_predclass = model.predict_classes(x_test_2,batch_size=1000)
    # Flatten the (n, 1) predictions so they match the shape of the labels
    y_train_predclass.shape = y_train.shape
    y_test_predclass.shape = y_test.shape
    print('\n\nLSTM Bidirectional Sentiment Classification - Train accuracy:',
          round(accuracy_score(y_train,y_train_predclass),3))
    print('\nLSTM Bidirectional Sentiment Classification of Training data\n',
          classification_report(y_train,y_train_predclass))
    print('\nLSTM Bidirectional Sentiment Classification - Train Confusion Matrix\n\n',
          pd.crosstab(y_train,y_train_predclass,rownames=['Actuall'],colnames=['Predicted']))
    print('\nLSTM Bidirectional Sentiment Classification - Test accuracy:',
          round(accuracy_score(y_test,y_test_predclass),3))
    print('\nLSTM Bidirectional Sentiment Classification of Test data\n',
          classification_report(y_test,y_test_predclass))
    print('\nLSTM Bidirectional Sentiment Classification - Test Confusion Matrix\n\n',
          pd.crosstab(y_test,y_test_predclass,rownames=['Actuall'],colnames=['Predicted']))
    

    Output:

    Using TensorFlow backend.
    x_train_2.shape: (25000, 300)
    x_test_2.shape: (25000, 300)
    WARNING:tensorflow:From D:\Python37\Lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Colocations handled automatically by placer.
    WARNING:tensorflow:From D:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
    Instructions for updating:
    Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    embedding_1 (Embedding)      (None, 300, 128)          1920000   
    _________________________________________________________________
    bidirectional_1 (Bidirection (None, 128)               98816     
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 128)               0         
    _________________________________________________________________
    dense_1 (Dense)              (None, 1)                 129       
    =================================================================
    Total params: 2,018,945
    Trainable params: 2,018,945
    Non-trainable params: 0
    _________________________________________________________________
    None
    WARNING:tensorflow:From D:\Python37\Lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.cast instead.
    Train on 20000 samples, validate on 5000 samples
    Epoch 1/4
    2019-07-07 20:03:45.649853: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    
       64/20000 [..............................] - ETA: 18:21 - loss: 0.6915 - acc: 0.5781
      128/20000 [..............................] - ETA: 13:04 - loss: 0.6918 - acc: 0.5938
      192/20000 [..............................] - ETA: 11:14 - loss: 0.6915 - acc: 0.5729
      256/20000 [..............................] - ETA: 10:19 - loss: 0.6917 - acc: 0.5469
      320/20000 [..............................] - ETA: 9:45 - loss: 0.6915 - acc: 0.5469 
      (the remaining per-batch progress output for all epochs is omitted here)
    
    LSTM Bidirectional Sentiment Classification - Train accuracy: 0.955
    
    LSTM Bidirectional Sentiment Classification of Training data
                   precision    recall  f1-score   support
    
               0       0.96      0.95      0.95     12500
               1       0.95      0.96      0.95     12500
    
        accuracy                           0.95     25000
       macro avg       0.95      0.95      0.95     25000
    weighted avg       0.95      0.95      0.95     25000
    
    LSTM Bidirectional Sentiment Classification - Train Confusion Matrix
    
     Predicted      0      1
    Actuall                
    0          11928    572
    1            561  11939
    
    LSTM Bidirectional Sentiment Classification - Test accuracy: 0.859
    
    LSTM Bidirectional Sentiment Classification of Test data
                   precision    recall  f1-score   support
    
               0       0.86      0.86      0.86     12500
               1       0.86      0.85      0.86     12500
    
        accuracy                           0.86     25000
       macro avg       0.86      0.86      0.86     25000
    weighted avg       0.86      0.86      0.86     25000
    
    
    LSTM Bidirectional Sentiment Classification - Test Confusion Matrix
    
     Predicted      0      1
    Actuall                
    0          10809   1691
    1           1829  10671
    time============== 2080.618681907654
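
The parameter counts in the model summary above can be checked by hand. The sketch below redoes the arithmetic in plain Python, using only the layer sizes from the code (vocabulary 15000, embedding size 128, 64 LSTM units per direction). It assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the variable names are chosen here for illustration.

```python
max_features = 15000  # vocabulary size
embed_dim = 128       # Embedding output dimension
lstm_units = 64       # LSTM units per direction

# Embedding: one 128-dimensional vector per word index
embedding_params = max_features * embed_dim

# One LSTM direction: 4 gates, each with input weights, recurrent weights, and a bias
one_direction = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)
# Bidirectional wraps two independent LSTMs (forward and backward)
bidirectional_params = 2 * one_direction

# Dense: the two directions are concatenated (128 inputs) into 1 output, plus 1 bias
dense_params = 2 * lstm_units * 1 + 1

total_params = embedding_params + bidirectional_params + dense_params

print(embedding_params)      # 1920000
print(bidirectional_params)  # 98816
print(dense_params)          # 129
print(total_params)          # 2018945
```

Most of the roughly two million parameters sit in the embedding table; the bidirectional LSTM itself accounts for fewer than 100,000.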
    
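
The figures in the test classification report follow directly from the test confusion matrix. As a small worked example, the sketch below recomputes accuracy and the class 1 precision, recall, and F1 from the four counts printed above; the variable names are chosen here for illustration.

```python
# Test confusion matrix from the output above (rows: actual, columns: predicted)
tn, fp = 10809, 1691   # actual 0: predicted 0, predicted 1
fn, tp = 1829, 10671   # actual 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision_1 = tp / (tp + fp)   # of everything predicted 1, the share that is truly 1
recall_1 = tp / (tp + fn)      # of everything truly 1, the share that was found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 3))      # 0.859
print(round(precision_1, 2))   # 0.86
print(round(recall_1, 2))      # 0.85
print(round(f1_1, 2))          # 0.86
```

These match the reported test accuracy and the class 1 row of the test report. The gap between the train accuracy (0.955) and the test accuracy (0.859) suggests the model overfits the training data to some degree.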
  • Original address: https://www.cnblogs.com/peng8098/p/nlp_19.html