  • Pytorch-LSTM+Attention Text Classification

    Excerpted study notes

    Corpus link: https://pan.baidu.com/s/1aDIp3Hxw-Xuxcx-lQ_0w9A
    Extraction code: hpg7

    train.txt   500 pos / 500 neg, 1000 samples in total (for training the model)
    dev.txt     100 pos / 100 neg, 200 samples in total (for tuning hyperparameters)
    test.txt    150 pos / 150 neg, 300 samples in total (for testing)
    
    For example, here is a positive sample:
    <Polarity>1</Polarity>
    <text>sit back in one of those comfortable chairs.</text>
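
    The loader in section 1 relies on this fixed-width markup: the polarity character always sits at index 10 of the first line, and the text sits between the <text> and </text> tags of the second line. A quick illustration of those slices (not from the original post):

    line1 = "<Polarity>1</Polarity>"
    line2 = "<text>sit back in one of those comfortable chairs.</text>"
    print(line1[10])     # '1'  -> the label
    print(line2[6:-7])   # 'sit back in one of those comfortable chairs.'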
    

    1. Data Preprocessing

    Load the data, build the vocabulary, and create the iterators.

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchtext import data
    import math
    import time
    
    SEED = 123
    BATCH_SIZE = 128
    LEARNING_RATE = 1e-3      # learning rate
    EMBEDDING_DIM = 100       # word-embedding dimension
    
    # select the device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    # set the random seed for reproducibility
    torch.manual_seed(SEED)
    
    # two Field objects define how each field is processed (the text field and the label field)
    TEXT = data.Field(tokenize=lambda x: x.split(), lower=True)
    LABEL = data.LabelField(dtype=torch.float)
    
    # get_dataset: returns the examples and fields needed to build a Dataset
    def get_dataset(corpus_path, text_field, label_field):
        fields = [('text', text_field), ('label', label_field)]   # (name, Field) pairs used by torchtext
        examples = []
        
        with open(corpus_path) as f:
            li = []
            while True:
                content = f.readline().replace('\n', '')
                if not content:     # blank line: one sample has been read completely (its lines are buffered in li)
                    if not li:      # nothing buffered either: end of file
                        break
                    label = li[0][10]      # the character between the <Polarity> tags: '0' or '1'
                    text = li[1][6:-7]     # strip the leading <text> and trailing </text> tags
                    examples.append(data.Example.fromlist([text, label], fields=fields))
                    li = []
                else:
                    li.append(content)
        return examples, fields
    
    # build the examples and fields needed for each Dataset
    train_examples, train_fields = get_dataset('./corpus/train.txt', TEXT, LABEL)
    dev_examples, dev_fields = get_dataset('./corpus/dev.txt', TEXT, LABEL)
    test_examples, test_fields = get_dataset('./corpus/test.txt', TEXT, LABEL)
    
    # build the Datasets
    train_data = data.Dataset(train_examples, train_fields)
    dev_data = data.Dataset(dev_examples, dev_fields)
    test_data = data.Dataset(test_examples, test_fields)
    # for t in test_data:
    #     print(t.text, t.label)
    
    print('len of train data:', len(train_data))  # 1000
    print('len of dev data:', len(dev_data))      # 200
    print('len of test data:', len(test_data))    # 300
    
    # build the vocabulary (loads the glove.6B.100d vectors)
    TEXT.build_vocab(train_data, max_size=5000, vectors='glove.6B.100d')
    LABEL.build_vocab(train_data)
    print(len(TEXT.vocab))         # 3287
    print(TEXT.vocab.itos[:12])    # ['<unk>', '<pad>', 'the', 'and', 'a', 'to', 'is', 'was', 'i', 'of', 'for', 'in']
    print(TEXT.vocab.stoi['love']) # 129
    # print(TEXT.vocab.stoi)         # defaultdict {'<unk>': 0, '<pad>': 1, ....}
    
    # create the iterators; each iteration yields one batch of examples
    train_iterator, dev_iterator, test_iterator = data.BucketIterator.splits(
                                                (train_data, dev_data, test_data), 
                                                batch_size=BATCH_SIZE,
                                                device=device,
                                                sort=False)
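
    As a quick sanity check (not in the original post) of what the iterators yield: batch.text has shape [seq_len, batch] because the Field keeps the default batch_first=False, and batch.label has shape [batch] with float values 0.0/1.0.

    batch = next(iter(train_iterator))
    print(batch.text.shape)    # [seq_len, batch], e.g. torch.Size([38, 128])
    print(batch.label.shape)   # torch.Size([128])
    print(batch.label[:5])     # tensor of 0.0 / 1.0 labels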
    

    2. Defining the Model

    2.1 Version 1: attention computed directly from its definition

    class BiLSTM_Attention(nn.Module):
        
        def __init__(self, vocab_size, embedding_dim, hidden_dim, n_layers):
            
            super(BiLSTM_Attention, self).__init__()
            
            self.hidden_dim = hidden_dim
            self.n_layers = n_layers
            self.embedding = nn.Embedding(vocab_size, embedding_dim)
            self.rnn = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=n_layers,
                               bidirectional=True, dropout=0.5)
           
            self.fc = nn.Linear(hidden_dim * 2, 1)
            self.dropout = nn.Dropout(0.5)  
            
        # x:     [batch, seq_len, hidden_dim*2]
        # query: [batch, seq_len, hidden_dim*2]
        # soft (scaled dot-product) attention with key = value = x
        def attention_net(self, x, query, mask=None): 
            
            d_k = query.size(-1)     # d_k: dimensionality of the query
           
            # query: [batch, seq_len, hidden_dim*2], x.transpose(1, 2): [batch, hidden_dim*2, seq_len]
    #         print("query: ", query.shape, x.transpose(1, 2).shape)  # torch.Size([128, 38, 128]) torch.Size([128, 128, 38])
            # scaled dot-product scores: [batch, seq_len, seq_len]
            scores = torch.matmul(query, x.transpose(1, 2)) / math.sqrt(d_k)  
    #         print("score: ", scores.shape)  # torch.Size([128, 38, 38])
            
            # normalize the scores over the last dimension
            alpha_n = F.softmax(scores, dim=-1) 
    #         print("alpha_n: ", alpha_n.shape)    # torch.Size([128, 38, 38])
            # weighted sum over x
            # [batch, seq_len, seq_len] · [batch, seq_len, hidden_dim*2] = [batch, seq_len, hidden_dim*2] -> [batch, hidden_dim*2]
            context = torch.matmul(alpha_n, x).sum(1)
            
            return context, alpha_n
        
        def forward(self, x):
            # [seq_len, batch, embedding_dim]
            embedding = self.dropout(self.embedding(x)) 
            
            # output:[seq_len, batch, hidden_dim*2]
            # hidden/cell:[n_layers*2, batch, hidden_dim]
            output, (final_hidden_state, final_cell_state) = self.rnn(embedding)
            output = output.permute(1, 0, 2)  # [batch, seq_len, hidden_dim*2]
            
            query = self.dropout(output)
            # apply the attention mechanism
            attn_output, alpha_n = self.attention_net(output, query)
            
            logit = self.fc(attn_output)
            
            return logit
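
    A quick shape check, not part of the original post: with hidden_dim=64 and n_layers=2 (the settings used in section 3), feeding a dummy batch of token indices through the model should produce one logit per example.

    model = BiLSTM_Attention(vocab_size=len(TEXT.vocab), embedding_dim=EMBEDDING_DIM,
                             hidden_dim=64, n_layers=2).to(device)
    dummy = torch.randint(0, len(TEXT.vocab), (38, BATCH_SIZE)).to(device)  # [seq_len, batch]
    print(model(dummy).shape)   # torch.Size([128, 1])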
    

    2.2 Version 2

    Reference: https://blog.csdn.net/qsmx666/article/details/107118550

    Attention formula (as implemented by the code below, with h_t the BiLSTM output at time step t):

        u_t = tanh(W_w · h_t)
        a_t = softmax_t(u_t^T · u_w)
        s   = Σ_t a_t · h_t

    W_w and u_w correspond to w_omega and u_omega in the code, both randomly initialized; h_t corresponds to x.

    class BiLSTM_Attention(nn.Module):
    
        def __init__(self, vocab_size, embedding_dim, hidden_dim, n_layers):
    
            super(BiLSTM_Attention, self).__init__()
    
            self.hidden_dim = hidden_dim
            self.n_layers = n_layers
            self.embedding = nn.Embedding(vocab_size, embedding_dim)        # vocabulary size, embedding dimension
            self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=True, dropout=0.5)
            self.fc = nn.Linear(hidden_dim * 2, 1)
            self.dropout = nn.Dropout(0.5)
    
            # attention parameters: W_w ([hidden_dim*2, hidden_dim*2]) and u_w ([hidden_dim*2, 1])
            self.w_omega = nn.Parameter(torch.Tensor(hidden_dim * 2, hidden_dim * 2))
            self.u_omega = nn.Parameter(torch.Tensor(hidden_dim * 2, 1))
    
            nn.init.uniform_(self.w_omega, -0.1, 0.1)
            nn.init.uniform_(self.u_omega, -0.1, 0.1)
    
    
        def attention_net(self, x):       #x:[batch, seq_len, hidden_dim*2]
    
            u = torch.tanh(torch.matmul(x, self.w_omega))         #[batch, seq_len, hidden_dim*2]
            att = torch.matmul(u, self.u_omega)                   #[batch, seq_len, 1]
            att_score = F.softmax(att, dim=1)
    
            scored_x = x * att_score                              #[batch, seq_len, hidden_dim*2]
    
            context = torch.sum(scored_x, dim=1)                  #[batch, hidden_dim*2]
            return context
    
    
        def forward(self, x):
            embedding = self.dropout(self.embedding(x))       #[seq_len, batch, embedding_dim]
    
            # output: [seq_len, batch, hidden_dim*2]     hidden/cell: [n_layers*2, batch, hidden_dim]
            output, (final_hidden_state, final_cell_state) = self.rnn(embedding)
            output = output.permute(1, 0, 2)                  #[batch, seq_len, hidden_dim*2]
    
            attn_output = self.attention_net(output)
            logit = self.fc(attn_output)
            return logit
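
    The post never inspects the attention weights themselves, but for the section 2.2 model they can be recovered by repeating the forward pass up to the softmax. The helper below is an illustrative sketch only (it is not part of the original code):

    def get_attention_weights(model, x):
        # x: [seq_len, batch] token indices, the same input as model.forward
        model.eval()                                  # disable dropout
        with torch.no_grad():
            emb = model.embedding(x)                  # [seq_len, batch, embedding_dim]
            output, _ = model.rnn(emb)                # [seq_len, batch, hidden_dim*2]
            output = output.permute(1, 0, 2)          # [batch, seq_len, hidden_dim*2]
            u = torch.tanh(torch.matmul(output, model.w_omega))
            att = torch.matmul(u, model.u_omega)      # [batch, seq_len, 1]
            return F.softmax(att, dim=1).squeeze(-1)  # [batch, seq_len] attention weights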
    

    3. Training and Evaluating the Model

    The usual routine:

    • compute the accuracy

    • training function

    • evaluation function

    • print the model's performance

    • predict on the test data with the saved model parameters.

    # compute the accuracy
    def binary_acc(preds, y):
        preds = torch.round(torch.sigmoid(preds))
        correct = torch.eq(preds, y).float()
        acc = correct.sum() / len(correct)
        return acc
    
    # training function
    def train(rnn, iterator, optimizer, criteon):
        avg_loss = []
        avg_acc = []
        rnn.train()    # switch to training mode
        
        for i, batch in enumerate(iterator):
            pred = rnn(batch.text).squeeze()   # [batch, 1] -> [batch]
            
            loss = criteon(pred, batch.label)
            acc = binary_acc(pred, batch.label).item()  # accuracy on this batch
            
            avg_loss.append(loss.item())
            avg_acc.append(acc)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
        avg_acc = np.array(avg_acc).mean()
        avg_loss = np.array(avg_loss).mean()
        return avg_loss, avg_acc
        
    # evaluation function
    def evaluate(rnn, iterator, criteon):
        avg_loss = []
        avg_acc = []
        rnn.eval()    # switch to evaluation mode
        
        with torch.no_grad():
            for batch in iterator:
                pred = rnn(batch.text).squeeze()  # [batch, 1] -> [batch]
                
                loss = criteon(pred, batch.label)
                acc = binary_acc(pred, batch.label).item()
                
                avg_loss.append(loss.item())
                avg_acc.append(acc)
        
        avg_loss = np.array(avg_loss).mean()
        avg_acc = np.array(avg_acc).mean()
        return avg_loss, avg_acc 
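
    # The post does not show how the model, optimizer and loss are created before
    # training. A plausible setup (assumed, not from the original), using the
    # hidden_dim=64 / n_layers=2 configuration reported below and the GloVe vectors
    # already loaded into TEXT.vocab:
    rnn = BiLSTM_Attention(len(TEXT.vocab), EMBEDDING_DIM, hidden_dim=64, n_layers=2)
    rnn.embedding.weight.data.copy_(TEXT.vocab.vectors)    # initialize embeddings with GloVe
    rnn = rnn.to(device)
    
    optimizer = torch.optim.Adam(rnn.parameters(), lr=LEARNING_RATE)
    criteon = nn.BCEWithLogitsLoss().to(device)            # binary classification on raw logits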
    
    # train the model and print its performance
    best_valid_acc = float('-inf')
    
    for epoch in range(30):
        start_time = time.time()
        
        train_loss, train_acc = train(rnn, train_iterator, optimizer, criteon)
        dev_loss, dev_acc = evaluate(rnn, dev_iterator, criteon)
        
        end_time = time.time()
        
        epoch_mins, epoch_secs = divmod(end_time - start_time, 60)
        
        # save the parameters whenever the dev accuracy improves
        if dev_acc > best_valid_acc:
            best_valid_acc = dev_acc
            torch.save(rnn.state_dict(), 'wordavg-model.pt')
            
        print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs:.2f}s')
        print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
        print(f'\t Val. Loss: {dev_loss:.3f} |  Val. Acc: {dev_acc*100:.2f}%')
        
    

    Epoch: 01 | Epoch Time: 0.0m 5.06s
    Train Loss: 0.675 | Train Acc: 57.73%
    Val. Loss: 0.651 | Val. Acc: 60.33%
    Epoch: 02 | Epoch Time: 0.0m 4.55s
    Train Loss: 0.653 | Train Acc: 61.34%
    Val. Loss: 0.649 | Val. Acc: 64.71%
    Epoch: 03 | Epoch Time: 0.0m 4.46s
    Train Loss: 0.633 | Train Acc: 63.21%
    Val. Loss: 0.630 | Val. Acc: 65.76%
    Epoch: 04 | Epoch Time: 0.0m 4.29s
    Train Loss: 0.613 | Train Acc: 67.01%
    Val. Loss: 0.618 | Val. Acc: 66.06%
    Epoch: 05 | Epoch Time: 0.0m 4.32s
    Train Loss: 0.578 | Train Acc: 68.74%
    Val. Loss: 0.606 | Val. Acc: 69.44%
    Epoch: 06 | Epoch Time: 0.0m 4.93s
    Train Loss: 0.542 | Train Acc: 71.87%
    Val. Loss: 0.620 | Val. Acc: 71.22%
    Epoch: 07 | Epoch Time: 0.0m 5.63s
    Train Loss: 0.510 | Train Acc: 73.76%
    Val. Loss: 0.615 | Val. Acc: 71.22%
    Epoch: 08 | Epoch Time: 0.0m 6.13s
    Train Loss: 0.482 | Train Acc: 77.25%
    Val. Loss: 0.629 | Val. Acc: 72.01%
    Epoch: 09 | Epoch Time: 0.0m 6.86s
    Train Loss: 0.451 | Train Acc: 79.02%
    Val. Loss: 0.706 | Val. Acc: 69.62%
    Epoch: 10 | Epoch Time: 0.0m 5.31s
    Train Loss: 0.418 | Train Acc: 80.55%
    Val. Loss: 0.631 | Val. Acc: 70.83%
    Epoch: 11 | Epoch Time: 0.0m 7.41s
    Train Loss: 0.378 | Train Acc: 84.55%
    Val. Loss: 0.672 | Val. Acc: 70.14%
    Epoch: 12 | Epoch Time: 0.0m 7.10s
    Train Loss: 0.333 | Train Acc: 85.48%
    Val. Loss: 0.697 | Val. Acc: 73.09%
    Epoch: 13 | Epoch Time: 0.0m 5.84s
    Train Loss: 0.304 | Train Acc: 87.85%
    Val. Loss: 0.693 | Val. Acc: 70.01%
    Epoch: 14 | Epoch Time: 0.0m 6.51s
    Train Loss: 0.290 | Train Acc: 88.98%
    Val. Loss: 0.715 | Val. Acc: 70.40%
    Epoch: 15 | Epoch Time: 0.0m 5.75s
    Train Loss: 0.256 | Train Acc: 90.07%
    Val. Loss: 0.745 | Val. Acc: 70.79%
    Epoch: 16 | Epoch Time: 0.0m 7.20s
    Train Loss: 0.235 | Train Acc: 90.46%
    Val. Loss: 0.665 | Val. Acc: 75.61%
    Epoch: 17 | Epoch Time: 0.0m 7.08s
    Train Loss: 0.226 | Train Acc: 91.11%
    Val. Loss: 0.609 | Val. Acc: 77.21%
    Epoch: 18 | Epoch Time: 0.0m 7.33s
    Train Loss: 0.187 | Train Acc: 93.13%
    Val. Loss: 0.698 | Val. Acc: 75.43%
    Epoch: 19 | Epoch Time: 0.0m 7.19s
    Train Loss: 0.166 | Train Acc: 93.18%
    Val. Loss: 0.697 | Val. Acc: 75.82%
    Epoch: 20 | Epoch Time: 0.0m 6.93s
    Train Loss: 0.167 | Train Acc: 94.07%
    Val. Loss: 0.708 | Val. Acc: 76.43%
    Epoch: 21 | Epoch Time: 0.0m 6.91s
    Train Loss: 0.130 | Train Acc: 94.22%
    Val. Loss: 0.847 | Val. Acc: 71.88%
    Epoch: 22 | Epoch Time: 0.0m 7.47s
    Train Loss: 0.098 | Train Acc: 96.51%
    Val. Loss: 0.765 | Val. Acc: 75.04%
    Epoch: 23 | Epoch Time: 0.0m 7.33s
    Train Loss: 0.098 | Train Acc: 96.00%
    Val. Loss: 0.829 | Val. Acc: 77.99%
    Epoch: 24 | Epoch Time: 0.0m 6.24s
    Train Loss: 0.093 | Train Acc: 96.91%
    Val. Loss: 0.831 | Val. Acc: 76.69%
    Epoch: 25 | Epoch Time: 0.0m 6.83s
    Train Loss: 0.058 | Train Acc: 97.51%
    Val. Loss: 0.832 | Val. Acc: 75.22%
    Epoch: 26 | Epoch Time: 0.0m 6.56s
    Train Loss: 0.062 | Train Acc: 97.83%
    Val. Loss: 0.976 | Val. Acc: 76.00%
    Epoch: 27 | Epoch Time: 0.0m 6.71s
    Train Loss: 0.076 | Train Acc: 98.12%
    Val. Loss: 0.977 | Val. Acc: 74.05%
    Epoch: 28 | Epoch Time: 0.0m 6.86s
    Train Loss: 0.049 | Train Acc: 98.20%
    Val. Loss: 0.981 | Val. Acc: 74.44%
    Epoch: 29 | Epoch Time: 0.0m 7.45s
    Train Loss: 0.070 | Train Acc: 97.57%
    Val. Loss: 1.081 | Val. Acc: 72.87%
    Epoch: 30 | Epoch Time: 0.0m 6.60s
    Train Loss: 0.059 | Train Acc: 98.15%
    Val. Loss: 0.776 | Val. Acc: 75.22%

    # predict on the test data with the saved model parameters
    rnn.load_state_dict(torch.load('wordavg-model.pt'))
    
    test_loss, test_acc = evaluate(rnn, test_iterator, criteon)
    print(f'Test. Loss: {test_loss:.3f} |  Test. Acc: {test_acc*100:.2f}%')
    

    With hidden_dim=64 and n_layers=2:

    Model with only the LSTM (no attention): test accuracy 81.16%

    With the attention of section 2.1: test accuracy 82.46%

    With the attention of section 2.2: test accuracy 81.49%

    Adding the attention mechanism improves performance slightly.

  • Original post: https://www.cnblogs.com/douzujun/p/13511237.html