PyTorch Study Notes 06: Understanding the torch.nn.Embedding Word Embedding Layer

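1. The concept of word embedding

First, let us look at what an embedding is. A word embedding converts text into a sequence of numbers, because numbers are the form a computer can process directly. Building word embeddings amounts to building a dictionary for the computer, which it then uses to recognize words indirectly. A word embedding vector can therefore be understood as the vector representation of a word inside a neural network.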
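As a minimal sketch of the "dictionary" idea, the snippet below builds a word-to-index mapping from a toy corpus and encodes a sentence as a list of integers. The helper names `build_vocab` and `encode`, and the `<unk>` token for unknown words, are assumptions made for illustration; they are not part of PyTorch.

```python
def build_vocab(sentences, unk_token="<unk>"):
    """Assign each distinct word an integer index in order of first appearance."""
    # Index 0 is reserved for unknown words (an assumption of this sketch).
    vocab = {unk_token: 0}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab, unk_token="<unk>"):
    """Convert a sentence into the list of indices of its words."""
    return [vocab.get(word, vocab[unk_token]) for word in sentence.split()]


corpus = ["how are you", "are you ok"]
vocab = build_vocab(corpus)
indices = encode("how are you ok", vocab)

print(vocab)    # {'<unk>': 0, 'how': 1, 'are': 2, 'you': 3, 'ok': 4}
print(indices)  # [1, 2, 3, 4]
```

These integer indices are what an embedding layer takes as input; the layer then supplies one vector per index.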

2. Embedding in PyTorch

Definition from the official documentation:
A simple lookup table that stores embeddings of a fixed dictionary and size. This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
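Put simply, this is a lookup table that stores the embedding vectors of a fixed-size dictionary: given an index, the embedding layer returns the embedding vector stored for that index. The vectors are learnable parameters, so after training they come to reflect the semantic relationships between the symbols the indices stand for. The module is typically used to store word embeddings and retrieve them by index.

The module's input is a list of indices, and its output is the corresponding word embeddings.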
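The "lookup table" description can be checked directly. The sketch below, using an arbitrary table of 5 vectors with 4 dimensions each, shows that the forward pass returns the same values as indexing the layer's `weight` matrix by row.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed only so that repeated runs use the same weights

embedding = nn.Embedding(num_embeddings=5, embedding_dim=4)

# The whole table is one learnable matrix of shape (num_embeddings, embedding_dim).
print(embedding.weight.shape)  # torch.Size([5, 4])

indices = torch.tensor([1, 3, 1])
looked_up = embedding(indices)

# The forward pass returns the rows of the weight matrix selected by the indices.
same_as_indexing = torch.equal(looked_up, embedding.weight[indices])
print(same_as_indexing)  # True

# The same index always yields the same vector.
repeated_rows_match = torch.equal(looked_up[0], looked_up[2])
print(repeated_rows_match)  # True
```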

Parameters as given in the official documentation:
def __init__(self, num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2., scale_grad_by_freq=False, sparse=False, _weight=None)
Args:
    num_embeddings (int): size of the dictionary of embeddings
    embedding_dim (int): the size of each embedding vector
    padding_idx (int, optional): If given, pads the output with the embedding vector at :attr:`padding_idx` (initialized to zeros) whenever it encounters the index.
    max_norm (float, optional): If given, each embedding vector with norm larger than :attr:`max_norm` is renormalized to have norm :attr:`max_norm`.
    norm_type (float, optional): The p of the p-norm to compute for the :attr:`max_norm` option. Default ``2``.
    scale_grad_by_freq (boolean, optional): If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default ``False``.
    sparse (bool, optional): If ``True``, gradient w.r.t. :attr:`weight` matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.
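Notes on the parameters:
- num_embeddings (python:int) – the size of the dictionary, i.e. how many distinct words it holds. If 5000 distinct words occur in total, pass 5000; valid indices are then 0 to 4999.
- embedding_dim (python:int) – the dimension of each embedding vector, i.e. how many numbers are used to represent one symbol.
- padding_idx (python:int, optional) – the index used for padding. For example, if the input length is fixed at 100 but sentences vary in length, shorter sentences must be filled up with one agreed index, and this parameter names that index. The embedding vector at padding_idx is initialized to all zeros and receives no gradient updates, so padding positions do not contribute to training.
- max_norm (python:float, optional) – the maximum norm. Any embedding vector whose norm exceeds this bound is renormalized to have norm max_norm.
- norm_type (python:float, optional) – which p-norm to compute when comparing against max_norm. Defaults to the 2-norm.
- scale_grad_by_freq (boolean, optional) – if True, gradients are scaled by the inverse of each word's frequency in the mini-batch. Defaults to False.
- sparse (bool, optional) – if True, the gradient with respect to the weight matrix is a sparse tensor.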
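The effect of `padding_idx` and `max_norm` can be observed in a small experiment. This is a sketch under arbitrary assumptions (a table of 5 vectors of dimension 4, index 0 chosen as the padding index, and a bound of 1.0), not a recommended configuration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed only so that repeated runs use the same weights

# padding_idx: the row at index 0 starts as zeros and receives no gradient.
padded = nn.Embedding(num_embeddings=5, embedding_dim=4, padding_idx=0)
pad_row_is_zero = bool((padded.weight[0] == 0).all())

batch = torch.tensor([[1, 2, 0],
                      [3, 0, 0]])  # index 0 pads the shorter sentences
padded(batch).sum().backward()
pad_grad_is_zero = bool((padded.weight.grad[0] == 0).all())
word_grad_is_nonzero = bool((padded.weight.grad[1] != 0).all())

print(pad_row_is_zero, pad_grad_is_zero, word_grad_is_nonzero)  # True True True

# max_norm: rows that are looked up are rescaled so that their norm
# does not exceed max_norm (here measured with the default 2-norm).
clipped = nn.Embedding(num_embeddings=5, embedding_dim=4, max_norm=1.0)
with torch.no_grad():
    norms = clipped(torch.arange(5)).norm(p=2, dim=-1)
print(norms)
```

Note that with `max_norm` set, the forward pass rescales the affected rows of `weight` in place, so the stored table itself changes when those rows are looked up.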
Input: LongTensor of shape (N, W), where N = mini-batch size and W = number of indices to extract per mini-batch element
Output: tensor of shape (N, W, embedding_dim)

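The statement nn.Embedding(num_embeddings, embedding_dim) creates a word embedding layer: num_embeddings is the total number of words, and embedding_dim is the dimension of the vector you want to represent each word with.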

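Example:

    import torch
    from torch import nn

    # Assume the dictionary has only 5 words and each word vector has 4 dimensions.
    embedding = nn.Embedding(5, 4)

    # Each number stands for a word, e.g. {'!': 0, 'how': 1, 'are': 2, 'you': 3, 'ok': 4}.
    # The numbers must lie in the range 0 to 4, because only 5 words were defined above.
    word = [[1, 2, 3],
            [2, 3, 4]]

    embed = embedding(torch.LongTensor(word))
    print(embed)
    print(embed.size())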
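Output:

    tensor([[[-0.4093, -1.0110,  0.6731,  0.0790],
             [-0.6557, -0.9846, -0.1647,  2.2633],
             [-0.5706, -1.1936, -0.2704,  0.0708]],

            [[-0.6557, -0.9846, -0.1647,  2.2633],
             [-0.5706, -1.1936, -0.2704,  0.0708],
             [ 0.2242, -0.5989,  0.4237,  2.2405]]], grad_fn=<EmbeddingBackward>)
    torch.Size([2, 3, 4])

The output embed has shape [2, 3, 4]: for an input of shape [2, 3], every index is mapped to a 4-dimensional vector. Indices 2 and 3 occur in both rows of the input, and their vectors are identical in both places. The weights are randomly initialized, so the actual numbers differ from run to run.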
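To tie the example back to actual words, the sketch below starts from the vocabulary mentioned in the code comment, converts two tokenized sentences into indices, and embeds them. The variable names and the choice of sentences are assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed only so that repeated runs use the same weights

word_to_index = {"!": 0, "how": 1, "are": 2, "you": 3, "ok": 4}
embedding = nn.Embedding(num_embeddings=len(word_to_index), embedding_dim=4)

sentences = [["how", "are", "you"],
             ["are", "you", "ok"]]
indices = torch.tensor([[word_to_index[w] for w in s] for s in sentences])
print(indices.tolist())  # [[1, 2, 3], [2, 3, 4]]

embed = embedding(indices)
print(embed.shape)  # torch.Size([2, 3, 4])

# "are" is word 1 of the first sentence and word 0 of the second;
# both positions receive the same vector.
are_vectors_match = torch.equal(embed[0, 1], embed[1, 0])
print(are_vectors_match)  # True

# An index outside 0..4 has no row in the table, so the lookup fails.
try:
    embedding(torch.tensor([5]))
    out_of_range_rejected = False
except (IndexError, RuntimeError):
    out_of_range_rejected = True
print(out_of_range_rejected)  # True
```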

Original article: https://www.cnblogs.com/luckyplj/p/13377672.html