  • Pytorch's Embedding Explained

    Suppose you are working with images. An image is represented as a matrix of RGB values. Each RGB value is a numerical feature: the values 5 and 10 are closer to each other than the values 5 and 100. The network implicitly uses this information to identify which images are close to each other, by comparing their individual pixel values.

    Now, let's say you are working with text, in particular, sentences. Each sentence is composed of words, which are categorical variables, not numerical ones. How would you feed a word to a NN? One way is to use one-hot vectors: you decide on the set of all words you will use (the vocabulary). Let's say your vocabulary has 10,000 words, and you have defined an ordering over these words: a, the, they, are, have, etc. You can then represent the first word in the ordering, a, as [1, 0, 0, 0, …], a vector of size 10,000 with all zeros except a 1 at position 1. Similarly, the second, third, … words become [0, 1, 0, 0, …], [0, 0, 1, 0, …], and so on. In general, the \(i^{th}\) word will be a vector of size 10,000 with all zeros, except a 1 at the \(i^{th}\) position (a one-hot encoding is sketched in code after the list below). Now we have a way to feed the words into the NN. But this representation has two problems:

    • The notion of distance we had with images is gone: all words are equidistant from all other words.
    • The dimension of the input is huge: a vocabulary can easily grow to 100,000 words or more.
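
    To make the one-hot encoding concrete, here is a minimal PyTorch sketch; the five-word toy vocabulary is a placeholder, not part of the original example:

    ```python
    import torch
    import torch.nn.functional as F

    # Toy vocabulary with an arbitrary but fixed ordering (placeholder).
    vocab = ["a", "the", "they", "are", "have"]
    word_to_index = {word: i for i, word in enumerate(vocab)}

    # One-hot encode "the": all zeros except a 1 at the word's position.
    index = torch.tensor(word_to_index["the"])
    one_hot = F.one_hot(index, num_classes=len(vocab))
    print(one_hot)  # tensor([0, 1, 0, 0, 0])
    ```

    With the 10,000-word vocabulary from the text, each such vector would have 10,000 entries, almost all of them zero.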

    Therefore, instead of having a sparse vector for each word, you can have a dense vector for each word, that is, a vector in which multiple elements are nonzero and each element can take continuous values. This immediately reduces the size of the vector. You can have an infinite number of unique vectors of size, say, 10, where each element can take any arbitrary value, as opposed to one-hot vectors, where each element could only take the values 0 or 1. So, for instance, a could be represented as [0.13, 0.46, 0.85, 0.96, 0.66, 0.12, 0.01, 0.38, 0.76, 0.95], the could be represented as [0.73, 0.45, 0.25, 0.91, 0.06, 0.16, 0.11, 0.36, 0.76, 0.98], and so on. The size of the vectors is a hyperparameter, set using cross-validation. So, how do you feed these dense vector representations of words into the network? The answer is an **embedding layer**: a matrix of size 10,000 × 10 (or, more generally, vocab_size × dense_vector_size). Every word has an index in the vocabulary, like a → 0, the → 1, etc., and you simply **look up** the corresponding row in the embedding matrix to get its 10-dimensional representation as the output.
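
    PyTorch implements exactly such a lookup matrix as nn.Embedding. A minimal sketch, using the sizes from the text (the indices are illustrative):

    ```python
    import torch
    import torch.nn as nn

    vocab_size, embedding_dim = 10_000, 10  # sizes used in the text

    # The embedding layer is essentially a vocab_size x embedding_dim matrix.
    embedding = nn.Embedding(vocab_size, embedding_dim)

    # Feeding in word indices (e.g. a -> 0, the -> 1) performs a row lookup.
    indices = torch.tensor([0, 1])
    vectors = embedding(indices)  # shape: (2, 10)

    # The forward pass returns exactly the corresponding rows of the matrix.
    assert torch.equal(vectors, embedding.weight[indices])
    ```

    Because the lookup is plain row indexing, gradients flow only into the rows that were selected, which is what lets the matrix be trained like any other layer.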

    Now, the embedding layer can be fixed, so that it is not trained along with the NN. This is done, for instance, when you initialize the embedding layer with pretrained word vectors. Alternatively, you can initialize the embedding layer randomly and train it together with the other layers. Finally, you can do both: initialize with the pretrained word vectors and fine-tune them on the task. In any case, the embeddings of similar words end up similar, solving the issue we had with one-hot vectors.
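
    The three options map directly onto nn.Embedding. A sketch, in which a random tensor stands in for real pretrained vectors (in practice these would be loaded from word2vec, GloVe, or similar):

    ```python
    import torch
    import torch.nn as nn

    # Stand-in for pretrained word vectors (random here only to keep
    # the sketch self-contained).
    pretrained = torch.randn(10_000, 10)

    # 1. Fixed: initialize from pretrained vectors, exclude from training.
    frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)

    # 2. Random initialization, trained jointly with the other layers.
    trainable = nn.Embedding(10_000, 10)

    # 3. Initialize from pretrained vectors, then fine-tune on the task.
    finetuned = nn.Embedding.from_pretrained(pretrained, freeze=False)

    print(frozen.weight.requires_grad)     # False
    print(finetuned.weight.requires_grad)  # True
    ```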

  • Original post: https://www.cnblogs.com/liulunyang/p/14400480.html