  • Sequence Model

    Word Embeddings

    Word Representation

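    • 1-hot representation: the inner product of any two distinct one-hot vectors is \(0\), so it cannot express that two words are related
    • Featurized representation (word embedding): each word is a dense vector of learned features, so related words can have similar vectors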
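    A tiny illustration of the difference between the two representations, using made-up numbers. The four "features" (gender, royal, age, food) and their values are hypothetical and only loosely follow the lecture's illustrative table. Distinct one-hot vectors always have inner product 0, while dense vectors give graded similarities.

```python
def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1 at `index`."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

vocab = ["apple", "orange", "king", "queen"]
onehots = {w: one_hot(i, len(vocab)) for i, w in enumerate(vocab)}

# One-hot: every pair of distinct words has inner product 0.
print(dot(onehots["apple"], onehots["orange"]))  # 0.0
print(dot(onehots["apple"], onehots["king"]))    # 0.0

# Featurized (hypothetical features: gender, royal, age, food).
embed = {
    "apple":  [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
    "king":   [-0.95, 0.93, 0.70, 0.02],
    "queen":  [0.97, 0.95, 0.69, 0.01],
}
# apple is now much closer to orange than to king.
print(dot(embed["apple"], embed["orange"]) > dot(embed["apple"], embed["king"]))  # True
```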

    Visualizing word embeddings

    [Figure: word embeddings visualized in 2D]

    t-SNE algorithm: \(300\mathrm{D} \to 2\mathrm{D}\)

    Word embeddings learn to place concepts that feel like they should be related close to each other (e.g. man/woman, king/queen, apple/orange).

    Using word embeddings

    Named entity recognition example

    [Figure: named entity recognition example]

    The labeled training set for named entity recognition can be much smaller than the unlabeled text used to learn the embeddings, which is what lets you carry out transfer learning.

    Transfer learning and word embeddings

    • Learn word embeddings from a large text corpus (1B to 100B words)

      (or download pre-trained embeddings online.)

    • Transfer the embeddings to a new task with a smaller training set

      (say, 100k words)

    • Optional: continue to fine-tune the word embeddings with the new data
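    Downloaded pre-trained vectors such as GloVe are commonly distributed as plain text: each line holds a word followed by its space-separated vector components. Below is a minimal loader that assumes exactly that format. The function name is hypothetical, and the file name in the comment is only an example.

```python
import io

def read_word_vectors(lines):
    """Parse lines of the form 'word v1 v2 ... vn' into {word: [floats]}."""
    word_to_vec = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        word_to_vec[word] = [float(x) for x in values]
    return word_to_vec

# In-memory sample; for a real download you could pass
# open("glove.6B.50d.txt", encoding="utf-8") instead (file name is an example).
sample = io.StringIO("the 0.1 0.2 0.3\norange 0.5 -0.1 0.9\n")
vectors = read_word_vectors(sample)
print(vectors["orange"])  # [0.5, -0.1, 0.9]
```

    Once loaded, these vectors can either stay frozen for the new task or be fine-tuned along with it, which corresponds to the optional last step above.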

    Properties of Word Embeddings

    Analogies

    \(\text{Man} \to \text{Woman} \text{ as } \text{King} \to \,?\)

    \(e_{\text{man}} - e_{\text{woman}} \approx \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \end{bmatrix} \approx e_{\text{king}} - e_{\text{queen}}\)

    \(e_? \approx e_{\text{king}} - e_{\text{man}} + e_{\text{woman}} \approx e_{\text{queen}}\)

    Find the word \(w\) given by \(\arg\max_w \text{sim}(e_w, e_{\text{king}} - e_{\text{man}} + e_{\text{woman}})\).

    • Cosine similarity

      \[\text{sim}(u, v) = \frac{u^{T}v}{\lVert u \rVert_2 \, \lVert v \rVert_2}\]
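    The analogy search and cosine similarity above can be sketched in a few lines. This uses hypothetical 4-d toy vectors (gender, royal, age, food) rather than real embeddings, and it excludes the three input words from the candidates, which is a common practical choice.

```python
import math

def cosine_similarity(u, v):
    """sim(u, v) = u^T v / (||u||_2 ||v||_2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def complete_analogy(a, b, c, embed):
    """Solve 'a -> b as c -> ?' via argmax_w sim(e_w, e_c - e_a + e_b)."""
    target = [ec - ea + eb for ea, eb, ec in zip(embed[a], embed[b], embed[c])]
    candidates = (w for w in embed if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(embed[w], target))

# Hypothetical toy vectors: [gender, royal, age, food].
embed = {
    "man":    [-1.00, 0.01, 0.03, 0.09],
    "woman":  [1.00, 0.02, 0.02, 0.01],
    "king":   [-0.95, 0.93, 0.70, 0.02],
    "queen":  [0.97, 0.95, 0.69, 0.01],
    "apple":  [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
}
print(complete_analogy("man", "woman", "king", embed))  # queen
```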

    Embedding Matrix

    [Figure: embedding matrix]
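    In the lecture's running example, \(E\) is a \(300 \times 10000\) matrix, and multiplying it by the one-hot vector \(o_j\) picks out column \(j\), the embedding \(e_j\). The sketch below uses a tiny hypothetical \(3 \times 5\) matrix to check that this product equals a direct column lookup. In practice embedding layers do the lookup, because the full matrix-vector product mostly multiplies by zeros.

```python
def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def one_hot(j, size):
    """One-hot column vector o_j of length `size`."""
    v = [0.0] * size
    v[j] = 1.0
    return v

def lookup(E, j):
    """Return column j of E directly, without any multiplication."""
    return [row[j] for row in E]

# Hypothetical 3 x 5 embedding matrix: 3-d embeddings for a 5-word vocabulary.
E = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.0, 0.9, 0.8, 0.7, 0.6],
    [-0.5, 0.0, 0.5, 1.0, 1.5],
]
j = 3
print(matvec(E, one_hot(j, 5)))  # [0.4, 0.7, 1.0]
print(lookup(E, j))              # [0.4, 0.7, 1.0]
```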

    Learning Word Embeddings: Word2vec & GloVe

    Learning Word Embeddings

    • Neural language model

      Hide (mask) a word, train a network to predict it from the surrounding words, and keep the learned embedding parameters.

    [Figure: neural language model]

    • Other context/target pairs

      Context: last 4 words / 4 words on left & right / last 1 word / nearby 1 word (skip-gram)

      \(\text{a glass of orange } \underline{\quad ? \quad} \text{ to go along with}\)
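    The four context choices listed above can be made concrete on the lecture's example sentence. The helper name and the kind labels are hypothetical. For "nearby 1 word", this sketch assumes the random word is drawn from the same 4-left/4-right window.

```python
import random

def context_words(tokens, target_idx, kind, rng=None):
    """Return the context used to predict tokens[target_idx].

    kind: "last4", "left_right4", "last1", or "nearby1" (skip-gram style).
    """
    left4 = tokens[max(0, target_idx - 4):target_idx]
    right4 = tokens[target_idx + 1:target_idx + 5]
    if kind == "last4":
        return left4
    if kind == "left_right4":
        return left4 + right4
    if kind == "last1":
        return tokens[max(0, target_idx - 1):target_idx]
    if kind == "nearby1":
        rng = rng or random.Random(0)
        return [rng.choice(left4 + right4)]  # assumption: +/-4 window
    raise ValueError(f"unknown context kind: {kind}")

tokens = "i want a glass of orange juice to go along with my cereal".split()
t = tokens.index("juice")
print(context_words(tokens, t, "left_right4"))
# ['a', 'glass', 'of', 'orange', 'to', 'go', 'along', 'with']
```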

    Word2Vec

    Skip-grams

    Come up with a few context-to-target pairs to create our supervised learning problem.
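    One way to build these pairs is to pick a context word and then a target chosen at random within a window around it. In the sketch below, the window size, the number of pairs per context, and the choice to use every position as a context are all assumptions, not fixed parts of the method.

```python
import random

def skipgram_pairs(tokens, window=5, pairs_per_context=2, seed=0):
    """Sample (context, target) pairs; each target lies within +/-window of its context."""
    rng = random.Random(seed)
    pairs = []
    for i, context in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens) - 1, i + window)
        candidates = [j for j in range(lo, hi + 1) if j != i]
        if not candidates:
            continue  # a one-word "sentence" has no targets
        for _ in range(pairs_per_context):
            pairs.append((context, tokens[rng.choice(candidates)]))
    return pairs

tokens = "i want a glass of orange juice to go along with my cereal".split()
pairs = skipgram_pairs(tokens, window=2)
print(pairs[:4])
```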

    • Model

      \(\text{Vocab size} = 10000\)

      \(\text{Context } c \text{ "orange" (6527)} \to \text{Target } t \text{ "juice" (4834)}\)

      \(O_c \to E \to e_c \,(= E \times O_c) \to \text{softmax} \to \hat y\)

      \[\text{softmax}: P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{j=1}^{10000} e^{\theta_j^T e_c}}\]

      \(\theta_t\) is the parameter vector associated with output \(t\)

      \[\text{Loss}: \mathcal{L}(\hat y, y) = -\sum_{i=1}^{10000} y_i \log \hat y_i\]
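    A direct transcription of \(P(t \mid c)\) and the loss for a tiny hypothetical vocabulary of 4 words with 3-d vectors. Computing every probability touches all rows \(\theta_j\), which is the cost problem discussed next. Subtracting the largest logit before exponentiating is a standard stability trick and leaves the softmax unchanged.

```python
import math

def softmax_probs(e_c, thetas):
    """P(t | c) = exp(theta_t^T e_c) / sum_j exp(theta_j^T e_c), for every t."""
    logits = [sum(a * b for a, b in zip(theta, e_c)) for theta in thetas]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]

def cross_entropy(y_hat, target_index):
    """-sum_i y_i log y_hat_i with a one-hot y reduces to -log y_hat[target]."""
    return -math.log(y_hat[target_index])

# Hypothetical setup: 4-word vocabulary, 3-d context embedding e_c.
e_c = [0.2, -0.1, 0.4]
thetas = [[0.1, 0.0, 0.3], [0.5, 0.5, 0.5], [-0.2, 0.1, 0.0], [0.0, 0.3, -0.4]]
y_hat = softmax_probs(e_c, thetas)
print(sum(y_hat))               # sums to 1 up to rounding
print(cross_entropy(y_hat, 1))  # loss if the true target is word 1
```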

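    • Problem with softmax classification

      computation cost is too high: every evaluation of \(P(t \mid c)\) sums over all 10,000 words in the denominator

    • Solution to the softmax cost

      hierarchical softmax classifier (a tree of binary classifiers, so the cost grows roughly with the logarithm of the vocabulary size instead of linearly)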

    [Figure: hierarchical softmax tree]
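    A sketch of the hierarchical softmax idea. Each word's probability is the product of the sigmoid left/right decisions along its path from the root, so only \(\log_2 V\) dot products are needed per word. For simplicity this uses a perfectly balanced tree with heap numbering, a vocabulary size that is a power of 2, and random hypothetical parameters. Practical implementations (for example word2vec's Huffman tree) use unbalanced trees that place frequent words near the root.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hs_prob(word_idx, e_c, node_params, vocab_size):
    """P(word | c) as a product of binary decisions down a balanced tree.

    Heap numbering: the root is 1, the children of n are 2n (left) and 2n+1 (right),
    and word i sits at leaf vocab_size + i (vocab_size must be a power of 2).
    """
    node = vocab_size + word_idx
    prob = 1.0
    while node > 1:
        parent = node // 2
        p_right = sigmoid(sum(a * b for a, b in zip(node_params[parent], e_c)))
        prob *= p_right if node % 2 == 1 else 1.0 - p_right
        node = parent
    return prob

rng = random.Random(0)
V, d = 8, 3  # hypothetical: 8-word vocabulary, 3-d embeddings
node_params = {n: [rng.uniform(-1, 1) for _ in range(d)] for n in range(1, V)}
e_c = [0.3, -0.2, 0.5]
probs = [hs_prob(i, e_c, node_params, V) for i in range(V)]
print(sum(probs))  # 1.0 up to rounding; each word needed only log2(8) = 3 sigmoids
```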

    Negative Sampling

    context   word    target?
    orange    juice   1
    orange    king    0
    orange    book    0
    orange    the     0
    orange    of      0

    Defining a new learning problem & Model

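    • Pick a context word and a nearby target word to get a positive example (label 1);

    • Pair the same context word with \(k\) words picked at random from the dictionary to get \(k\) negative examples (label 0).

      \[k = \begin{cases} 5 \sim 20 & (\text{small dataset}) \\ 2 \sim 5 & (\text{large dataset}) \end{cases}\]

    • Model each pair with logistic regression, \(P(y = 1 \mid c, t) = \sigma(\theta_t^T e_c)\). This amounts to 10,000 binary classifiers instead of one 10,000-way softmax, but each training step only updates the \(k + 1\) of them that correspond to the sampled words, so the computation cost is much lower.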
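    A sketch of one negative-sampling training example set and its loss: one positive pair plus \(k\) negatives, each scored with a sigmoid, so only \(k + 1\) rows of \(\theta\) are involved. Negatives are drawn uniformly here for brevity (the lecture's \(f(w)^{3/4}\) heuristic is the usual choice). A sampled negative may occasionally coincide with a true context word, which is tolerated. Vocabulary and vectors are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_examples(context, target, vocab, k, rng):
    """One positive (context, target, 1) plus k negatives (context, random word, 0)."""
    examples = [(context, target, 1)]
    for _ in range(k):
        examples.append((context, rng.choice(vocab), 0))
    return examples

def neg_sampling_loss(examples, emb, theta):
    """Sum of binary logistic losses with P(y=1 | c, t) = sigmoid(theta_t . e_c)."""
    loss = 0.0
    for c, w, label in examples:
        p = sigmoid(sum(a * b for a, b in zip(theta[w], emb[c])))
        loss -= math.log(p) if label == 1 else math.log(1.0 - p)
    return loss

vocab = ["juice", "king", "book", "the", "of", "orange"]
rng = random.Random(1)
examples = make_examples("orange", "juice", vocab, k=4, rng=rng)
emb = {w: [rng.uniform(-0.5, 0.5) for _ in range(3)] for w in vocab}
theta = {w: [rng.uniform(-0.5, 0.5) for _ in range(3)] for w in vocab}
print(examples)
print(neg_sampling_loss(examples, emb, theta))
```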

    Selecting negative examples

    \[P(w_i) = \frac{f(w_i)^{3/4}}{\sum_{j=1}^{10000} f(w_j)^{3/4}}\]

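    where \(f(w_i)\) is the observed frequency of \(w_i\) in the corpus. The \(3/4\) power is a heuristic between sampling by raw frequency (too many words like "the" and "of") and sampling uniformly from the vocabulary.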
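    A sketch of the \(f(w)^{3/4}\) sampling distribution using hypothetical word counts. Raw counts can stand in for frequencies because the common scale factor cancels in the normalization. Note how the \(3/4\) power narrows the gap between frequent and rare words.

```python
import random

def negative_sampling_dist(counts):
    """P(w_i) = f(w_i)^(3/4) / sum_j f(w_j)^(3/4); scaling all counts does not change P."""
    weights = {w: c ** 0.75 for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}

counts = {"the": 1000, "orange": 50, "durian": 2}  # hypothetical counts
probs = negative_sampling_dist(counts)
words = list(probs)
rng = random.Random(0)
negatives = rng.choices(words, weights=[probs[w] for w in words], k=5)
print(probs)
print(negatives)
```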

    GloVe Word Vectors

    GloVe (global vectors for word representation)

    \(X_{ct} = X_{ij} = \text{number of times } i \text{ appears in the context of } j\)

    If the context is a symmetric window, \(X_{ij} = X_{ji}\); \(X_{ij}\) measures how often words \(i\) and \(j\) appear close to each other.

    \[\min \sum_{i=1}^{n} \sum_{j=1}^{n} f(X_{ij}) \left(\theta_i^T e_j + b_i + b_j' - \log X_{ij}\right)^2\]

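    \(f(X_{ij})\) is a weighting term:

    \[f(X_{ij}) = \begin{cases} 0 & \text{if } X_{ij} = 0 \\ \text{capped (not too high)} & \text{stop words: this, is, of, a, } \dots \\ \text{small but not negligible} & \text{rare words: durian, } \dots \end{cases}\]

    (with the convention \(0 \log 0 = 0\), so pairs that never co-occur contribute nothing to the sum)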

    \(\theta_i\) and \(e_j\) play symmetric roles, so the final embedding can be taken as their average, \(e_w^{\text{final}} = \frac{e_w + \theta_w}{2}\).
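    A sketch of the GloVe objective for a tiny co-occurrence matrix. The lecture describes \(f\) only qualitatively. This uses the weighting function from the GloVe paper, \(f(x) = (x / x_{\max})^{\alpha}\) for \(x < x_{\max}\) and \(1\) otherwise, with \(x_{\max} = 100\) and \(\alpha = 3/4\). The counts and parameters are hypothetical.

```python
import math
import random

def glove_weight(x, x_max=100.0, alpha=0.75):
    """f(X_ij): 0 when X_ij = 0, grows with X_ij, capped at 1 (GloVe paper's choice)."""
    if x <= 0:
        return 0.0
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(X, theta, e, b, b_prime):
    """sum_ij f(X_ij) (theta_i^T e_j + b_i + b'_j - log X_ij)^2, skipping X_ij = 0."""
    loss = 0.0
    n = len(X)
    for i in range(n):
        for j in range(n):
            if X[i][j] == 0:
                continue  # f(0) = 0, so log 0 is never evaluated
            pred = sum(a * c for a, c in zip(theta[i], e[j])) + b[i] + b_prime[j]
            loss += glove_weight(X[i][j]) * (pred - math.log(X[i][j])) ** 2
    return loss

def final_embedding(e_w, theta_w):
    """theta and e are symmetric, so average them: (e_w + theta_w) / 2."""
    return [(a + c) / 2 for a, c in zip(e_w, theta_w)]

X = [[0, 3, 120], [3, 0, 1], [120, 1, 0]]  # hypothetical symmetric counts
rng = random.Random(0)
n, d = 3, 2
theta = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n)]
e = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n)]
b, b_prime = [0.0] * n, [0.0] * n
print(glove_loss(X, theta, e, b, b_prime))
print(final_embedding(e[0], theta[0]))
```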

    Applications Using Word Embeddings

    Sentiment Classification

    Average the embeddings of the words in the sentence and feed the average to a softmax classifier to predict the rating (e.g. 1 to 5 stars).

    [Figure: sentiment classification by averaging word embeddings]

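    But averaging ignores word order, so it makes mistakes such as "Completely lacking in good taste, good service, and good ambience.": "good" appears three times, so the average looks positive even though the review is negative.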
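    A sketch of the averaging classifier that makes its weakness visible. Because the features are a plain mean, any reordering of the words gives exactly the same prediction. The 2-d embeddings and the weights \(W, b\) are hypothetical.

```python
import math

def average_embedding(words, embed):
    """Mean of the word vectors; word order is lost."""
    d = len(next(iter(embed.values())))
    total = [0.0] * d
    for w in words:
        total = [t + x for t, x in zip(total, embed[w])]
    return [t / len(words) for t in total]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]

def predict(words, embed, W, b):
    """softmax(W avg + b) over 5 classes (1 to 5 stars)."""
    avg = average_embedding(words, embed)
    return softmax([sum(w_k * a for w_k, a in zip(row, avg)) + b_k for row, b_k in zip(W, b)])

embed = {"good": [0.8, 0.1], "not": [-0.6, 0.3], "service": [0.0, 0.5]}  # hypothetical
W = [[-1.0, 0.2], [-0.5, 0.1], [0.0, 0.0], [0.5, -0.1], [1.0, -0.2]]  # 5 classes x 2
b = [0.0] * 5
p1 = predict(["not", "good"], embed, W, b)
p2 = predict(["good", "not"], embed, W, b)
print(p1 == p2)  # True: averaging ignores word order
```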

    RNN for sentiment classification

    A many-to-one RNN that reads the word embeddings in order can take word order into account (e.g. "lacking" before "good") and handles this case better.
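    A forward pass of a basic many-to-one RNN, \(a^{\langle t\rangle} = \tanh(W_{aa} a^{\langle t-1\rangle} + W_{ax} x^{\langle t\rangle} + b_a)\) followed by a softmax on the last hidden state. Unlike the average, its output depends on the order of the inputs. The sizes, weights, and embeddings are hypothetical and untrained. A real model would be trained (and would typically use GRU/LSTM cells).

```python
import math
import random

def rnn_many_to_one(xs, Waa, Wax, ba, Wya, by):
    """a<t> = tanh(Waa a<t-1> + Wax x<t> + ba); y_hat = softmax(Wya a<T> + by)."""
    a = [0.0] * len(ba)
    for x in xs:
        a = [math.tanh(sum(w * ap for w, ap in zip(Waa[i], a))
                       + sum(w * xi for w, xi in zip(Wax[i], x)) + ba[i])
             for i in range(len(ba))]
    logits = [sum(w * ai for w, ai in zip(Wya[k], a)) + by[k] for k in range(len(by))]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [v / s for v in exps]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_a, n_x, n_y = 4, 2, 5  # hypothetical: hidden size 4, 2-d embeddings, 5 classes
Waa, Wax, Wya = rand_matrix(rng, n_a, n_a), rand_matrix(rng, n_a, n_x), rand_matrix(rng, n_y, n_a)
ba, by = [0.0] * n_a, [0.0] * n_y
embed = {"not": [0.9, -0.3], "good": [0.2, 0.8]}
p1 = rnn_many_to_one([embed["not"], embed["good"]], Waa, Wax, ba, Wya, by)
p2 = rnn_many_to_one([embed["good"], embed["not"]], Waa, Wax, ba, Wya, by)
print(p1)
print(p2)  # differs from p1: order matters
```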

    Debiasing word embeddings

    Word embeddings can reflect gender, ethnicity, age, sexual orientation, and other biases of the text used to train the model.

    Addressing bias in word embeddings

    • Identify the bias direction

      Average the differences \(e_{\text{he}} - e_{\text{she}}\), \(e_{\text{male}} - e_{\text{female}}\), \(\dots\)

      bias direction: \(1\text{D}\)

      non-bias direction: \((n-1)\text{D}\)

      In practice SVD (singular value decomposition, similar to PCA) is used instead of a simple average.

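    • Neutralize: for every word that is not definitional (e.g. doctor, babysitter), project it onto the non-bias subspace to get rid of the bias component

      (you first need to decide which words to neutralize; a classifier such as an SVM can be trained to separate definitional words from the rest)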

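    • Equalize pairs.

      For pairs such as grandmother/grandfather, adjust the vectors so that both words have the same similarity to, and distance from, gender-neutral words (e.g. babysitter)

      you can pick these pairs by hand (there are not many of them)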
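    A sketch of the three debiasing steps on hypothetical 3-d toy vectors. The bias direction is a plain average of differences rather than SVD. Neutralize removes the component along \(g\) via \(e - \frac{e \cdot g}{\lVert g\rVert^2} g\). Equalize is a simplified variant in which both words share the non-bias part of their mean and keep symmetric bias parts. The version in the original debiasing paper (and the course assignment) also rescales for unit-length vectors, which this sketch skips.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(u, s):
    return [a * s for a in u]

def dist(u, v):
    return math.sqrt(dot(sub(u, v), sub(u, v)))

def bias_direction(pairs):
    """Average of differences such as e_he - e_she, e_male - e_female."""
    diffs = [sub(u, v) for u, v in pairs]
    return [sum(col) / len(diffs) for col in zip(*diffs)]

def project(e, g):
    """Component of e along the bias direction g."""
    return scale(g, dot(e, g) / dot(g, g))

def neutralize(e, g):
    """e - (e.g / ||g||^2) g: drop the bias component."""
    return sub(e, project(e, g))

def equalize(e1, e2, g):
    """Simplified equalize: shared non-bias part, symmetric bias parts."""
    mu_orth = neutralize(scale([a + b for a, b in zip(e1, e2)], 0.5), g)
    half = scale(sub(project(e1, g), project(e2, g)), 0.5)
    return [m + h for m, h in zip(mu_orth, half)], sub(mu_orth, half)

# Hypothetical toy vectors.
he, she = [1.0, 0.2, 0.1], [-1.0, 0.25, 0.1]
male, female = [0.9, 0.1, 0.0], [-0.95, 0.1, 0.05]
babysitter = [-0.4, 0.6, 0.3]
grandmother, grandfather = [-0.8, 0.3, 0.7], [0.7, 0.35, 0.6]

g = bias_direction([(he, she), (male, female)])
neutral = neutralize(babysitter, g)
gm, gf = equalize(grandmother, grandfather, g)
print(dot(neutral, g))                         # ~0: no bias component left
print(dist(neutral, gm), dist(neutral, gf))    # equal distances
```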

    Homework - Emojify

    Building the Emojifier-V2

    [Figure: Emojifier-V2 architecture]

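    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, LSTM, Dropout, Dense, Activation
    
    # UNQ_C5 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
    # GRADED FUNCTION: Emojify_V2
    
    def Emojify_V2(input_shape, word_to_vec_map, word_to_index):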
        """
        Function creating the Emojify-v2 model's graph.
        
        Arguments:
        input_shape -- shape of the input, usually (max_len,)
        word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
        word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)
    
        Returns:
        model -- a model instance in Keras
        """
        
        ### START CODE HERE ###
        # Define sentence_indices as the input of the graph.
        # It should be of shape input_shape and dtype 'int32' (as it contains indices, which are integers).
        sentence_indices = Input(input_shape, dtype = 'int32')
        
        # Create the embedding layer pretrained with GloVe vectors (≈1 line);
        # pretrained_embedding_layer is defined in an earlier part of the assignment
        embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
        
        # Propagate sentence_indices through your embedding layer
        # (See additional hints in the instructions).
        embeddings = embedding_layer(sentence_indices)   
        
        # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
        # The returned output should be a batch of sequences.
        X = LSTM(128, return_sequences = True)(embeddings)
        # Add dropout with a probability of 0.5
        X = Dropout(0.5)(X)
        # Propagate X through another LSTM layer with 128-dimensional hidden state
        # The returned output should be a single hidden state, not a batch of sequences.
        X = LSTM(128, return_sequences = False)(X)
        # Add dropout with a probability of 0.5
        X = Dropout(0.5)(X)
        # Propagate X through a Dense layer with 5 units
        X = Dense(5)(X)
        # Add a softmax activation
        X = Activation('softmax')(X)
        
        # Create Model instance which converts sentence_indices into X.
        model = Model(inputs = sentence_indices, outputs = X)
        
        ### END CODE HERE ###
        
        return model
    
    model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
    model.summary()
    
    Model: "functional_3"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_2 (InputLayer)         [(None, 10)]              0         
    _________________________________________________________________
    embedding_3 (Embedding)      (None, 10, 50)            20000050  
    _________________________________________________________________
    lstm_2 (LSTM)                (None, 10, 128)           91648     
    _________________________________________________________________
    dropout_2 (Dropout)          (None, 10, 128)           0         
    _________________________________________________________________
    lstm_3 (LSTM)                (None, 128)               131584    
    _________________________________________________________________
    dropout_3 (Dropout)          (None, 128)               0         
    _________________________________________________________________
    dense_1 (Dense)              (None, 5)                 645       
    _________________________________________________________________
    activation_1 (Activation)    (None, 5)                 0         
    =================================================================
    Total params: 20,223,927
    Trainable params: 223,877
    Non-trainable params: 20,000,050
    _________________________________________________________________
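    The parameter counts in the summary above can be checked by hand. An LSTM has four gates, each with weights over \([a^{\langle t-1\rangle}, x^{\langle t\rangle}]\) plus a bias, giving \(4\,(n_a(n_a + n_x) + n_a)\) parameters. The Dense layer has \(n_{in} n_{out} + n_{out}\). The frozen embedding has 400,001 rows of 50 dimensions. The helper names below are only for this check.

```python
def lstm_params(n_x, n_a):
    """4 gates, each with an n_a x (n_a + n_x) weight matrix and an n_a bias."""
    return 4 * (n_a * (n_a + n_x) + n_a)

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

embedding = 400_001 * 50        # frozen GloVe vectors (non-trainable)
lstm_1 = lstm_params(50, 128)   # 91,648
lstm_2 = lstm_params(128, 128)  # 131,584
dense = dense_params(128, 5)    # 645
trainable = lstm_1 + lstm_2 + dense
print(trainable, embedding + trainable)  # 223877 20223927
```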
    

    Compile it

    model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
    

    Train it

    X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
    Y_train_oh = convert_to_one_hot(Y_train, C = 5)
    model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
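    The training call above expects each sentence as a fixed-length row of vocabulary indices and each label as a one-hot row. The assignment's own `sentences_to_indices` and `convert_to_one_hot` return NumPy arrays. What follows is a hypothetical pure-Python approximation (`to_padded_indices`, `to_one_hot`) of that preprocessing, assuming lowercase whitespace tokenization, zero padding, and truncation at `max_len`.

```python
def to_padded_indices(sentences, word_to_index, max_len):
    """Map each sentence to word indices, truncated/zero-padded to max_len."""
    result = []
    for sentence in sentences:
        indices = [word_to_index[w] for w in sentence.lower().split()][:max_len]
        result.append(indices + [0] * (max_len - len(indices)))
    return result

def to_one_hot(labels, C):
    """Turn integer labels 0..C-1 into one-hot rows of length C."""
    return [[1 if k == y else 0 for k in range(C)] for y in labels]

word_to_index = {"i": 1, "love": 2, "you": 3}  # hypothetical tiny vocabulary
print(to_padded_indices(["I love you"], word_to_index, 5))  # [[1, 2, 3, 0, 0]]
print(to_one_hot([0, 2], 3))                                # [[1, 0, 0], [0, 0, 1]]
```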
    
  • Original article: https://www.cnblogs.com/zjp-shadow/p/15142398.html