An RNN is, at its core, nothing more than linear layers combined with a pointwise nonlinearity.
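To make this concrete, here is a minimal sketch (the class name `NaiveRNNCell` is made up for illustration, not PyTorch's actual implementation) showing that a single RNN step is just two `nn.Linear` layers followed by `tanh`:

```python
import torch
import torch.nn as nn

class NaiveRNNCell(nn.Module):
    """One RNN step: h_t = tanh(W_ih @ x_t + b_ih + W_hh @ h_{t-1} + b_hh)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.ih = nn.Linear(input_size, hidden_size)   # input-to-hidden linear layer
        self.hh = nn.Linear(hidden_size, hidden_size)  # hidden-to-hidden linear layer

    def forward(self, x, h):
        return torch.tanh(self.ih(x) + self.hh(h))
```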
Even if the RNN's input data is batch first, it is converted to seq_len first internally.
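A small sketch to verify that `batch_first` is purely a layout convention (the variable names `rnn_bf`/`rnn_sf` are illustrative): the same weights produce identical results whether the input is batch first or seq_len first, once the axes are transposed.

```python
import torch
import torch.nn as nn

rnn_bf = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
rnn_sf = nn.RNN(input_size=4, hidden_size=8)           # seq_len first (default)
rnn_sf.load_state_dict(rnn_bf.state_dict())            # copy identical weights

x = torch.randn(3, 5, 4)                               # (batch, seq_len, input_size)
out_bf, h_bf = rnn_bf(x)                               # out: (batch, seq_len, hidden_size)
out_sf, h_sf = rnn_sf(x.transpose(0, 1))               # out: (seq_len, batch, hidden_size)

print(torch.allclose(out_bf, out_sf.transpose(0, 1)))  # True
print(torch.allclose(h_bf, h_sf))                      # True
```

The (slightly abridged) `forward` method of `RNNBase` from an older PyTorch release shows where these pieces live: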
```python
def forward(self, input, hx=None):
    batch_sizes = None  # input is not packed, so batch_sizes = None
    max_batch_size = input.size(0) if self.batch_first else input.size(1)  # batch_size
    if hx is None:  # the caller may omit hidden; an all-zero hidden is created automatically
        num_directions = 2 if self.bidirectional else 1
        hx = torch.autograd.Variable(input.data.new(self.num_layers * num_directions,
                                                    max_batch_size,
                                                    self.hidden_size).zero_())
        if self.mode == 'LSTM':  # an LSTM needs both h_0 and c_0
            hx = (hx, hx)
    flat_weight = None  # if cpu
    func = self._backend.RNN(  # self._backend = thnn_backend
                               # backend = THNNFunctionBackend(), FunctionBackend
        self.mode,
        self.input_size,
        self.hidden_size,
        num_layers=self.num_layers,
        batch_first=self.batch_first,
        dropout=self.dropout,
        train=self.training,
        bidirectional=self.bidirectional,
        batch_sizes=batch_sizes,
        dropout_state=self.dropout_state,
        flat_weight=flat_weight
    )
    output, hidden = func(input, self.all_weights, hx)
    return output, hidden
```
As the code shows, when training an RNN you may omit the initial hidden state (`hx`); PyTorch then automatically creates an all-zero hidden state.
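A small usage sketch of this behavior: calling `nn.RNN` without `hx` is equivalent to passing an explicit all-zero `h_0`, so the two calls below produce identical results.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, num_layers=1)
x = torch.randn(5, 3, 4)           # (seq_len, batch, input_size)

out1, h1 = rnn(x)                  # hx omitted: an all-zero h_0 is created internally
h0 = torch.zeros(1, 3, 8)          # (num_layers * num_directions, batch, hidden_size)
out2, h2 = rnn(x, h0)              # explicit all-zero h_0

print(torch.allclose(out1, out2))  # True
```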
You can also add a fully connected layer on top of the RNN's output, so that the final output dimension differs from the hidden size.
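A minimal sketch of this pattern (the names `RNNWithFC` and `output_size` are assumptions for illustration): the fully connected layer maps each time step's hidden vector to the desired output dimension.

```python
import torch
import torch.nn as nn

class RNNWithFC(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size)
        self.fc = nn.Linear(hidden_size, output_size)  # applied to the last dimension

    def forward(self, x, hx=None):
        output, hidden = self.rnn(x, hx)               # output: (seq_len, batch, hidden_size)
        return self.fc(output), hidden                 # -> (seq_len, batch, output_size)
```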