How Keras computes the loss and runs backpropagation when a model has multiple outputs

Source: https://stackoverflow.com/questions/57149476/how-is-a-multiple-outputs-deep-learning-model-trained

Keras calculations are graph based and use only one optimizer.

The optimizer is also part of the graph, and its calculations use the gradients of the whole set of weights: not two sets of gradients, one per output, but one set of gradients for the entire model.

Mathematically, it's not complicated. The final loss function is:

loss = (main_weight * main_loss) + (aux_weight * aux_loss) #you choose the weights in model.compile

All of these are defined by you, along with any other terms or weightings that may apply (sample weights, class weights, regularizer terms, etc.).

Where:

main_loss is a function_of(main_true_output_data, main_model_output)
aux_loss is a function_of(aux_true_output_data, aux_model_output)

And the gradients are just ∂(loss)/∂(weight_i) for all weights.

Once the optimizer has the gradients, it performs its optimization step once.
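To make the "one loss, one set of gradients, one step" idea concrete, here is a minimal pure-Python sketch. It is not Keras code: the scalar toy model, the names `w_shared`, `w_main`, `w_aux`, the squared-error losses, the finite-difference gradients and the plain SGD step are all assumptions chosen for illustration.

```python
# Toy scalar "model" (hypothetical, not Keras): one shared weight feeds two heads.
#   hidden   = w_shared * x
#   main_out = w_main * hidden
#   aux_out  = w_aux * hidden

def total_loss(weights, x, main_y, aux_y, main_weight=1.0, aux_weight=0.2):
    w_shared, w_main, w_aux = weights
    hidden = w_shared * x
    main_loss = (w_main * hidden - main_y) ** 2   # squared error on the main head
    aux_loss = (w_aux * hidden - aux_y) ** 2      # squared error on the aux head
    return main_weight * main_loss + aux_weight * aux_loss


def numerical_gradients(weights, *data, eps=1e-6):
    """Central-difference estimate of d(loss)/d(weight_i) for every weight."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += eps
        down = list(weights)
        down[i] -= eps
        grads.append((total_loss(up, *data) - total_loss(down, *data)) / (2 * eps))
    return grads


def sgd_step(weights, grads, lr=0.01):
    """One optimizer step, applied to all weights at once."""
    return [w - lr * g for w, g in zip(weights, grads)]


weights = [0.5, 1.0, -1.0]   # [w_shared, w_main, w_aux]
data = (2.0, 3.0, 1.0)       # x, main_y, aux_y

grads = numerical_gradients(weights, *data)   # a single list covering the whole model
new_weights = sgd_step(weights, grads)        # a single update
print(total_loss(weights, *data), total_loss(new_weights, *data))
```

There is only one `grads` list here, with one entry per weight, and it is computed from the combined loss rather than from each output separately.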

Questions:

How are the auxiliary branch weights updated, given that the branch is not connected directly to the main output?

You have two output datasets: one for main_output and another for aux_output. You must pass both to fit, as in model.fit(inputs, [main_y, aux_y], ...)
You also have two loss functions, one for each output: main_loss takes main_y and main_out, and aux_loss takes aux_y and aux_out.
The two losses are summed: loss = (main_weight * main_loss) + (aux_weight * aux_loss)
The gradients are calculated for the function loss once, and this function connects to the entire model.
The aux term will affect lstm_1 and embedding_1 in backpropagation.
Consequently, in the next forward pass (after the weights are updated) it will end up influencing the main branch. (Whether that makes the main output better or worse depends only on whether the aux output is useful.)
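The effect of the aux term on the shared layers can be sketched with the same kind of toy model. Everything here is an illustrative assumption rather than Keras internals: the single `shared` weight stands in for embedding_1 and lstm_1, the losses are squared errors, and the main loss weight is set to 0 so that only the aux term drives the update.

```python
def forward(w, x):
    hidden = w["shared"] * x   # stands in for the shared embedding_1 + lstm_1 layers
    return hidden, w["main"] * hidden, w["aux"] * hidden


def gradients(w, x, main_y, aux_y, main_weight, aux_weight):
    """Analytic d(loss)/d(weight) for loss = main_weight*main_loss + aux_weight*aux_loss."""
    hidden, main_out, aux_out = forward(w, x)
    d_main = 2.0 * (main_out - main_y) * main_weight   # d(loss)/d(main_out)
    d_aux = 2.0 * (aux_out - aux_y) * aux_weight       # d(loss)/d(aux_out)
    return {
        "main": d_main * hidden,                                 # main head only
        "aux": d_aux * hidden,                                   # aux head only
        "shared": (d_main * w["main"] + d_aux * w["aux"]) * x,   # both paths
    }


w = {"shared": 0.5, "main": 1.0, "aux": -1.0}
x, main_y, aux_y = 2.0, 3.0, 1.0

# Main loss switched off: any change below comes from the aux term alone.
g = gradients(w, x, main_y, aux_y, main_weight=0.0, aux_weight=1.0)
updated = {name: w[name] - 0.01 * g[name] for name in w}

main_before = forward(w, x)[1]
main_after = forward(updated, x)[1]
print(g, main_before, main_after)
```

The main head's own weight receives a zero gradient and stays put, yet the main output still changes on the next forward pass because the shared weight moved. In this toy setup nothing guarantees the change helps the main output, which matches the caveat above.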

Is the part of the network between the root of the auxiliary branch and the main output affected by the weighting of the loss? Or does the weighting influence only the part of the network that is connected to the auxiliary output?

The weights are plain mathematics. You will define them in compile:

model.compile(optimizer=one_optimizer,
              # you choose each loss
              loss={'main_output': main_loss, 'aux_output': aux_loss},
              # you choose each weight
              loss_weights={'main_output': main_weight, 'aux_output': aux_weight},
              metrics = ...)

And the loss function will use them in loss = (weight1 * loss1) + (weight2 * loss2).
The rest is the mathematical calculation of ∂(loss)/∂(weight_i) for each weight.
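Written out, and assuming the loss weights are constants (as they are when passed to compile), linearity of differentiation gives the gradient for any single weight:

```latex
\frac{\partial\,\mathrm{loss}}{\partial w_i}
  = \mathrm{main\_weight}\cdot\frac{\partial\,\mathrm{main\_loss}}{\partial w_i}
  + \mathrm{aux\_weight}\cdot\frac{\partial\,\mathrm{aux\_loss}}{\partial w_i}
```

For a weight that lies only on the path to the main output (after the root of the auxiliary branch), the second partial derivative is zero, so its gradient is scaled by main_weight alone. For a weight that lies only in the auxiliary head, the first partial derivative is zero and only aux_weight matters. Weights in the shared part of the network receive both terms.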

Original article: https://www.cnblogs.com/yaos/p/14014184.html