  • Understanding train.py in the Transformer

    1. Define the batching scheme: it returns a dict ret containing a batch_sizes array:
    {'min_length': 8, 'window_size': 720,
    'shuffle_queue_size': 270,
    'boundaries': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 33, 36, 39, 42, 46, 50, 55, 60, 66, 72, 79, 86, 94, 103, 113, 124, 136, 149, 163, 179, 196, 215, 236],
    'max_length': 256,
    'batch_sizes': [240, 180, 180, 180, 144, 144, 144, 120, 120, 120, 90, 90, 90, 90, 80, 72, 72, 60, 60, 48, 48, 48, 40, 40, 36, 30, 30, 24, 24, 20, 20, 18, 18, 16, 15, 12, 12, 10, 10, 9, 8, 8]}
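    To see the intent behind these numbers: every batch size divides window_size, and boundary * batch_size stays in a narrow band (roughly 1600-2050 tokens), so longer sentences simply get proportionally smaller batches. A quick sanity check (my own sketch, not code from train.py):

    ```python
    # Two properties of the scheme above: each batch size divides window_size
    # (so a window splits into whole batches), and tokens-per-batch is roughly
    # constant across buckets.
    window_size = 720
    boundaries = [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26,
                  28, 30, 33, 36, 39, 42, 46, 50, 55, 60, 66, 72, 79, 86, 94,
                  103, 113, 124, 136, 149, 163, 179, 196, 215, 236]
    batch_sizes = [240, 180, 180, 180, 144, 144, 144, 120, 120, 120, 90, 90, 90,
                   90, 80, 72, 72, 60, 60, 48, 48, 48, 40, 40, 36, 30, 30, 24,
                   24, 20, 20, 18, 18, 16, 15, 12, 12, 10, 10, 9, 8, 8]

    for boundary, bs in zip(boundaries, batch_sizes):
        assert window_size % bs == 0           # whole number of batches per window
        print(boundary, bs, boundary * bs)     # tokens per batch stays near ~1900
    ```

    Note there is one more batch size than boundaries: the extra entry covers the final bucket for lengths above the last boundary, up to max_length.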
    2. input_pipeline: read the input files (10 of them), parse each record with decode_record,
    and assemble the results into a dataset of dicts: dataset {"src_id": ..., "target_id": ...}
    (1) Length filtering: filter on the longer of the source and target sentence lengths:
    length = _example_length(example)
    return tf.logical_and(length >= min_length, length <= max_length)
    dataset = dataset.filter(functools.partial(example_valid_size, min_length = batching_scheme["min_length"], max_length = batching_scheme["max_length"]))
    filter applies this predicate to every element of the dataset; a runnable sketch follows.
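    A minimal runnable sketch of this filter step; the body of _example_length (taking the longer of the two sides) is my assumption based on the comment above, and the toy generator just stands in for the decoded TFRecord dataset:

    ```python
    import functools
    import tensorflow as tf

    # Toy stand-in for the decoded dataset of {"src_id", "target_id"} dicts.
    def gen():
        yield {"src_id": [1] * 10, "target_id": [2] * 12}       # kept (length 12)
        yield {"src_id": [1] * 300, "target_id": [2] * 10}      # dropped (too long)

    dataset = tf.data.Dataset.from_generator(
        gen,
        output_types={"src_id": tf.int64, "target_id": tf.int64},
        output_shapes={"src_id": [None], "target_id": [None]})

    def _example_length(example):
        # Assumed: an example's length is the longer of source and target.
        return tf.maximum(tf.shape(example["src_id"])[0],
                          tf.shape(example["target_id"])[0])

    def example_valid_size(example, min_length, max_length):
        length = _example_length(example)
        return tf.logical_and(length >= min_length, length <= max_length)

    batching_scheme = {"min_length": 8, "max_length": 256}  # from step 1
    dataset = dataset.filter(
        functools.partial(example_valid_size,
                          min_length=batching_scheme["min_length"],
                          max_length=batching_scheme["max_length"]))
    ```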
    (2) Choosing a bucket id by length: given the dataset of {"src_id": ..., "target_id": ...} dicts and the boundaries list, compare each example's length against the bucket edges:
    conditions_c = tf.logical_and(tf.less_equal(buckets_min, seq_length), tf.less(seq_length, buckets_max))
    and return the position within boundaries where the condition holds (sketched below).
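    A sketch of the bucket lookup; the bucket-edge construction here follows the standard tensor2tensor approach, so details may differ from the actual source file:

    ```python
    import tensorflow as tf

    def example_to_bucket_id(example, boundaries):
        seq_length = _example_length(example)      # defined in step (1)
        # Bucket edges: [0, b0, ..., bn] below, [b0, ..., bn, inf] above;
        # exactly one interval contains seq_length.
        buckets_min = [0] + boundaries
        buckets_max = boundaries + [tf.int32.max]
        conditions_c = tf.logical_and(tf.less_equal(buckets_min, seq_length),
                                      tf.less(seq_length, buckets_max))
        # Index of the single True entry = the bucket id (a scalar int64,
        # as required of a group_by_window key_func).
        return tf.reduce_min(tf.where(conditions_c))
    ```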
    Using the bucket id returned in the previous step, locate the bucket and look up its window size. The notion of a window is easiest to understand from the English documentation quoted below: roughly, a window is the run of consecutive elements sharing the same key that get combined into a single batch and passed to reduce_func.
    tf.contrib.data.group_by_window(
        key_func,
        reduce_func,
        window_size=None,
        window_size_func=None
    )
    Defined in tensorflow/contrib/data/python/ops/grouping.py.

    A transformation that groups windows of elements by key and reduces them.

    This transformation maps each consecutive element in a dataset to a key using key_func and groups the elements by key. It then applies reduce_func to at most window_size_func(key) elements matching the same key. All except the final window for each key will contain window_size_func(key) elements; the final window may be smaller.

    You may provide either a constant window_size or a window size determined by the key through window_size_func.

    Args:
    key_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to a scalar tf.int64 tensor.
    reduce_func: A function mapping a key and a dataset of up to window_size consecutive elements matching that key to another dataset.
    window_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size_func.
    window_size_func: A function mapping a key to a tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func. Mutually exclusive with window_size.
    Returns:
    A Dataset transformation function, which can be passed to tf.data.Dataset.apply.

    Raises:
    ValueError: if neither or both of {window_size, window_size_func} are passed.
    (3) Padding: grouped_dataset.padded_batch(batch_size, padded_shapes), where grouped_dataset is the window of same-bucket examples handed to reduce_func, batch_size is the number of sentences in the batch, and padded_shapes gives the dimensions to pad.
    Putting it all together, the id sequences become padded matrices: dataset.apply(tf.contrib.data.group_by_window(example_to_bucket_id, batching_fn, None, ...)); the assembled pipeline is sketched below.
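    A sketch assembling steps (1)-(3); window_size_fn is my hypothetical name for the elided fourth argument, and the bodies of window_size_fn and batching_fn are my reconstruction assuming one window equals one batch:

    ```python
    import tensorflow as tf

    batch_sizes = tf.constant(batching_scheme["batch_sizes"], dtype=tf.int64)

    def window_size_fn(bucket_id):
        # One window per batch: collect batch_sizes[bucket_id] examples.
        return batch_sizes[bucket_id]

    def batching_fn(bucket_id, grouped_dataset):
        # Pad every sequence in the window to the longest one ([None] -> max
        # length in batch), yielding a [batch_size, max_len] matrix per field.
        batch_size = window_size_fn(bucket_id)
        return grouped_dataset.padded_batch(
            batch_size, padded_shapes={"src_id": [None], "target_id": [None]})

    dataset = dataset.apply(tf.contrib.data.group_by_window(
        key_func=lambda ex: example_to_bucket_id(
            ex, batching_scheme["boundaries"]),
        reduce_func=batching_fn,
        window_size=None,
        window_size_func=window_size_fn))
    ```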
    Part 2:

    1-D convolution: https://blog.csdn.net/appleyuchi/article/details/78597054
    tf.reshape: https://blog.csdn.net/lxg0807/article/details/53021859
    list vs. tuple: https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014316724772904521142196b74a3f8abf93d8e97c6ee6000
    expand_dims: https://blog.csdn.net/qq_31780525/article/details/72280284
    tf.concat and tf.split: https://blog.csdn.net/momaojia/article/details/77603322 https://blog.csdn.net/UESTC_C2_403/article/details/73350457
    feedforward: built from 1-D convolutions; two conv layers with a ReLU non-linearity between them, then a residual connection adding the inputs, then normalize. That is, it does not simply call layers.dense for a plain fully-connected layer. A sketch follows.
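    A sketch of that feedforward block (modeled on the common conv1d implementation, e.g. Kyubyong's transformer; the num_units defaults are illustrative):

    ```python
    import tensorflow as tf

    def feedforward(inputs, num_units=(2048, 512), scope="feedforward"):
        with tf.variable_scope(scope):
            # Inner layer: 1-D conv (kernel size 1) with ReLU between the layers.
            outputs = tf.layers.conv1d(inputs, filters=num_units[0],
                                       kernel_size=1, activation=tf.nn.relu)
            # Readout layer: back to the model dimension, no activation.
            outputs = tf.layers.conv1d(outputs, filters=num_units[1],
                                       kernel_size=1, activation=None)
            outputs += inputs              # residual: add the inputs back
            outputs = normalize(outputs)   # layer norm, sketched in (1) below
        return outputs
    ```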
    (1) normalization (layer norm): normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
    outputs = gamma * normalized + beta, where the mean and variance are taken over the last dimension of inputs:
    '''Applies layer normalization.

    Args:
    inputs: A tensor with 2 or more dimensions, where the first dimension has
    `batch_size`.
    epsilon: A floating number. A very small number for preventing ZeroDivision Error.
    scope: Optional scope for `variable_scope`.
    reuse: Boolean, whether to reuse the weights of a previous layer
    by the same name.

    Returns:
    A tensor with the same shape and data dtype as `inputs`.

    '''
    What do beta and gamma do? They are learnable shift and scale parameters (beta initialized to zeros, gamma to ones), so at initialization the layer appears to "do nothing", but training can rescale or shift the normalized activations.
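    A sketch matching the docstring above (follows the common implementation; the epsilon default is an assumption):

    ```python
    import tensorflow as tf

    def normalize(inputs, epsilon=1e-8, scope="ln", reuse=None):
        with tf.variable_scope(scope, reuse=reuse):
            params_shape = inputs.get_shape()[-1:]
            # Per-example statistics over the last (feature) dimension.
            mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)
            # Learnable shift/scale; an identity transform at initialization.
            beta = tf.get_variable("beta", params_shape,
                                   initializer=tf.zeros_initializer())
            gamma = tf.get_variable("gamma", params_shape,
                                    initializer=tf.ones_initializer())
            normalized = (inputs - mean) / ((variance + epsilon) ** 0.5)
            return gamma * normalized + beta
    ```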
    (2) embedding: uses a TensorFlow embedding lookup so that the input ids are mapped to a smoother, distributed representation in which relationships between words can be expressed;
    the output has one more dimension than the input, with the last dimension equal to num_units (the number of hidden units).
    The scale flag rescales outputs according to num_units: when scale is True (the default), outputs are multiplied by sqrt(num_units), which matches the sqrt(d_model) embedding scaling in the Transformer paper.
    '''Embeds a given tensor.
    Args:
    inputs: A `Tensor` with type `int32` or `int64` containing the ids
    to be looked up in `lookup table`.
    vocab_size: An int. Vocabulary size.
    num_units: An int. Number of embedding hidden units.
    zero_pad: A boolean. If True, all the values of the first row (id 0)
    should be constant zeros.
    scale: A boolean. If True, the outputs are multiplied by sqrt num_units.
    scope: Optional scope for `variable_scope`.
    reuse: Boolean, whether to reuse the weights of a previous layer
    by the same name.
    Returns:
    A `Tensor` with one more rank than inputs's. The last dimensionality
    should be `num_units`.
    The key function used here is the lookup (tf.nn.embedding_lookup, judging by the links below); it works much like a Chinese-to-English dictionary: you pass an inputs tensor that serves as the lookup table, give the ids you want represented, and get back the corresponding tensor. One blog explains this well:
    https://www.jianshu.com/p/677e71364c8e ; it builds on one-hot encoding: https://blog.csdn.net/pipisorry/article/details/61193868
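    A sketch of the whole embedding module as described (modeled on the common implementation; names follow the docstring above):

    ```python
    import tensorflow as tf

    def embedding(inputs, vocab_size, num_units, zero_pad=True, scale=True,
                  scope="embedding", reuse=None):
        with tf.variable_scope(scope, reuse=reuse):
            lookup_table = tf.get_variable("lookup_table",
                                           shape=[vocab_size, num_units],
                                           dtype=tf.float32)
            if zero_pad:
                # Row 0 (the padding id) is forced to all zeros.
                lookup_table = tf.concat((tf.zeros([1, num_units]),
                                          lookup_table[1:, :]), axis=0)
            # Adds one rank: [...] -> [..., num_units].
            outputs = tf.nn.embedding_lookup(lookup_table, inputs)
            if scale:
                outputs = outputs * (num_units ** 0.5)  # sqrt(d_model) scaling
        return outputs
    ```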
    (3) multi-head attention:
    a. Fully-connected projections of Q, K, V via dense: the last dimension becomes num_units,
    and outputs = activation(inputs * kernel + bias)
    b. Masking: use reduce_sum to find the all-zero (padding) positions, then mask them by setting their attention scores to a very large negative value, which marks those positions (see the sketch below).
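    A sketch of the key-masking trick from (b); the shapes and the -2^32+1 padding constant follow the common implementation, so details may differ from the source:

    ```python
    import tensorflow as tf

    def mask_keys(outputs, keys):
        # outputs: attention scores [N, T_q, T_k]; keys: [N, T_k, d].
        # A key position is padding iff its embedding vector sums to 0.
        key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1)))    # [N, T_k]
        key_masks = tf.tile(tf.expand_dims(key_masks, 1),
                            [1, tf.shape(outputs)[1], 1])            # [N, T_q, T_k]
        # Huge negative score -> softmax gives these positions ~0 weight.
        paddings = tf.ones_like(outputs) * (-2 ** 32 + 1.0)
        return tf.where(tf.equal(key_masks, 0), paddings, outputs)
    ```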

    (4)dropout:
    (5) label_smoothing: smooths the one-hot target distribution.
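    The standard smoothing rule (this matches the widely used implementation; epsilon=0.1 is a typical default): mix the one-hot targets with a uniform distribution over the K classes.

    ```python
    def label_smoothing(inputs, epsilon=0.1):
        K = inputs.get_shape().as_list()[-1]   # number of classes (last dim)
        return ((1 - epsilon) * inputs) + (epsilon / K)
    ```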
    (6) positional encoding: this part still seems a bit off to me.
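    For reference, the sinusoidal encoding from "Attention Is All You Need" as a NumPy sketch; this is the textbook formula, not necessarily what the source file does:

    ```python
    import numpy as np

    def positional_encoding(max_len, num_units):
        # PE(pos, 2i)   = sin(pos / 10000^(2i / d))
        # PE(pos, 2i+1) = cos(pos / 10000^(2i / d))
        pe = np.array([[pos / np.power(10000, 2 * (i // 2) / num_units)
                        for i in range(num_units)] for pos in range(max_len)])
        pe[:, 0::2] = np.sin(pe[:, 0::2])   # even indices: sine
        pe[:, 1::2] = np.cos(pe[:, 1::2])   # odd indices: cosine
        return pe                           # shape [max_len, num_units]
    ```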
    I have been studying this for a while, but many questions remain.
  • Original article: https://www.cnblogs.com/Shaylin/p/9918178.html