  • Functional model examples

     
    GoogLeNet Inception

    The Inception module in GoogLeNet is a relatively complex multi-branch network whose branches are concatenated together; we can build it with the functional model as follows.

    import keras
    from keras.layers import Conv2D, MaxPooling2D, Input

    input_img = Input(shape=(3, 256, 256))

    tower_1 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)

    tower_1 = Conv2D(64, (3, 3), padding='same', activation='relu')(tower_1)

    tower_2 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)

    tower_2 = Conv2D(64, (5, 5), padding='same', activation='relu')(tower_2)

    tower_3 = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(input_img)

    tower_3 = Conv2D(64, (1, 1), padding='same', activation='relu')(tower_3)

    output = keras.layers.concatenate([tower_1, tower_2, tower_3], axis=1)
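
    As a minimal sketch beyond the original snippet, the concatenated `output` tensor above could be wrapped into a trainable model; the pooling layer and the 10-way classification head below are illustrative assumptions, not part of the original example, and they assume the Keras image data format matches the channels-first (3, 256, 256) input used above.

    from keras.layers import GlobalAveragePooling2D, Dense
    from keras.models import Model

    # Illustrative classification head on top of the Inception block above.
    # GlobalAveragePooling2D collapses the spatial dimensions of the concatenated feature map.
    pooled = GlobalAveragePooling2D()(output)
    predictions = Dense(10, activation='softmax')(pooled)  # 10 classes is an assumption

    inception_model = Model(inputs=input_img, outputs=predictions)
    inception_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')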

    ResNet

    A Residual Network is a network with shortcut (skip) connections; the functional model implementation below illustrates the idea.

    import keras
    from keras.layers import Conv2D, Input

    # input tensor for a 3-channel 256x256 image

    x = Input(shape=(3, 256, 256))

    # 3x3 conv with 3 output channels (same as input channels)

    y = Conv2D(3, (3, 3), padding='same')(x)

    # this returns x + y.

    z = keras.layers.add([x, y])
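
    A short sketch, going beyond the original snippet, of reusing this pattern as a helper function to stack several identity-shortcut blocks; the block count and the trailing ReLU are illustrative choices.

    import keras
    from keras.layers import Conv2D, Input, Activation

    def residual_block(tensor):
        # One identity-shortcut block: 3x3 conv, add the input back, then ReLU.
        y = Conv2D(3, (3, 3), padding='same')(tensor)
        y = keras.layers.add([tensor, y])
        return Activation('relu')(y)

    inputs = Input(shape=(3, 256, 256))
    h = inputs
    for _ in range(3):  # three blocks is an arbitrary, illustrative choice
        h = residual_block(h)
    resnet_sketch = keras.models.Model(inputs, h)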

    Shared vision model

    This model reuses the same image-processing module on two inputs to decide whether two MNIST digits are the same digit.

    import keras
    from keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten

    from keras.models import Model

    # First, define the vision modules

    digit_input = Input(shape=(1, 27, 27))

    x = Conv2D(64, (3, 3))(digit_input)

    x = Conv2D(64, (3, 3))(x)

    x = MaxPooling2D((2, 2))(x)

    out = Flatten()(x)

    vision_model = Model(digit_input, out)

    # Then define the tell-digits-apart model

    digit_a = Input(shape=(1, 27, 27))

    digit_b = Input(shape=(1, 27, 27))

    # The vision model will be shared, weights and all

    out_a = vision_model(digit_a)

    out_b = vision_model(digit_b)

    concatenated = keras.layers.concatenate([out_a, out_b])

    out = Dense(1, activation='sigmoid')(concatenated)

    classification_model = Model([digit_a, digit_b], out)
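
    A minimal usage sketch, not in the original: compile the model and fit it on pairs of digit images with a binary "same digit" label. The random arrays below are placeholders standing in for real MNIST digit pairs.

    import numpy as np

    classification_model.compile(optimizer='rmsprop',
                                 loss='binary_crossentropy',
                                 metrics=['accuracy'])

    # Placeholder batch of 32 digit pairs in the (1, 27, 27) channels-first format used above.
    pairs_a = np.random.random((32, 1, 27, 27))
    pairs_b = np.random.random((32, 1, 27, 27))
    same_label = np.random.randint(2, size=(32, 1))

    classification_model.fit([pairs_a, pairs_b], same_label, epochs=1, batch_size=8)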

    Visual question answering model

    Given a natural-language question about an image, this model produces a one-word answer about that image.

    The model maps the natural-language question and the image to feature vectors separately, concatenates the two, and trains a logistic regression layer on top that picks one answer from a set of candidate answers.

    import keras
    from keras.layers import Conv2D, MaxPooling2D, Flatten

    from keras.layers import Input, LSTM, Embedding, Dense

    from keras.models import Model, Sequential

    # First, let's define a vision model using a Sequential model.

    # This model will encode an image into a vector.

    vision_model = Sequential()

    vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(3, 224, 224)))

    vision_model.add(Conv2D(64, (3, 3), activation='relu'))

    vision_model.add(MaxPooling2D((2, 2)))

    vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))

    vision_model.add(Conv2D(128, (3, 3), activation='relu'))

    vision_model.add(MaxPooling2D((2, 2)))

    vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))

    vision_model.add(Conv2D(256, (3, 3), activation='relu'))

    vision_model.add(Conv2D(256, (3, 3), activation='relu'))

    vision_model.add(MaxPooling2D((2, 2)))

    vision_model.add(Flatten())

    # Now let's get a tensor with the output of our vision model:

    image_input = Input(shape=(3, 224, 224))

    encoded_image = vision_model(image_input)

    # Next, let's define a language model to encode the question into a vector.

    # Each question will be at most 100 words long,

    # and we will index words as integers from 1 to 9999.

    question_input = Input(shape=(100,), dtype='int32')

    embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)

    encoded_question = LSTM(256)(embedded_question)

    # Let's concatenate the question vector and the image vector:

    merged = keras.layers.concatenate([encoded_question, encoded_image])

    # And let's train a logistic regression over 1000 words on top:

    output = Dense(1000, activation='softmax')(merged)

    # This is our final model:

    vqa_model = Model(inputs=[image_input, question_input], outputs=output)
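
    A hedged usage sketch, just to show the expected input and output formats; the arrays below are random placeholders, not a real VQA dataset.

    import numpy as np

    vqa_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

    # 8 placeholder RGB images (channels first, as above) and 8 integer-encoded questions.
    images = np.random.random((8, 3, 224, 224))
    questions = np.random.randint(1, 10000, size=(8, 100))
    # One-hot answers over the 1000-word answer vocabulary.
    answers = np.zeros((8, 1000))
    answers[np.arange(8), np.random.randint(1000, size=8)] = 1.0

    vqa_model.fit([images, questions], answers, epochs=1)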

    Video question answering model

    Having built the image question answering model, we can quickly turn it into a video question answering model. With suitable training, you can feed the model a short video (e.g. 100 frames) and ask it a question about that video, such as "what sport is the boy playing?" -> "football".

    from keras.layers import TimeDistributed

    video_input = Input(shape=(100, 3, 224, 224))

    # This is our video encoded via the previously trained vision_model (weights are reused)

    encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors

    encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

    # This is a model-level representation of the question encoder, reusing the same weights as before:

    question_encoder = Model(inputs=question_input, outputs=encoded_question)

    # Let's use it to encode the question:

    video_question_input = Input(shape=(100,), dtype='int32')

    encoded_video_question = question_encoder(video_question_input)

    # And this is our video question answering model:

    merged = keras.layers.concatenate([encoded_video, encoded_video_question])

    output = Dense(1000, activation='softmax')(merged)

    video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)
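
    And a matching placeholder-data sketch, again an illustration rather than part of the original, showing the input shapes the video QA model expects.

    import numpy as np

    video_qa_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

    # 2 placeholder clips of 100 frames each (channels first), 2 encoded questions,
    # and one-hot answers over the 1000-word answer vocabulary.
    clips = np.random.random((2, 100, 3, 224, 224))
    clip_questions = np.random.randint(1, 10000, size=(2, 100))
    answers = np.zeros((2, 1000))
    answers[np.arange(2), np.random.randint(1000, size=2)] = 1.0

    video_qa_model.fit([clips, clip_questions], answers, epochs=1)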
