  • Functional model examples

     
    GoogLeNet Inception

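    GoogLeNet's Inception module is a relatively complex network built by concatenating parallel branches. Here we implement it with the functional model.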

    import keras

    from keras.layers import Conv2D, MaxPooling2D, Input

    # shape is (channels, height, width); this assumes image_data_format='channels_first'

    input_img = Input(shape=(3, 256, 256))

    tower_1 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)

    tower_1 = Conv2D(64, (3, 3), padding='same', activation='relu')(tower_1)

    tower_2 = Conv2D(64, (1, 1), padding='same', activation='relu')(input_img)

    tower_2 = Conv2D(64, (5, 5), padding='same', activation='relu')(tower_2)

    tower_3 = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(input_img)

    tower_3 = Conv2D(64, (1, 1), padding='same', activation='relu')(tower_3)

    output = keras.layers.concatenate([tower_1, tower_2, tower_3], axis=1)
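
To see why the three towers can be concatenated, it helps to trace their shapes by hand. The sketch below is plain Python rather than Keras; the helper names `same_conv_shape`, `same_pool_shape` and `concat_shapes` are illustrative and not part of any library, and it assumes the `(channels, height, width)` layout used in the snippet. With `padding='same'` and stride 1, every tower keeps the 256x256 spatial size, so only the channel axis grows.

```python
def same_conv_shape(shape, filters):
    """Per-sample output shape of a stride-1 conv with padding='same' (channels first)."""
    _, height, width = shape
    return (filters, height, width)


def same_pool_shape(shape):
    """A stride-1 max pool with padding='same' leaves the shape unchanged."""
    return shape


def concat_shapes(shapes, axis):
    """Concatenate per-sample shapes along `axis` (0-based, batch axis excluded)."""
    first = shapes[0]
    for s in shapes[1:]:
        # every axis except the concatenation axis has to match
        if any(a != b for i, (a, b) in enumerate(zip(first, s)) if i != axis):
            raise ValueError(f"incompatible shapes: {first} vs {s}")
    out = list(first)
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)


input_shape = (3, 256, 256)

tower_1 = same_conv_shape(same_conv_shape(input_shape, 64), 64)
tower_2 = same_conv_shape(same_conv_shape(input_shape, 64), 64)
tower_3 = same_conv_shape(same_pool_shape(input_shape), 64)

# Keras axis=1 counts the batch axis, so it is axis 0 of the per-sample shape.
output_shape = concat_shapes([tower_1, tower_2, tower_3], axis=0)
print(output_shape)  # (192, 256, 256)
```

Under these assumptions the concatenated tensor has 64 + 64 + 64 = 192 channels.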

    ResNet

    A residual network (ResNet) is a network with shortcut ("highway") connections. It can be implemented with the functional model as follows.

    import keras

    from keras.layers import Conv2D, Input

    # input tensor for a 3-channel 256x256 image (assumes image_data_format='channels_first')

    x = Input(shape=(3, 256, 256))

    # 3x3 conv with 3 output channels (same as input channels)

    y = Conv2D(3, (3, 3), padding='same')(x)

    # this returns x + y.

    z = keras.layers.add([x, y])
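
The shortcut only works because `y` has exactly the same shape as `x`, which is why the convolution uses 3 filters and `padding='same'`. A minimal plain-Python sketch of that constraint follows; the helper names are hypothetical, and shapes are written per sample in channels-first order, as the snippet assumes.

```python
def conv2d_shape(shape, filters, kernel, padding):
    """Per-sample output shape of a stride-1 2D convolution (channels first)."""
    _, height, width = shape
    if padding == "same":
        return (filters, height, width)
    # 'valid' padding shrinks each spatial dimension by kernel - 1
    return (filters, height - kernel + 1, width - kernel + 1)


def shortcut_compatible(shape_a, shape_b):
    """This sketch requires identical shapes for the residual addition."""
    return shape_a == shape_b


x_shape = (3, 256, 256)

# Same configuration as in the snippet: 3 filters, 3x3 kernel, 'same' padding.
y_shape = conv2d_shape(x_shape, filters=3, kernel=3, padding="same")
ok = shortcut_compatible(x_shape, y_shape)
print(ok)  # True

# Changing either the filter count or the padding breaks the shortcut.
wrong_filters = shortcut_compatible(x_shape, conv2d_shape(x_shape, 64, 3, "same"))
wrong_padding = shortcut_compatible(x_shape, conv2d_shape(x_shape, 3, 3, "valid"))
print(wrong_filters, wrong_padding)  # False False

# Trainable parameters of the 3x3 convolution: kernel weights plus one bias per filter.
conv_params = 3 * 3 * 3 * 3 + 3
print(conv_params)  # 84
```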

    Shared vision model

    This model reuses the same image-processing module on two inputs to decide whether two MNIST digits are the same digit.

    import keras

    from keras.layers import Conv2D, MaxPooling2D, Input, Dense, Flatten

    from keras.models import Model

    # First, define the vision modules
    # (input shapes are (channels, height, width), assuming image_data_format='channels_first')

    digit_input = Input(shape=(1, 27, 27))

    x = Conv2D(64, (3, 3))(digit_input)

    x = Conv2D(64, (3, 3))(x)

    x = MaxPooling2D((2, 2))(x)

    out = Flatten()(x)

    vision_model = Model(digit_input, out)

    # Then define the tell-digits-apart model

    digit_a = Input(shape=(1, 27, 27))

    digit_b = Input(shape=(1, 27, 27))

    # The vision model will be shared, weights and all

    out_a = vision_model(digit_a)

    out_b = vision_model(digit_b)

    concatenated = keras.layers.concatenate([out_a, out_b])

    out = Dense(1, activation='sigmoid')(concatenated)

    classification_model = Model([digit_a, digit_b], out)
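
Because `vision_model` is called on both inputs, its weights exist only once. The plain-Python arithmetic below traces the feature size and counts trainable parameters under that assumption. The helper names are illustrative, not Keras API, and the sketch assumes channels-first shapes and the default `'valid'` padding used in the snippet.

```python
def conv_valid(shape, filters, kernel=3):
    """Per-sample shape after a stride-1 conv with 'valid' padding (channels first)."""
    _, h, w = shape
    return (filters, h - kernel + 1, w - kernel + 1)


def pool_valid(shape, size=2):
    """Per-sample shape after non-overlapping max pooling with 'valid' padding."""
    c, h, w = shape
    return (c, h // size, w // size)


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


shape = (1, 27, 27)
shape = conv_valid(shape, 64)   # (64, 25, 25)
shape = conv_valid(shape, 64)   # (64, 23, 23)
shape = pool_valid(shape)       # (64, 11, 11)
features = shape[0] * shape[1] * shape[2]
print(features)  # 7744

vision_params = conv_params(1, 64) + conv_params(64, 64)
dense_params = 2 * features * 1 + 1   # Dense(1) on the two concatenated vectors

# The vision module is shared, so it is counted once, not twice.
total_params = vision_params + dense_params
print(vision_params, dense_params, total_params)  # 37568 15489 53057
```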

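    Visual question answering model

    When asked a natural-language question about a picture, this model can give a one-word answer about that picture.

    The model maps the natural-language question and the image to separate feature vectors, concatenates the two, and trains a logistic regression layer on top that picks one answer from a set of possible answers.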

    import keras

    from keras.layers import Conv2D, MaxPooling2D, Flatten

    from keras.layers import Input, LSTM, Embedding, Dense

    from keras.models import Model, Sequential

    # First, let's define a vision model using a Sequential model.

    # This model will encode an image into a vector.

    vision_model = Sequential()

    vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(3, 224, 224)))

    vision_model.add(Conv2D(64, (3, 3), activation='relu'))

    vision_model.add(MaxPooling2D((2, 2)))

    vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))

    vision_model.add(Conv2D(128, (3, 3), activation='relu'))

    vision_model.add(MaxPooling2D((2, 2)))

    vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))

    vision_model.add(Conv2D(256, (3, 3), activation='relu'))

    vision_model.add(Conv2D(256, (3, 3), activation='relu'))

    vision_model.add(MaxPooling2D((2, 2)))

    vision_model.add(Flatten())

    # Now let's get a tensor with the output of our vision model:

    image_input = Input(shape=(3, 224, 224))

    encoded_image = vision_model(image_input)

    # Next, let's define a language model to encode the question into a vector.

    # Each question will be at most 100 words long,

    # and we will index words as integers from 1 to 9999.

    question_input = Input(shape=(100,), dtype='int32')

    embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)

    encoded_question = LSTM(256)(embedded_question)

    # Let's concatenate the question vector and the image vector:

    merged = keras.layers.concatenate([encoded_question, encoded_image])

    # And let's train a logistic regression over 1000 words on top:

    output = Dense(1000, activation='softmax')(merged)

    # This is our final model:

    vqa_model = Model(inputs=[image_input, question_input], outputs=output)
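
It is worth checking how large the vector entering the final `Dense` layer is. The plain-Python sketch below walks through the layers of `vision_model` in order and then adds the 256-dimensional LSTM output. The helper names are illustrative and channels-first shapes are assumed; this is a hand calculation, not output produced by Keras.

```python
def conv(shape, filters, padding, kernel=3):
    """Per-sample shape after a stride-1 convolution (channels first)."""
    _, h, w = shape
    if padding == "same":
        return (filters, h, w)
    return (filters, h - kernel + 1, w - kernel + 1)


def pool(shape, size=2):
    """Per-sample shape after non-overlapping max pooling."""
    c, h, w = shape
    return (c, h // size, w // size)


# (layer kind, filters, padding), mirroring the Sequential vision model
layers = [
    ("conv", 64, "same"), ("conv", 64, "valid"), ("pool", None, None),
    ("conv", 128, "same"), ("conv", 128, "valid"), ("pool", None, None),
    ("conv", 256, "same"), ("conv", 256, "valid"), ("conv", 256, "valid"),
    ("pool", None, None),
]

shape = (3, 224, 224)
for kind, filters, padding in layers:
    shape = conv(shape, filters, padding) if kind == "conv" else pool(shape)
    print(kind, shape)

encoded_image_size = shape[0] * shape[1] * shape[2]   # after Flatten
encoded_question_size = 256                           # LSTM(256) output
merged_size = encoded_question_size + encoded_image_size
dense_params = merged_size * 1000 + 1000              # weights + biases of Dense(1000)

print(encoded_image_size, merged_size, dense_params)  # 160000 160256 160257000
```

Under these assumptions the final layer alone holds roughly 160 million parameters, almost all of them attached to the flattened image vector.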

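    Video question answering model

    Having built the image question answering model, we can quickly turn it into a video question answering model. With appropriate training, you can show the model a short video (e.g. 100 frames) and ask it a question about that video, e.g. “what sport is the boy playing?” -> “football”.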

    from keras.layers import TimeDistributed

    video_input = Input(shape=(100, 3, 224, 224))

    # This is our video encoded via the previously trained vision_model (weights are reused)

    encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors

    encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

    # This is a model-level representation of the question encoder, reusing the same weights as before:

    question_encoder = Model(inputs=question_input, outputs=encoded_question)

    # Let's use it to encode the question:

    video_question_input = Input(shape=(100,), dtype='int32')

    encoded_video_question = question_encoder(video_question_input)

    # And this is our video question answering model:

    merged = keras.layers.concatenate([encoded_video, encoded_video_question])

    output = Dense(1000, activation='softmax')(merged)

    video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)
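
`TimeDistributed` applies the same `vision_model` to each of the 100 frames, so the frame axis is preserved while the per-frame image shape is replaced by the encoder's output size. The plain-Python sketch below traces the resulting shapes. The helper names are hypothetical, and the 160000-element encoder output is the flattened 256 x 25 x 25 feature map that the vision model yields for a channels-first 224x224 input under the assumptions stated earlier.

```python
def time_distributed(input_shape, inner_output_shape):
    """Keep the leading time axis, replace the rest with the inner model's output."""
    time_steps = input_shape[0]
    return (time_steps,) + tuple(inner_output_shape)


def lstm_last_state(sequence_shape, units):
    """An LSTM that returns only its final output collapses the time axis."""
    return (units,)


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


video_shape = (100, 3, 224, 224)
frame_features = (256 * 25 * 25,)   # flattened vision_model output per frame

encoded_frames = time_distributed(video_shape, frame_features)
encoded_video = lstm_last_state(encoded_frames, units=256)
merged = (encoded_video[0] + 256,)  # video vector + question vector

video_lstm_params = lstm_params(frame_features[0], 256)

print(encoded_frames)     # (100, 160000)
print(encoded_video)      # (256,)
print(merged)             # (512,)
print(video_lstm_params)  # 164103168
```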

  • Original article: https://www.cnblogs.com/yongfuxue/p/10095905.html