  • BindsNET Breakout Source Code Analysis

    BindsNET: https://github.com/BindsNET/bindsnet

    Relevant code: bindsnet/examples/breakout/

    Some of the code is for demonstration purposes only; see https://github.com/BindsNET/bindsnet/issues/345

    1. breakout.py (demo code; no weight updates occur during learning)

    from bindsnet.network import Network
    from bindsnet.pipeline import EnvironmentPipeline
    from bindsnet.encoding import bernoulli
    from bindsnet.network.topology import Connection
    from bindsnet.environment import GymEnvironment
    from bindsnet.network.nodes import Input, IzhikevichNodes
    from bindsnet.pipeline.action import select_softmax
    
    # Build network.
    network = Network(dt=1.0)
    
    # Layers of neurons.
    inpt = Input(n=80 * 80, shape=[1, 1, 1, 80, 80], traces=True)
    middle = IzhikevichNodes(n=100, traces=True)
    out = IzhikevichNodes(n=4, refrac=0, traces=True)
    
    # Connections between layers.
    inpt_middle = Connection(source=inpt, target=middle, wmin=0, wmax=1)
    middle_out = Connection(source=middle, target=out, wmin=0, wmax=1)
    
    # Add all layers and connections to the network.
    network.add_layer(inpt, name="Input Layer")
    network.add_layer(middle, name="Hidden Layer")
    network.add_layer(out, name="Output Layer")
    network.add_connection(inpt_middle, source="Input Layer", target="Hidden Layer")
    network.add_connection(middle_out, source="Hidden Layer", target="Output Layer")
    
    # Load the Breakout environment.
    environment = GymEnvironment("BreakoutDeterministic-v4")
    environment.reset()
    
    # Build pipeline from specified components.
    pipeline = EnvironmentPipeline(
        network,
        environment,
        encoding=bernoulli,
        action_function=select_softmax,
        output="Output Layer",
        time=100,
        history_length=1,
        delta=1,
        plot_interval=1,
        render_interval=1,
    )
    
    # Run environment simulation for 100 episodes.
    for i in range(100):
        total_reward = 0
        pipeline.reset_state_variables()
        is_done = False
        while not is_done:
            result = pipeline.env_step()
            pipeline.step(result)
    
            reward = result[1]
            total_reward += reward
    
            is_done = result[2]
        print(f"Episode {i} total reward:{total_reward}")
    # Build network.
    network = Network(dt=1.0) # param dt: Simulation timestep.

    The bindsnet.network.Network object is the main workhorse of BindsNET. It is responsible for coordinating the simulation of all of its components: neurons, synapses, learning rules, and so on. The dt argument specifies the simulation time step, which determines the temporal granularity (in milliseconds) at which the simulation is solved. To keep computation simple, all simulations use the Euler method. If you encounter instability in a simulation, use a smaller dt to resolve the numerical instability.
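
    To make the role of dt concrete, here is a minimal standalone sketch (plain Python, not BindsNET internals) of Euler integration: each state variable is advanced by dt times its derivative, so a smaller dt means finer temporal granularity and more update steps per simulated millisecond.

    # Euler integration of a leaky membrane potential, dv/dt = -(v - v_rest)/tau + I.
    # Illustrative only; BindsNET applies the same scheme to its node models.
    v_rest, tau, I = -65.0, 10.0, 2.0
    dt = 1.0                        # simulation time step in ms, as in Network(dt=1.0)
    v = v_rest
    for step in range(100):         # 100 ms of simulated time at dt = 1.0
        v += dt * (-(v - v_rest) / tau + I)
    print(f"membrane potential after 100 ms: {v:.2f} mV")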

    # Layers of neurons.

    '''  

       :param n: The number of neurons in the layer.

       :param shape: The dimensionality of the layer.  

       :param traces: Whether to record decaying spike traces.  

       :param refrac: Refractory (non-firing) period of the neuron.

    '''
    inpt = Input(n=80 * 80, shape=[1, 1, 1, 80, 80], traces=True)
    middle = IzhikevichNodes(n=100, traces=True)
    out = IzhikevichNodes(n=4, refrac=0, traces=True)

    This defines a layer of nodes with user-specified spiking behavior (Input) and two layers of Izhikevich neurons (Middle and Output).
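
    For context, these Izhikevich layers follow the standard Izhikevich model: dv/dt = 0.04v^2 + 5v + 140 - u + I and du/dt = a(bv - u), with the reset v <- c, u <- u + d whenever v reaches 30 mV. Below is a standalone Euler-integrated sketch of that model (illustrative only; BindsNET's IzhikevichNodes implements the same dynamics in vectorized form and its default parameters may differ).

    # Standard Izhikevich neuron (regular-spiking parameters), Euler integration.
    # Illustrative sketch; not BindsNET's IzhikevichNodes implementation.
    a, b, c, d = 0.02, 0.2, -65.0, 8.0
    v, u = -65.0, 0.2 * -65.0
    dt, I = 1.0, 10.0
    spike_times = []
    for t in range(1000):                       # simulate 1000 ms
        v += dt * (0.04 * v ** 2 + 5 * v + 140 - u + I)
        u += dt * (a * (b * v - u))
        if v >= 30.0:                           # spike threshold
            spike_times.append(t)
            v, u = c, u + d
    print(f"spikes in 1 s: {len(spike_times)}")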

    # Connections between layers.

    '''

       :param source: A layer of nodes from which the connection originates.

       :param target: A layer of nodes to which the connection connects.

       :param float wmin: Minimum allowed value on the connection weights.

       :param float wmax: Maximum allowed value on the connection weights.

    '''
    inpt_middle = Connection(source=inpt, target=middle, wmin=0, wmax=1)
    middle_out = Connection(source=middle, target=out, wmin=0, wmax=1)

    This defines the synapses between the neuron populations (Input-Middle and Middle-Output); the default update_rule is NoOp, so these weights are never updated.
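
    Conceptually, at every simulation step a Connection delivers the presynaptic spikes to the target layer through its weight matrix. The following is a simplified sketch of that forward computation (my approximation, not the actual Connection code, which also handles bias terms, clamping of weights to [wmin, wmax], and optional normalization).

    import torch

    # Simplified view of what a fully-connected Connection computes per time step.
    n_pre, n_post = 6400, 100
    w = torch.rand(n_pre, n_post)          # weights, shape (source.n, target.n)
    s_pre = torch.rand(n_pre) < 0.05       # boolean presynaptic spikes for one step
    current_to_post = s_pre.float() @ w    # input delivered to the target layer
    print(current_to_post.shape)           # torch.Size([100])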

    # Add all layers and connections to the network.
    network.add_layer(inpt, name="Input Layer")
    network.add_layer(middle, name="Hidden Layer")
    network.add_layer(out, name="Output Layer")
    network.add_connection(inpt_middle, source="Input Layer", target="Hidden Layer")
    network.add_connection(middle_out, source="Hidden Layer", target="Output Layer")

    These calls add each layer of nodes to the network and add the connections between the network's layers.

    # Load the Breakout environment.
    environment = GymEnvironment("BreakoutDeterministic-v4")
    environment.reset()

    GymEnvironment is a wrapper around an OpenAI Gym environment.
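
    Under the hood the wrapper is built on the plain Gym API: it creates the environment, forwards reset()/step(), and preprocesses the raw 210x160x3 Atari frame down to the 80x80 input the network expects. A minimal sketch of the underlying Gym calls (assuming the classic pre-0.26 Gym step() signature that this version of BindsNET targets):

    import gym

    # Raw Gym usage that GymEnvironment wraps (sketch; classic 4-tuple step() API assumed).
    env = gym.make("BreakoutDeterministic-v4")
    obs = env.reset()                      # raw RGB frame, shape (210, 160, 3)
    obs, reward, done, info = env.step(1)  # action 1 = FIRE (launches the ball)
    print(obs.shape, reward, done)
    env.close()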

    # Build pipeline from specified components.
    pipeline = EnvironmentPipeline(
        network,
        environment,
        encoding=bernoulli,  # spike encoding method
        action_function=select_softmax,
        output="Output Layer",
        time=100,  # simulation time per environment step
        history_length=1,  # number of observations to keep in history (based on the source, this argument is really consumed by GymEnvironment)
        delta=1,  # step between observations stored in the history (likewise consumed by GymEnvironment)
        plot_interval=1,
        render_interval=1,
    )

    EnvironmentPipeline abstracts the interaction between the network, the environment, and the actions fed back to the environment.
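
    On every environment step the pipeline passes the preprocessed observation through the chosen encoding function before feeding it to the network. A small sketch of Bernoulli encoding in isolation (assuming bernoulli(datum, time=...) takes a tensor of values in [0, 1] and returns a spike tensor whose first dimension is time):

    import torch
    from bindsnet.encoding import bernoulli

    # Each pixel intensity becomes a per-time-step Bernoulli spike probability.
    obs = torch.rand(80, 80)             # preprocessed observation, values in [0, 1]
    spikes = bernoulli(obs, time=100)    # one Bernoulli draw per pixel per time step
    print(spikes.shape)                  # expected: torch.Size([100, 80, 80])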

    # Run environment simulation for 100 episodes.
    for i in range(100):
        total_reward = 0
        pipeline.reset_state_variables()
        is_done = False
        while not is_done:
            result = pipeline.env_step()
            pipeline.step(result)
    
            reward = result[1]
            total_reward += reward
    
            is_done = result[2]
        print(f"Episode {i} total reward:{total_reward}")

    pipeline.env_step(): one step of the environment, including rendering, choosing and executing an action, and accumulating/delaying rewards.

    pipeline.step(): a single step of any high-level pipeline; a simplified sketch of the whole loop follows the list below.

    • EnvironmentPipeline (step_): runs a single iteration of the network and, once it finishes, updates the network and the reward list.
    • Network (run): simulates the network for the given inputs and amount of time.
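
    Putting the pieces together, each pipeline iteration roughly performs the steps sketched below. This is a hypothetical simplification for orientation only, not the actual EnvironmentPipeline source; get_output_spikes is a stand-in for the spike monitor the pipeline attaches to the output layer.

    # Hypothetical simplification of one pipeline iteration; the real logic lives in
    # EnvironmentPipeline.env_step() / step_() and uses spike monitors internally.
    def pipeline_iteration(network, env, encode, select_action, get_output_spikes, action, time=100):
        obs, reward, done, info = env.step(action)       # env_step(): act in the environment
        spikes = encode(obs, time=time)                   # e.g. bernoulli encoding of the frame
        network.run(inputs={"Input Layer": spikes}, time=time, reward=reward)
        out_spikes = get_output_spikes()                  # spikes recorded on "Output Layer"
        next_action = select_action(out_spikes)           # e.g. select_softmax
        return next_action, reward, done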

    2. breakout_stdp.py (demo code; the first connection has no weight updates, while the second connection's weights are updated with R-STDP, i.e. MSTDP)

    from bindsnet.network import Network
    from bindsnet.pipeline import EnvironmentPipeline
    from bindsnet.learning import MSTDP
    from bindsnet.encoding import bernoulli
    from bindsnet.network.topology import Connection
    from bindsnet.environment import GymEnvironment
    from bindsnet.network.nodes import Input, LIFNodes
    from bindsnet.pipeline.action import select_softmax
    
    # Build network.
    network = Network(dt=1.0)
    
    # Layers of neurons.
    inpt = Input(n=80 * 80, shape=[1, 1, 1, 80, 80], traces=True)
    middle = LIFNodes(n=100, traces=True)
    out = LIFNodes(n=4, refrac=0, traces=True)
    
    # Connections between layers.
    inpt_middle = Connection(source=inpt, target=middle, wmin=0, wmax=1e-1)
    middle_out = Connection(
        source=middle,
        target=out,
        wmin=0,
        wmax=1,
        update_rule=MSTDP,
        nu=1e-1,
        norm=0.5 * middle.n,
    )
    
    # Add all layers and connections to the network.
    network.add_layer(inpt, name="Input Layer")
    network.add_layer(middle, name="Hidden Layer")
    network.add_layer(out, name="Output Layer")
    network.add_connection(inpt_middle, source="Input Layer", target="Hidden Layer")
    network.add_connection(middle_out, source="Hidden Layer", target="Output Layer")
    
    # Load the Breakout environment.
    environment = GymEnvironment("BreakoutDeterministic-v4")
    environment.reset()
    
    # Build pipeline from specified components.
    environment_pipeline = EnvironmentPipeline(
        network,
        environment,
        encoding=bernoulli,
        action_function=select_softmax,
        output="Output Layer",
        time=100,
        history_length=1,
        delta=1,
        plot_interval=1,
        render_interval=1,
    )
    
    
    def run_pipeline(pipeline, episode_count):
        for i in range(episode_count):
            total_reward = 0
            pipeline.reset_state_variables()
            is_done = False
            while not is_done:
                result = pipeline.env_step()
                pipeline.step(result)
    
                reward = result[1]
                total_reward += reward
    
                is_done = result[2]
            print(f"Episode {i} total reward:{total_reward}")
    
    
    print("Training: ")
    run_pipeline(environment_pipeline, episode_count=100)
    
    # stop MSTDP
    environment_pipeline.network.learning = False
    
    print("Testing: ")
    run_pipeline(environment_pipeline, episode_count=100)

    For the results of running this script, see: https://github.com/BindsNET/bindsnet/issues/345

    The breakout_stdp.py script demonstrates how to play an Atari game with a spiking network. The low final rewards are normal and are exactly what one would expect from the essentially random choices of an untrained spiking network. At present, training spiking neurons to perform well in an RL setting is not straightforward. However, "Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to Atari Breakout game" demonstrates a method for training a conventional neural network and converting it into a spiking network. In addition, the more recent paper "Strategy and Benchmark for Converting Deep Q-Networks to Event-Driven Spiking Neural Networks" also uses BindsNET for ANN-to-SNN conversion.

    3. play_breakout_from_ANN.py

    import argparse
    from tqdm import tqdm
    import torch.nn as nn
    import torch.nn.functional as F
    import torch
    
    
    from bindsnet.network import Network
    from bindsnet.pipeline import EnvironmentPipeline
    from bindsnet.encoding import bernoulli, poisson
    from bindsnet.network.topology import Connection
    from bindsnet.environment import GymEnvironment
    from bindsnet.network.nodes import Input, LIFNodes, IzhikevichNodes, IFNodes
    from bindsnet.pipeline.action import *
    
    from bindsnet.network.nodes import Nodes, AbstractInput
    from typing import Iterable, Optional, Union
    
    parser = argparse.ArgumentParser(prefix_chars="@")
    parser.add_argument("@@seed", type=int, default=42)
    parser.add_argument("@@dt", type=float, default=1.0)
    parser.add_argument("@@gpu", dest="gpu", action="store_true")
    parser.add_argument("@@layer1scale", dest="layer1scale", type=float, default=57.68)
    parser.add_argument("@@layer2scale", dest="layer2scale", type=float, default=77.48)
    parser.add_argument("@@num_episodes", type=int, default=10)
    parser.add_argument("@@plot_interval", type=int, default=1)
    parser.add_argument("@@rander_interval", type=int, default=1)
    parser.set_defaults(plot=False, render=False, gpu=True, probabilistic=False)
    locals().update(vars(parser.parse_args()))
    
    # Setup PyTorch computing device
    device = torch.device("cuda" if torch.cuda.is_available() and gpu else "cpu")
    torch.random.manual_seed(seed)
    
    
    # Build ANN
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.fc1 = nn.Linear(6400, 1000)
            self.fc2 = nn.Linear(1000, 4)
    
        def forward(self, x):
            x = F.relu(self.fc1(x))
            x = self.fc2(x)
            return x
    
    
    # load ANN
    dqn_network = torch.load("trained_shallow_ANN.pt", map_location=device)
    
    # Build Spiking network.
    network = Network(dt=dt).to(device)
    
    # Layers of neurons.
    inpt = Input(n=6400, traces=False)  # Input layer
    middle = LIFNodes(
        n=1000, refrac=0, traces=True, thresh=-52.0, rest=-65.0
    )  # Hidden layer
    readout = LIFNodes(
        n=4, refrac=0, traces=True, thresh=-52.0, rest=-65.0
    )  # Readout layer
    layers = {"X": inpt, "M": middle, "R": readout}
    
    # Set the connections between layers with the values set by the ANN
    # Input -> hidden.
    inpt_middle = Connection(
        source=layers["X"],
        target=layers["M"],
        w=torch.transpose(dqn_network.fc1.weight, 0, 1) * layer1scale,
    )
    # hidden -> readout.
    middle_out = Connection(
        source=layers["M"],
        target=layers["R"],
        w=torch.transpose(dqn_network.fc2.weight, 0, 1) * layer2scale,
    )
    
    # Add all layers and connections to the network.
    network.add_layer(inpt, name="Input Layer")
    network.add_layer(middle, name="Hidden Layer")
    network.add_layer(readout, name="Output Layer")
    network.add_connection(inpt_middle, source="Input Layer", target="Hidden Layer")
    network.add_connection(middle_out, source="Hidden Layer", target="Output Layer")
    
    # Load the Breakout environment.
    environment = GymEnvironment("BreakoutDeterministic-v4")
    environment.reset()
    
    # Build pipeline from specified components.
    pipeline = EnvironmentPipeline(
        network,
        environment,
        encoding=poisson,
        encode_factor=50,
        action_function=select_highest,
        percent_of_random_action=0.05,
        random_action_after=5,
        output="Output Layer",
        reset_output_spikes=True,
        time=500,
        overlay_input=4,
        history_length=1,
        plot_interval=plot_interval if plot else None,
        render_interval=render_interval if render else None,
        device=device,
    )
    
    # Run environment simulation for number of episodes.
    for i in tqdm(range(num_episodes)):
        total_reward = 0
        pipeline.reset_state_variables()
        is_done = False
        pipeline.env.step(1)  # start with fire the ball
        pipeline.env.step(1)  # start with fire the ball
        while not is_done:
            result = pipeline.env_step()
            pipeline.step(result)
    
            reward = result[1]
            total_reward += reward
    
            is_done = result[2]
        tqdm.write(f"Episode {i} total reward:{total_reward}")
        with open("play-breakout_results.csv", "a") as myfile:
            myfile.write(f"{i},{layer1scale},{layer2scale},{total_reward}
    ")

    The paper presents a method for training a conventional neural network and converting it into a spiking network, but the code does not include the procedure for searching for the optimal scaling parameters (e.g., PSO); a rough stand-in is sketched below.
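
    As a hedged illustration only: if you wanted to tune layer1scale and layer2scale yourself without PSO, a simple random search over the two scales could look like the sketch below. evaluate_scales is a hypothetical helper (not part of BindsNET or this example) that would rebuild the spiking network with the given scales, run a few episodes, and return the mean episode reward.

    import random

    def random_search(evaluate_scales, n_trials=50, lo=1.0, hi=100.0, seed=0):
        """Hypothetical scale search: evaluate_scales(s1, s2) -> mean episode reward."""
        rng = random.Random(seed)
        best = (None, None, float("-inf"))
        for _ in range(n_trials):
            s1, s2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
            reward = evaluate_scales(s1, s2)
            if reward > best[2]:
                best = (s1, s2, reward)
        return best   # (best layer1scale, best layer2scale, best reward)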

    4. random_baseline.py 

    import os
    import argparse
    import numpy as np
    
    from bindsnet.environment import GymEnvironment
    
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", type=int, default=1000000)
    parser.add_argument("--render", dest="render", action="store_true")
    parser.set_defaults(render=False)
    
    args = parser.parse_args()
    
    n = args.n
    render = args.render
    
    # Load Breakout environment.
    env = GymEnvironment("BreakoutDeterministic-v4")
    env.reset()
    
    total = 0
    rewards = []
    avg_rewards = []
    lengths = []
    avg_lengths = []
    
    i, j, k = 0, 0, 0
    while i < n:
        if render:
            env.render()
    
        # Select random action.
        a = np.random.choice(4)
    
        # Step environment with random action.
        obs, reward, done, info = env.step(a)
    
        total += reward
    
        rewards.append(reward)
        if i == 0:
            avg_rewards.append(reward)
        else:
            avg = (avg_rewards[-1] * (i - 1)) / i + reward / i
            avg_rewards.append(avg)
    
        if i % 100 == 0:
            print(
                "Iteration %d: last reward: %.2f, average reward: %.2f"
                % (i, reward, avg_rewards[-1])
            )
    
        if done:
            # Restart game if out of lives.
            env.reset()
    
            length = i - j
            lengths.append(length)
            if j == 0:
                avg_lengths.append(length)
            else:
                avg = (avg_lengths[-1] * (k - 1)) / k + length / k
                avg_lengths.append(avg)
    
            print(
                "Episode %d: last length: %.2f, average length: %.2f"
                % (k, length, avg_lengths[-1])
            )
    
            j += length
            k += 1
    
        i += 1

    5. random_network_baseline.py 


    6. trained_shallow_ANN.pt
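
    This is the PyTorch checkpoint of the shallow ANN that play_breakout_from_ANN.py loads with torch.load. As a hedged sketch only (the training code is not part of these examples), a checkpoint with the expected structure could be produced as follows; the training loop itself is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Same architecture that play_breakout_from_ANN.py expects: fc1 6400 -> 1000, fc2 1000 -> 4.
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(6400, 1000)
            self.fc2 = nn.Linear(1000, 4)

        def forward(self, x):
            return self.fc2(F.relu(self.fc1(x)))

    model = Net()
    # ... a DQN-style training loop on Breakout would go here (omitted) ...
    torch.save(model, "trained_shallow_ANN.pt")   # whole-module save, matching torch.load above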

  • Original post: https://www.cnblogs.com/lucifer1997/p/14293310.html