PyTorch Learning (1)

    PyTorch official website
    PyTorch official tutorials
    PyTorch official documentation
    PyTorch Chinese documentation/tutorials
    Dive into Deep Learning (PyTorch edition)

    Introduction

    I ran a small test and found that on the CPU, PyTorch is much faster than TensorFlow. I also found that TensorFlow installed with conda is faster than the pip-installed version, while for PyTorch there is no noticeable difference. I had seen people say that the conda build of TensorFlow is specially optimized, and it appears to be true.

    The test looks for the minimum of the Himmelblau function f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2, defined in the code below:

    Test code (the same script is run once in the conda-installed environment and once in the pip-installed one):

    import torch
    import tensorflow as tf
    import time
    import numpy as np
    
    def himmelblau(x):
        return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2
    
    import plotly.graph_objects as go
    x = np.arange(-6, 6, 0.1)
    y = np.arange(-6, 6, 0.1)
    # print('x,y range:', x.shape, y.shape)
    X, Y = np.meshgrid(x, y)
    fig = go.Figure(data=go.Surface(z=himmelblau([X,Y])))
    fig.write_image('figure2.svg')
    fig.write_html('first_figure.html', auto_open=True)
    
    
    tic = time.time()
    x = torch.tensor([0., 0.], requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=1e-3)
    for step in range(20000):
        pred = himmelblau(x)
        optimizer.zero_grad() # clear the accumulated gradients
        pred.backward()
        optimizer.step() # each call to step() applies one Adam update to x
        
        if step % 2000 == 0:
            print('step{}: x = {}, f(x) = {}'.format(step, x.detach().numpy(), pred.item()))
    toc = time.time()
    print('time:',toc-tic)
    
    tic = time.time()
    x = tf.Variable([0., 0.])  # to have GradientTape compute gradients for it, x must be a tf.Variable
    optimizer = tf.optimizers.Adam(lr=1e-3)
    for step in range(20000):
        with tf.GradientTape() as tape:
            tape.watch([x])
            pred = himmelblau(x)
            
        grads = tape.gradient(pred, [x])
        optimizer.apply_gradients(zip(grads, [x])) # unlike PyTorch, TF collects all the gradients and applies them in a single call
        # x -= 0.001*grads
        
        if step % 2000 == 0:
            print('step{}: x = {}, f(x) = {}'.format(step, x.numpy(), pred.numpy()))
    toc = time.time()
    print('time:',toc-tic)
    

    Output with the conda installation (the first timing is the PyTorch run, the second is the TensorFlow run):

    step0: x = [0.001 0.001], f(x) = 170.0
    step2000: x = [2.3331807 1.9540695], f(x) = 13.730916023254395
    step4000: x = [2.982008 2.0270984], f(x) = 0.014858869835734367
    step6000: x = [2.9999835 2.0000222], f(x) = 1.1074007488787174e-08
    step8000: x = [2.9999938 2.0000083], f(x) = 1.5572823031106964e-09
    step10000: x = [2.9999979 2.0000029], f(x) = 1.8189894035458565e-10
    step12000: x = [2.9999993 2.000001 ], f(x) = 1.6370904631912708e-11
    step14000: x = [2.9999998 2.0000002], f(x) = 1.8189894035458565e-12
    step16000: x = [3. 2.], f(x) = 0.0
    step18000: x = [3. 2.], f(x) = 0.0
    time: 8.470422983169556
    step0: x = [0.001 0.001], f(x) = 170.0
    step2000: x = [2.3331852 1.9540718], f(x) = 13.730728149414062
    step4000: x = [2.9820085 2.0270977], f(x) = 0.01485812570899725
    step6000: x = [2.9999835 2.0000222], f(x) = 1.1074007488787174e-08
    step8000: x = [2.9999938 2.0000083], f(x) = 1.5572823031106964e-09
    step10000: x = [2.9999979 2.0000029], f(x) = 1.8189894035458565e-10
    step12000: x = [2.9999995 2.0000007], f(x) = 9.322320693172514e-12
    step14000: x = [3. 2.0000002], f(x) = 9.094947017729282e-13
    step16000: x = [3. 2.], f(x) = 0.0
    step18000: x = [3. 2.], f(x) = 0.0
    time: 43.112674951553345

    Output with the pip installation:

    step0: x = [0.001 0.001], f(x) = 170.0
    step2000: x = [2.3331807 1.9540695], f(x) = 13.730916023254395
    step4000: x = [2.982008 2.0270984], f(x) = 0.014858869835734367
    step6000: x = [2.9999835 2.0000222], f(x) = 1.1074007488787174e-08
    step8000: x = [2.9999938 2.0000083], f(x) = 1.5572823031106964e-09
    step10000: x = [2.9999979 2.0000029], f(x) = 1.8189894035458565e-10
    step12000: x = [2.9999993 2.000001 ], f(x) = 1.6370904631912708e-11
    step14000: x = [2.9999998 2.0000002], f(x) = 1.8189894035458565e-12
    step16000: x = [3. 2.], f(x) = 0.0
    step18000: x = [3. 2.], f(x) = 0.0
    time: 8.337981462478638
    step0: x = [0.001 0.001], f(x) = 170.0
    step2000: x = [2.3331852 1.9540718], f(x) = 13.730728149414062
    step4000: x = [2.9820085 2.0270977], f(x) = 0.01485812570899725
    step6000: x = [2.9999835 2.0000222], f(x) = 1.1074007488787174e-08
    step8000: x = [2.9999938 2.0000083], f(x) = 1.5572823031106964e-09
    step10000: x = [2.9999979 2.0000029], f(x) = 1.8189894035458565e-10
    step12000: x = [2.9999995 2.0000007], f(x) = 9.322320693172514e-12
    step14000: x = [3. 2.0000002], f(x) = 9.094947017729282e-13
    step16000: x = [3. 2.], f(x) = 0.0
    step18000: x = [3. 2.], f(x) = 0.0
    time: 54.814427614212036

    Installation

    Create a new environment:

    conda create --name torch python=3.7
    

    Install a few packages you may need (optional, depending on your setup):

    conda install numpy
    conda install spyder
    conda install jupyter notebook
    

    安装PyTorch:

    conda install pytorch torchvision cpuonly -c pytorch # CPU version
    

    The command for the GPU version differs depending on your CUDA version; check the installation command on the official site.
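
    For reference, a typical GPU install command has the following shape (cudatoolkit=10.1 here is only an example; substitute the version matching your CUDA installation, as listed on the official site):

    conda install pytorch torchvision cudatoolkit=10.1 -c pytorch # GPU version, assuming CUDA 10.1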

    Automatic differentiation

    requires_grad

    If you set a tensor's .requires_grad attribute to True, every operation on that tensor will be tracked.
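
    For example, operations on such a tensor record a grad_fn node (a small illustration; the variable names here are my own):

    a = torch.ones(2, requires_grad=True)
    b = (a * 3).sum()
    print(b.grad_fn)  # e.g. <SumBackward0 ...>: the operation was tracked

    If a tensor involved in the computation does not require grad, however, autograd fails: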

    import torch
    import torch.nn.functional as F
    
    x = torch.ones(1)
    w = torch.full([1], 2.)  # should be w = torch.full([1], 2., requires_grad=True)
    mse = F.mse_loss(x, x + w)
    torch.autograd.grad(mse, [w])
    

    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
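
    As the comment in the code notes, creating w with requires_grad=True from the start avoids the error; a minimal sketch:

    x = torch.ones(1)
    w = torch.full([1], 2., requires_grad=True)  # track operations on w from the start
    mse = F.mse_loss(x, x + w)
    print(torch.autograd.grad(mse, [w]))  # (tensor([4.]),)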

    The graph is fixed once it has been created, so calling the in-place requires_grad_() method only after mse has been built does not help:

    x = torch.ones(1)
    w = torch.full([1], 2.)
    mse = F.mse_loss(x, x + w)
    w.requires_grad_()
    torch.autograd.grad(mse, [w])
    

    The same error is still raised:

    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

    Because the graph for mse has already been built, it has to be rebuilt, or the creation of mse has to be moved after w.requires_grad_():

    x = torch.ones(1)
    w = torch.full([1], 2.)
    w.requires_grad_()
    mse = F.mse_loss(x, x + w)
    grad = torch.autograd.grad(mse, [w])  # returns a tuple with the gradient w.r.t. each variable
    print(grad)
    

    (tensor([4.]),)

    backward()

    You can also call .backward() to compute all gradients automatically. The gradients are then accumulated into each variable's .grad attribute.

    x = torch.ones(1)
    w = torch.full([1], 2.)
    w.requires_grad_()
    mse = F.mse_loss(x, x + w)
    # grad = torch.autograd.grad(mse, [w])  # equivalent to the line below
    mse.backward()   # returns nothing; instead the gradient is stored in each variable's .grad attribute
    print(w.grad)
    

    tensor([4.])

    After backward() has been called, PyTorch frees the graph's buffers, so calling backward() a second time raises:

    RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

    To keep the graph, pass retain_graph=True:

    torch.autograd.grad(mse, [w], retain_graph=True)
    # or
    mse.backward(retain_graph=True)
    
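    A minimal sketch combining both points (retaining the graph, and the fact that .grad accumulates across backward() calls), with the same x, w and mse as above:

    x = torch.ones(1)
    w = torch.full([1], 2., requires_grad=True)
    mse = F.mse_loss(x, x + w)
    mse.backward(retain_graph=True)  # keep the graph so backward can run again
    print(w.grad)                    # tensor([4.])
    mse.backward()                   # gradients accumulate into .grad
    print(w.grad)                    # tensor([8.])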

    detach()

    To stop a tensor from having its history tracked, call .detach() to separate it from the computation history and prevent its future computations from being recorded.
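
    A small sketch of the effect (the variable names are illustrative):

    x = torch.ones(1, requires_grad=True)
    y = x * 2
    z = y.detach()  # same values as y, but cut off from the computation graph
    print(y.requires_grad, z.requires_grad)  # True False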

    To avoid tracking history (and using memory) altogether, you can wrap a code block in with torch.no_grad():. This is especially useful when evaluating a model, because the model may have trainable parameters with requires_grad=True, but we don't need gradients for them during evaluation.
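
    A corresponding sketch for torch.no_grad():

    x = torch.ones(1, requires_grad=True)
    with torch.no_grad():
        y = x * 2
    print(y.requires_grad)  # False: operations inside the block are not tracked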
