Assorted errors

1. return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

torch.nn.Embedding requires its input to be a LongTensor (int64 indices):
```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3, padding_idx=0)
input = torch.LongTensor([[0, 2, 0, 5]])
embedding(input)
# Rows at padding_idx=0 are zeros; the other rows are randomly initialized:
# tensor([[[ 0.0000,  0.0000,  0.0000],
#          [ 0.1535, -2.0309,  0.9315],
#          [ 0.0000,  0.0000,  0.0000],
#          [-0.1655,  0.9897,  0.0635]]])
```
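If the indices arrive as int32 (for example via torch.from_numpy on an int32 NumPy array), a minimal sketch of the fix is to cast them with .long() right before the lookup. The tensor idx below is made up for illustration; positions equal to padding_idx come back as the all-zero row.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3, padding_idx=0)

# Hypothetical int32 indices, e.g. converted from an int32 NumPy array
idx = torch.tensor([[0, 2, 0, 5]], dtype=torch.int32)

# Cast to int64 (LongTensor) before the lookup; a no-op if already long
out = embedding(idx.long())
print(out.shape)  # torch.Size([1, 4, 3])
```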
2. Pitfalls when installing spaCy and its de and en language models

https://www.pythonheidong.com/blog/article/233961/aff07bd143f34d50f1fa/

3. RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [4, 20]] is at version 21; expected version 20 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

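loss.backward() fails because a tensor had one value when the forward pass saved it and a different value when gradients were computed (an in-place operation modified it in between), so PyTorch cannot compute the backward pass correctly. The error does not name the tensor; you have to find it by reading the code, or by enabling anomaly detection with torch.autograd.set_detect_anomaly(True) as the hint suggests.

Common fixes are to replace the in-place operation with an out-of-place one (this did not work for me), or to cut off backpropagation through the offending tensor:

append .data to the variables whose values change during training;

For how .detach() and .data cut off backpropagation in PyTorch, see https://www.cnblogs.com/wanghui-garcia/p/10677071.html. Note that .detach() is usually the safer choice: autograd does not track changes made through .data, so such changes can silently produce wrong gradients instead of raising this error.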
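A minimal reproduction, assuming a toy tensor x rather than the real model: torch.sigmoid saves its output for the backward pass, so editing that output in place triggers the same RuntimeError, while the out-of-place version backpropagates normally. The two helper functions are hypothetical names for illustration.

```python
import torch

def backward_with_inplace():
    x = torch.randn(3, requires_grad=True)
    y = torch.sigmoid(x)   # sigmoid saves its output y for the backward pass
    y.add_(1)              # in-place edit bumps y's version counter
    y.sum().backward()     # autograd detects the modified saved tensor -> RuntimeError

def backward_out_of_place():
    x = torch.randn(3, requires_grad=True)
    y = torch.sigmoid(x)
    z = y + 1              # out-of-place: the saved y is left untouched
    z.sum().backward()
    return x.grad

try:
    backward_with_inplace()
    inplace_failed = False
except RuntimeError as err:
    inplace_failed = "inplace operation" in str(err)

grad = backward_out_of_place()
print(inplace_failed, grad.shape)  # True torch.Size([3])
```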

4. ModuleNotFoundError: No module named '<folder>' when importing a .py file from another folder

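Suppose there are two folders A and B: A contains a.py and aa.py, and B contains b.py (both folders need an __init__.py file).

When a.py calls function XX in b.py (a different folder): from B.b import XX

When a.py calls a function in aa.py (the same folder): from .aa import XX (note the leading dot). The relative import only works when a.py is loaded as part of package A, for example with python -m A.a run from the folder that contains A and B; running python A/a.py directly fails with "attempted relative import with no known parent package".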


Tip: check the current working directory at the top of the program with print(os.getcwd())
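The layout above can be checked with a small throwaway sketch. It writes the hypothetical files into a temporary directory and imports A.a with the project root (the folder that contains A and B) on sys.path, which is what python -m A.a does when run from that root. If the root is missing from sys.path, from B.b import XX is exactly what raises ModuleNotFoundError: No module named 'B'.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical project layout mirroring the text
FILES = {
    "A/__init__.py": "",
    "B/__init__.py": "",
    "B/b.py": "def XX():\n    return 'XX from B.b'\n",
    "A/aa.py": "def YY():\n    return 'YY from A.aa'\n",
    # a.py: absolute import across folders, relative import inside its own folder
    "A/a.py": (
        "from B.b import XX\n"
        "from .aa import YY\n"
        "def run():\n"
        "    return XX(), YY()\n"
    ),
}

def build_and_import(root: Path):
    for rel, src in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    # The root (parent of A and B) must be on sys.path for "B.b" to resolve
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.import_module("A.a").run()
    finally:
        sys.path.remove(str(root))

with tempfile.TemporaryDirectory() as tmp:
    result = build_and_import(Path(tmp))
print(result)  # ('XX from B.b', 'YY from A.aa')
```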

5. A huge pitfall! On GPU, the training function loops forever while iterating over train_iterator (whose length is only 5), but on CPU it runs normally

The cause: repeat=False was not passed when building the iterators with torchtext's data.BucketIterator.splits
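```python
from torchtext import data  # older torchtext API; later versions moved it to torchtext.legacy.data

# Create the iterators
train_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, test_data),
    batch_size=BATCH_SIZE,
    sort_within_batch=False,
    repeat=False,  # without this, iteration never stops
    sort=False)
```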
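If a repeating iterator cannot be avoided, one defensive sketch is to cap each epoch at len(iterator) batches with itertools.islice. RepeatingLoader below is a hypothetical stand-in for an iterator built with repeat=True, not torchtext's class; it reports 5 batches per epoch but cycles forever.

```python
import itertools

class RepeatingLoader:
    """Hypothetical stand-in: len() gives batches per epoch, but iteration never ends."""

    def __init__(self, batches):
        self.batches = list(batches)

    def __len__(self):
        return len(self.batches)

    def __iter__(self):
        return itertools.cycle(self.batches)

def one_epoch(iterator):
    """Yield at most len(iterator) batches, so a repeating iterator cannot loop forever."""
    return itertools.islice(iter(iterator), len(iterator))

train_iterator = RepeatingLoader(["b0", "b1", "b2", "b3", "b4"])  # length 5, as in the text
seen = [batch for batch in one_epoch(train_iterator)]
print(seen)  # ['b0', 'b1', 'b2', 'b3', 'b4']
```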
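Also, the calling file and the imported file must not each claim a different GPU: the statement below must not appear, with different GPU indices, at the top of both the caller and the callee. Set it once, in the entry-point script, before anything initializes CUDA;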

```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```
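One way to keep the device choice in a single place is a small guard that sets CUDA_VISIBLE_DEVICES only if nothing has set it yet, and otherwise keeps the first choice and warns about a conflicting request. select_gpus is a hypothetical helper, and like the plain assignment it only has an effect if it runs before CUDA is initialized.

```python
import os
import warnings

def select_gpus(ids):
    """Hypothetical helper: set CUDA_VISIBLE_DEVICES once, never silently override it.

    Only effective if called before anything initializes CUDA.
    """
    wanted = ",".join(str(i) for i in ids)
    current = os.environ.get("CUDA_VISIBLE_DEVICES")
    if current is None:
        os.environ["CUDA_VISIBLE_DEVICES"] = wanted
    elif current != wanted:
        warnings.warn(
            f"CUDA_VISIBLE_DEVICES is already {current!r}; ignoring request for {wanted!r}"
        )
    return os.environ["CUDA_VISIBLE_DEVICES"]
```

Calling select_gpus([0]) at the top of the entry script, and never in imported modules, leaves one source of truth for the visible GPUs.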

6.[:, None, None]
```python
import torch

B = 16
print(torch.arange(B)[:, None, None].shape)  # [16, 1, 1]
print(torch.arange(B)[None, :, None].shape)  # [1, 16, 1]
```
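Each None inserts a new dimension of size 1, so the indexing above is equivalent to chained unsqueeze calls. A common use, sketched here with a made-up batch x, is to broadcast a per-sample value of shape [B] against a [B, H, W] tensor.

```python
import torch

B = 16
idx = torch.arange(B)
a = idx[:, None, None]                # shape [16, 1, 1]
b = idx.unsqueeze(1).unsqueeze(2)     # the same tensor built with unsqueeze
assert torch.equal(a, b)

# Hypothetical batch: scale sample i of a [B, 4, 5] tensor by i
x = torch.ones(B, 4, 5)
scaled = x * a                        # [16, 1, 1] broadcasts against [16, 4, 5]
print(scaled.shape)                   # torch.Size([16, 4, 5])
```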
7. Saving and loading models

Case 1: save and load the whole network, structure included
```python
torch.save(net, "net.pkl")

net1 = torch.load("net.pkl")  # PyTorch >= 2.6 needs weights_only=False for a whole pickled model
```
Case 2: save and load only the network's parameters
```python
torch.save(net.state_dict(), "net_params.pkl")

# strict=False ignores missing/unexpected keys instead of raising
net2.load_state_dict(torch.load("net_params.pkl"), strict=False)
```
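A round trip for case 2, sketched with a hypothetical make_net() model and an in-memory buffer instead of a file: the architecture has to be rebuilt in code before load_state_dict, and the returned missing_keys/unexpected_keys lists are worth checking, since strict=False skips mismatched keys without raising.

```python
import io

import torch
import torch.nn as nn

def make_net():
    # Hypothetical small network standing in for `net` in the text
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

net = make_net()
buffer = io.BytesIO()                  # a path such as "net_params.pkl" works the same way
torch.save(net.state_dict(), buffer)   # case 2: parameters only

buffer.seek(0)
net2 = make_net()                      # rebuild the architecture first
result = net2.load_state_dict(torch.load(buffer))  # strict=True by default
print(result.missing_keys, result.unexpected_keys)  # [] []
```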
8. Installing PyTorch on Linux (for CUDA 9.0, node 08 of the school server)

Using the Tsinghua mirror; I keep forgetting this, so I'm writing it down here.
```shell
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0
```
9. Computing the number of model parameters and FLOPs
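```python
from thop import profile  # assumption: `profile` here is thop's profiler (its docs call the first value MACs)

# Parameter count from profiling: only parameters of the modules the inputs actually pass through
inputs = (imgs, labels, fast)
flops, params = profile(net, inputs, verbose=False)
total_flops += flops
model_params_num += params

loss, logits = net(imgs, labels, fast)
```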
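Tip: the number of items in inputs must match the number of arguments net expects;

If there is only one argument imgs, write inputs = (imgs, ). The comma is required: a tuple is expected, and without the comma (imgs) is just imgs.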
```python
import logging

# Actual total number of model parameters, in millions
num_params = sum(p.numel() for p in model.parameters()) / 1000000.0
logging.info('student model Total params: %.2fM' % num_params)
```
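A small sketch of both tips, using a hypothetical toy nn.Linear(10, 5) model: its parameter count is 10 * 5 weights plus 5 biases = 55, and only the trailing comma turns (imgs) into a one-element tuple. count_params is an illustrative helper, not part of any library.

```python
import torch
import torch.nn as nn

def count_params(model, trainable_only=False):
    """Total parameter count; with trainable_only=True, only tensors with requires_grad."""
    return sum(p.numel() for p in model.parameters()
               if p.requires_grad or not trainable_only)

model = nn.Linear(10, 5)          # hypothetical toy model
print(count_params(model))        # 55

imgs = torch.randn(2, 10)
print(type((imgs)).__name__)      # Tensor: parentheses alone do not make a tuple
print(type((imgs,)).__name__)     # tuple: the trailing comma does
```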
10. Evaluation metrics for multi-label classification

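The example below uses 20 classes. If the last dimension is 2, treat the output as 20 binary classifiers, i.e. shape=[8, 20, 2]; each label can then be decided directly from its binary prediction, without first applying sigmoid and comparing with 0.5;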
```python
import numpy as np
import torch
from sklearn.metrics import classification_report

# Multi-label exact-match accuracy, plus per-label precision/recall/F1
def cal_multi_to_single_acc(predict_y, true_y):  # predict_y.shape=[8, 20], true_y.shape=[8, 20], both numpy arrays
    predict_y = torch.from_numpy(predict_y)
    predict_y = torch.sigmoid(predict_y)  # squash the scores into (0, 1) first
    preds = []
    for line in predict_y:
        # threshold each label's probability at 0.5
        preds_line = [1 if float(prob) > 0.5 else 0 for prob in line]
        preds.append(preds_line)  # preds is a list of lists

    # a sample counts as correct only if all of its labels match
    acc = 0
    for y_test, y_pred in zip(true_y, preds):
        if list(y_test) == y_pred:
            acc += 1
    print("acc", acc / len(true_y))
    preds = np.array(preds)
    true_y = np.array(true_y)

    print("Test Precision, Recall and F1-Score...")
    print(classification_report(true_y, preds, digits=4))
```
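A vectorized sketch of the same metrics for both output shapes, using toy sizes (2 samples, 3 labels) instead of [8, 20]: sigmoid > 0.5 for [N, C] logits, argmax over the last dimension for [N, C, 2] logits, and exact-match accuracy, which is what the loop above computes. The helper names and tensors are hypothetical.

```python
import torch

def preds_from_logits(logits):
    """[N, C] logits -> 0/1 labels; sigmoid(x) > 0.5 is the same test as x > 0."""
    return (torch.sigmoid(logits) > 0.5).long()

def preds_from_pairs(logits):
    """[N, C, 2] logits, one binary classifier per label -> 0/1 labels via argmax."""
    return logits.argmax(dim=-1)

def exact_match_acc(preds, targets):
    """Share of samples whose whole label vector is predicted correctly."""
    return (preds == targets).all(dim=1).float().mean().item()

logits = torch.tensor([[2.0, -1.0, 0.5], [-3.0, 4.0, -0.2]])
pair_logits = torch.tensor([[[0.1, 0.9], [2.0, -1.0], [0.0, 1.0]],
                            [[1.0, 0.0], [0.0, 3.0], [0.5, 0.2]]])
targets = torch.tensor([[1, 0, 1], [0, 1, 1]])

preds = preds_from_logits(logits)
print(preds.tolist())                   # [[1, 0, 1], [0, 1, 0]]
print(exact_match_acc(preds, targets))  # 0.5
```

Since sigmoid(x) > 0.5 exactly when x > 0, the sigmoid could also be skipped by thresholding the logits at 0; the resulting 0/1 tensors can be passed to classification_report after .numpy().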
11. Running on multiple GPUs

Original article: https://www.cnblogs.com/cxq1126/p/14299685.html
