Official docs: https://pytorch.org/docs/stable/data.html?highlight=subsetrandomsampler#torch.utils.data.SubsetRandomSampler
Recommended reading: https://www.sohu.com/a/291959747_197042
https://www.jianshu.com/p/a32ae0294223
https://www.cnblogs.com/marsggbo/p/10496696.html
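How it works:
A DataLoader first uses its sampler to produce sample indices, then groups those indices into batches. For example, with 10 samples and a SubsetRandomSampler given the indices 0 to 7, the sampler yields those 8 indices in a random order (a fresh order on every pass). The DataLoader fetches the corresponding 8 samples and splits them into batches of batch_size; the last batch may be smaller unless drop_last=True.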
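The two steps above (sample indices, then batch them) can be seen separately in a minimal sketch. It assumes a toy TensorDataset of 10 samples in place of real data and uses BatchSampler, which is the class DataLoader relies on to group sampler output into batches:

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, SubsetRandomSampler, TensorDataset

# Toy dataset of 10 samples; sample i holds the value i, so values equal indices
data = TensorDataset(torch.arange(10))

# Step 1: the sampler yields indices 0..7 in a random order (new order each pass)
subset_sampler = SubsetRandomSampler(list(range(8)))
print(list(subset_sampler))

# Step 2: the indices are grouped into batches of batch_size; last batch may be short
index_batches = list(BatchSampler(subset_sampler, batch_size=3, drop_last=False))
print(index_batches)

# DataLoader does both steps internally, then fetches and collates the samples
loader = DataLoader(data, batch_size=3, sampler=subset_sampler)
value_batches = [batch[0].tolist() for batch in loader]
print(value_batches)
```

Indices 8 and 9 never appear: the sampler only draws from the subset it was given, and 8 indices with batch_size=3 give batches of sizes 3, 3 and 2.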
In practice (splitting one dataset into training and validation loaders):
from torch.utils.data import DataLoader
from torch.utils.data import sampler
train_data = CriteoDataset('./data', train=True)  # user-defined Dataset class
split_num = int(len(train_data) * 0.8)  # first 80% for training, last 20% for validation
index_list = list(range(len(train_data)))
train_idx, valid_idx = index_list[:split_num], index_list[split_num:]
tr_sampler = sampler.SubsetRandomSampler(train_idx)
val_sampler = sampler.SubsetRandomSampler(valid_idx)
loader_train = DataLoader(train_data, batch_size=100, sampler=tr_sampler)
# Validation reads from the same train_data; val_sampler restricts it to valid_idx
loader_val = DataLoader(train_data, batch_size=100, sampler=val_sampler)
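Because CriteoDataset is user-defined, the split above cannot run on its own. A runnable sketch of the same 80/20 pattern, assuming a hypothetical 50-sample TensorDataset as a stand-in, checks that the two loaders see disjoint index sets drawn from one dataset:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Hypothetical stand-in for CriteoDataset: 50 samples of (feature, label), label == index
features = torch.arange(50, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(50)
full_data = TensorDataset(features, labels)

# Same sequential 80/20 index split as in the snippet above
split_num = int(len(full_data) * 0.8)
index_list = list(range(len(full_data)))
train_idx, valid_idx = index_list[:split_num], index_list[split_num:]

# Both loaders wrap the SAME dataset; only the sampler decides which indices each sees
loader_train = DataLoader(full_data, batch_size=16, sampler=SubsetRandomSampler(train_idx))
loader_val = DataLoader(full_data, batch_size=16, sampler=SubsetRandomSampler(valid_idx))

# Collect the labels (i.e. the indices) each loader actually produced in one epoch
seen_train = [int(y) for _, y_batch in loader_train for y in y_batch]
seen_val = [int(y) for _, y_batch in loader_val for y in y_batch]
print(len(loader_train), len(loader_val))
```

When a sampler is passed, leave shuffle at its default: DataLoader rejects sampler together with shuffle=True. The split itself is sequential, so if the data is stored in a meaningful order, shuffling index_list before slicing is one option.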