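The sample operator draws a random sample from an RDD. It is a transformation operator, so it is lazy and returns a new RDD.
Parameter: withReplacement = true means sampling with replacement (an element can be picked more than once); false means sampling without replacement.
Parameter: fraction is the sampling ratio. It is an expected proportion, so the size of the result is approximate rather than exactly fraction times the number of elements.
Parameter: seed (optional) is the random seed; it defaults to a random Long.
Usage: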
data.sample(withReplacement=true,fraction = 0.5).collect().foreach(println(_))
Source code:
def sample(
    withReplacement: Boolean,
    fraction: Double,
    seed: Long = Utils.random.nextLong): RDD[T] = {
  require(fraction >= 0,
    s"Fraction must be nonnegative, but got ${fraction}")

  withScope {
    require(fraction >= 0.0, "Negative fraction value: " + fraction)
    if (withReplacement) {
      // With replacement: each partition is sampled with a PoissonSampler
      new PartitionwiseSampledRDD[T, T](this, new PoissonSampler[T](fraction), true, seed)
    } else {
      // Without replacement: each partition is sampled with a BernoulliSampler
      new PartitionwiseSampledRDD[T, T](this, new BernoulliSampler[T](fraction), true, seed)
    }
  }
}
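
The two branches in the source differ only in the sampler they pass in. To make that difference concrete, here is a minimal plain-Scala sketch of the idea behind each one. It is not Spark's implementation: the object SamplerSketch and the helpers bernoulliSample, poissonSample and nextPoisson are hypothetical names introduced for illustration, and the sketch assumes the usual reading that a Bernoulli sampler keeps each element independently with probability fraction, while a Poisson sampler emits each element k times with k drawn from a Poisson distribution whose mean is fraction.

```scala
import scala.util.Random

// Illustrative only: these helpers are NOT Spark's BernoulliSampler / PoissonSampler.
object SamplerSketch {

  // Without replacement: keep each element independently with probability `fraction`,
  // so an element appears at most once in the result.
  def bernoulliSample[T](items: Seq[T], fraction: Double, seed: Long): Seq[T] = {
    require(fraction >= 0.0 && fraction <= 1.0, s"fraction must be in [0, 1], got $fraction")
    val rng = new Random(seed)
    items.filter(_ => rng.nextDouble() < fraction)
  }

  // Draw one value from Poisson(mean) with Knuth's multiplication method.
  // Adequate for the small means used here; not tuned for large means.
  private def nextPoisson(mean: Double, rng: Random): Int = {
    val limit = math.exp(-mean)
    var count = 0
    var product = rng.nextDouble()
    while (product > limit) {
      count += 1
      product *= rng.nextDouble()
    }
    count
  }

  // With replacement: emit each element k times, where k ~ Poisson(fraction),
  // so an element may appear zero, one, or several times.
  def poissonSample[T](items: Seq[T], fraction: Double, seed: Long): Seq[T] = {
    require(fraction >= 0.0, s"fraction must be nonnegative, got $fraction")
    val rng = new Random(seed)
    items.flatMap(item => Seq.fill(nextPoisson(fraction, rng))(item))
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 10
    println(bernoulliSample(data, 0.5, seed = 42L)) // no duplicates
    println(poissonSample(data, 0.5, seed = 42L))   // duplicates are possible
  }
}
```

In both helpers the result size is random and only close to fraction times the input size on average, which matches the approximate sizes seen when calling sample on an RDD. Passing the same seed gives the same output, which is handy when a sample has to be reproducible.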