
     Reposted from: http://karpathy.github.io/2015/03/30/breaking-convnets/

    Breaking Linear Classifiers on ImageNet


    Mar 30, 2015

    You’ve probably heard that Convolutional Networks work very well in practice and across a wide range of visual recognition problems. You may have also read articles and papers that claim to reach a near “human-level performance”. There are all kinds of caveats to that (e.g. see my G+ post on Human Accuracy is not a point, it lives on a tradeoff curve), but that is not the point of this post. I do think that these systems now work extremely well across many visual recognition tasks, especially ones that can be posed as simple classification.


    Yet, a second group of seemingly baffling results has emerged that brings up an apparent contradiction. I’m referring to several people who have noticed that it is possible to take an image that a state-of-the-art Convolutional Network thinks is one class (e.g. “panda”), and it is possible to change it almost imperceptibly to the human eye in such a way that the Convolutional Network suddenly classifies the image as any other class of choice (e.g. “gibbon”). We say that we break, or fool ConvNets. See the image below for an illustration:


    This topic has recently gained attention starting with Intriguing properties of neural networks by Szegedy et al. last year. They had a very similar set of images:


    Take a correctly classified image (left image in both columns), and add a tiny distortion (middle) to fool the ConvNet with the resulting image (right).


    And a set of very closely related results was later followed by Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images by Nguyen et al. Instead of starting with correctly-classified images and fooling the ConvNet, they had many more examples of performing the same process starting from noise (and hence making the ConvNet confidently classify an incomprehensible noise pattern as some class), or evolving new funny-looking images that the ConvNet is slightly too certain about:


    These images are classified with >99.6% confidence as the shown class by a Convolutional Network.

    I should make the point quickly that these results are not completely new to Computer Vision, and that some have observed the same problems even with our older features, e.g. HOG features. See Exploring the Representation Capabilities of the HOG Descriptor for details.


    The conclusion seems to be that we can take any arbitrary image and classify it as whatever class we want by adding tiny, imperceptible noise patterns. Worse, it was found that a reasonable fraction of fooling images generalize across different Convolutional Networks, so this isn’t some kind of fragile property of the new image or some overfitting property of the model. There’s something more general about the type of introduced noise that seems to fool many other models. In some sense, it is much more accurate to speak about fooling subspaces rather than fooling images. The latter erroneously makes them seem like tiny points in the super-high-dimensional image space, perhaps similar to rational numbers along the real numbers, when instead they are better thought of as entire intervals. Of course, this work raises security concerns because an adversary could conceivably generate a fooling image of any class on their own computer and upload it to some service with a malicious intent, with a non-zero probability of it fooling the server-side model (e.g. circumventing racy filters).


    What is going on?

    These results are interesting and worrying, but they have also led to a good amount of confusion among laymen. The most important point of this entire post is the following:


    These results are not specific to images or to ConvNets, and they are also not a “flaw” in Deep Learning. A lot of these results were reported with ConvNets running on images because pictures are fun to look at and ConvNets are state-of-the-art, but in fact the core flaw extends to many other domains (e.g. speech recognition systems), and most importantly, also to simple, shallow, good old-fashioned Linear Classifiers (Softmax classifier, or Linear Support Vector Machines, etc.). This was pointed out and articulated in Explaining and Harnessing Adversarial Examples by Goodfellow et al. We’ll carry out a few experiments very similar to the ones presented in this paper, and see that it is in fact this linear nature that is problematic. And because Deep Learning models use linear functions to build up the architecture, they inherit this flaw. However, Deep Learning by itself is not the cause of the issue. In fact, Deep Learning offers tangible hope for a solution, since we can use all the wiggle of composed functions to design more resistant architectures or objectives.


    How fooling methods work

    ConvNets express a differentiable function from the pixel values to class scores. For example, a ConvNet might take a 227x227 image and transform these ~100,000 numbers through a wiggly function (parameterized by several million parameters) to 1000 numbers that we interpret as the confidences for 1000 classes (e.g. the classes of ImageNet).


     

    This ConvNet takes the image of a banana and applies a function to it to transform it to class scores (here 4 classes are shown). The function consists of several rounds of convolutions where the filter entries are parameters, and a few matrix multiplications, where the elements of the matrices are parameters. A typical ConvNet might have ~100 million parameters.

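    As a rough illustration of "a differentiable function from pixels to class scores", here is a minimal numpy sketch. The weight matrices W1 and W2 are made-up stand-ins for the millions of real ConvNet parameters; this is not the actual architecture, just the shape of the idea:

    import numpy as np

    def toy_score_function(image, W1, W2):
        # stand-in for a ConvNet: a differentiable map from raw pixels to class scores
        x = image.reshape(-1)               # stretch the 227x227x3 pixels into one long vector
        h = np.maximum(0.0, W1.dot(x))      # a linear layer followed by a ReLU non-linearity
        return W2.dot(h)                    # raw scores, e.g. 1000 of them for ImageNet

    def softmax(scores):
        e = np.exp(scores - scores.max())   # subtract the max for numerical stability
        return e / e.sum()                  # confidences that sum to 1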

    We train a ConvNet with a repeated process of sampling data, calculating the parameter gradients and performing a parameter update. That is, suppose we feed the ConvNet an image of a banana and compute the 1000 scores for the classes that the ConvNet assigns to this image. We then ask the following question for every single parameter in the model:

    Normal ConvNet training: “What happens to the score of the correct class when I wiggle this parameter?”   


    This wiggle influence, of course, is just the gradient. For example, some parameter in some filter in some layer of the ConvNet might get the gradient of -3.0 computed during backpropagation. That means that increasing this parameter by a tiny amount, e.g. 0.0001, would have a negative influence on the banana score (due to the negative sign); In this case, we’d expect the banana score to decrease by approximately 0.0003. Normally we take this gradient and use it to perform a parameter update, which wiggles every parameter in the model a tiny amount in the correct direction, to increase the banana score. These parameter updates hence work in concert to slightly increase the score of the banana class for that one banana image (e.g. the banana score could go up from 30% to 34% or something). We then repeat this over and over on all images in the training data.

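    A tiny numeric sketch of this reasoning (the gradient value of -3.0 is the one from the text; everything else is a hypothetical stand-in, not a real ConvNet):

    import numpy as np

    # hypothetical parameters and their gradient w.r.t. the banana score,
    # as computed by backpropagation
    params = np.array([0.25, -0.5, 1.3])
    d_banana_d_params = np.array([-3.0, 1.2, 0.7])

    # "what happens to the banana score when I wiggle this parameter?"
    # increasing params[0] by 0.0001 changes the score by about -3.0 * 0.0001 = -0.0003
    step = 1e-4
    params += step * d_banana_d_params   # nudge every parameter to increase the banana score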

    Notice how this worked: we held the input image fixed, and we wiggled the model parameters to increase the score of whatever class we wanted (e.g. banana class). It turns out that we can easily flip this process around to create fooling images. (In practice in fact, absolutely no changes to a ConvNet code base are required.) That is, we will hold the model parameters fixed, and instead we’re computing the gradient of all pixels in the input image on any class we might desire. For example, we can ask:


    Creating fooling images: “What happens to the score of (whatever class you want) when I wiggle this pixel?”


    We compute the gradient just as before with backpropagation, and then we can perform an image update instead of a parameter update, with the end result being that we increase the score of whatever class we want. E.g. we can take the banana image and wiggle every pixel according to the gradient of that image on the cat class. This would change the image a tiny amount, but the score of cat would now increase. Somewhat unintuitively, it turns out that you don’t have to change the image too much to toggle the image from being classified correctly as a banana, to being classified as anything else (e.g. cat).


    In short, to create a fooling image we start from whatever image we want (an actual image, or even a noise pattern), and then use backpropagation to compute the gradient of the image pixels on any class score, and nudge it along. We may, but do not have to, repeat the process a few times. You can interpret backpropagation in this setting as using dynamic programming to compute the most damaging local perturbation to the input. Note that this process is very efficient and takes negligible time if you have access to the parameters of the ConvNet (backprop is fast), but it is possible to do this even if you do not have access to the parameters but only to the class scores at the end. In this case, it is possible to compute the data gradient numerically, or to use other local stochastic search strategies, etc. Note that due to the latter approach, even non-differentiable classifiers (e.g. Random Forests) are not safe (but I haven’t seen anyone empirically confirm this yet).

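    Here is a minimal sketch of that loop. The helper score_and_pixel_gradient is hypothetical: it is assumed to run the model forward and backpropagate the chosen class score down to the input pixels; the step size and number of steps are likewise made up for illustration:

    def make_fooling_image(image, target_class, score_and_pixel_gradient,
                           step_size=1.0, n_steps=5):
        # hold the model parameters fixed and update the image instead
        x = image.copy()
        for _ in range(n_steps):                      # repeating a few times is optional
            score, dscore_dx = score_and_pixel_gradient(x, target_class)
            x += step_size * dscore_dx                # nudge the pixels up the class-score gradient
        return x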

    Fooling a Linear Classifier on ImageNet

    As I mentioned before (and as described in more detail in Goodfellow et al.), it is the use of linear functions that makes our models susceptible to attack. ConvNets, of course, do not express a linear function from images to class scores; they are a complex Deep Learning model that expresses a highly non-linear function. However, the components that make up a ConvNet are linear: convolution of a filter with its input is a linear operation (we are sliding a filter through the input and computing dot products - a linear operation), and matrix multiplications are also a linear function.


    So here’s a fun experiment we’ll do. Let’s forget about ConvNets - they are a distracting overkill as far as the core flaw goes. Instead, let’s fool a linear classifier, and let’s also keep with the theme of breaking models on images because they are fun to look at.


    Here is the setup:

    • Take 1.2 million images in ImageNet
    • Resize them to 64x64 (full-sized images would take longer to train)
    • Use Caffe to train a Linear Classifier (e.g. Softmax). In other words we’re going straight from data to the classifier with a single fully-connected layer (a minimal training sketch follows this list).
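    The post uses Caffe for this step; as a rough stand-in, the following numpy sketch shows what training a softmax linear classifier on the flattened 64x64x3 images amounts to. X and y are assumed to hold a mini-batch of flattened images and integer labels, and the learning rate and weight decay are the values the digression below ends up with:

    import numpy as np

    D, K = 64 * 64 * 3, 1000               # input dimension and number of ImageNet classes
    W = np.random.randn(K, D) * 1e-4       # small-std Gaussian init (see the digression below)
    b = np.zeros(K)

    def softmax_sgd_step(X, y, lr=1e-7, reg=100.0):
        """One SGD step of mean cross-entropy loss with L2 weight decay."""
        global W, b
        scores = X.dot(W.T) + b                                  # (N, K) class scores
        scores -= scores.max(axis=1, keepdims=True)              # for numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        dscores = probs
        dscores[np.arange(len(y)), y] -= 1.0                     # gradient of the loss w.r.t. the scores
        dscores /= len(y)
        W -= lr * (dscores.T.dot(X) + reg * W)                   # weight update with weight decay
        b -= lr * dscores.sum(axis=0)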

    Digression: Technical fun parts. The fun part in actually doing this is that the standard AlexNetty ConvNet hyperparameters are of course completely inadequate. For example, normally you’d use weight decay of 0.0005 or so and learning rate of 0.01, and gaussian initialization drawn from a gaussian of 0.01 std. If you’ve trained linear classifiers before on this type of high-dimensional input (64x64x3 ~= 12K numbers), you’ll know that your learning rate will probably have to be much lower, the regularization much larger, and initialization of 0.01 std will probably be inadequate. Indeed, starting Caffe training with default hyperparameters gives a starting loss of about 80, which right away tells you that the initialization is completely out of whack (initial ImageNet loss should be ballpark 7.0, which is -log(1/1000)). I scaled it down to 0.0001 std for Gaussian init which gives sensible starting loss. But then the loss right away explodes which tells you that the learning rate is way too high - I had to scale it all the way down to about 1e-7. Lastly, a weight decay of 0.0005 will give almost negligible regularization loss with 12K inputs - I had to scale it up to 100 to start getting reasonably-looking weights that aren’t super-overfitted noise blobs. It’s fun being a Neural Networks practitioner.

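    The ballpark figure for the initial loss is easy to check: a random classifier over 1000 classes should give a mean cross-entropy loss of -log(1/1000):

    import numpy as np
    print(-np.log(1.0 / 1000))   # ≈ 6.91, so a starting loss of ~80 means the init is off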

    A linear classifier over image pixels implies that every class score is computed as a dot product between all the image pixels (stretched as a large column) and a learnable weight vector, one for each class. With input images of size 64x64x3 and 1000 ImageNet classes we therefore have 64x64x3x1000 = 12.3 million weights (beefy linear model!), and 1000 biases. Training these parameters on ImageNet with a K40 GPU takes only a few tens of minutes. We can then visualize each of the learned weights by reshaping them as images:

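    In code, computing the scores and visualizing one class "template" is just a dot product and a reshape (continuing the numpy sketch above; matplotlib is assumed for display):

    import matplotlib.pyplot as plt

    def class_scores(image, W, b):
        # dot product between the stretched-out pixels and every class's weight vector
        return W.dot(image.reshape(-1)) + b        # (1000,) scores

    def show_template(W, class_idx):
        # reshape one class's weights back into a 64x64x3 image to see what it looks for
        t = W[class_idx].reshape(64, 64, 3)
        t = (t - t.min()) / (t.max() - t.min())    # rescale to [0, 1] for display
        plt.imshow(t)
        plt.axis('off')
        plt.show()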

    Example linear classifiers for a few ImageNet classes. Each class' score is computed by taking a dot product between the visualized weights and the image. Hence, the weights can be thought of as a template: the images show what the classifier is looking for. For example, Granny Smith apples are green, so the linear classifier has positive weights in the green color channel and negative weights in blue and red channels, across all spatial positions. It is hence effectively counting the amount of green stuff in the middle. You can also see the learned templates for all imagenet classes for fun.


    By the way, I haven’t seen anyone report linear classification accuracy on ImageNet before, but it turns out to be about 3.0% top-1 accuracy (and about 10% top-5) on ImageNet. I haven’t done a completely exhaustive hyperparameter sweep but I did a few rounds of manual binary search.


    Now that we’ve trained the model parameters we can start to produce fooling images. This turns out to be quite trivial in the case of linear classifiers and no backpropagation is required. This is because when your score function is a dot product s = w^T x, then the gradient of the score with respect to the image x is simply ∇_x s = w. That is, we take an image we would like to start out with, and then if we wanted to fool the model into thinking that it is some other class (e.g. goldfish), we have to take the weights corresponding to the desired class, and add some fraction of those weights to the image:

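    Concretely, continuing the sketch above (target_idx would be the index of the desired class, e.g. goldfish; the fraction is a made-up step size):

    def fool_linear_classifier(image, W, target_idx, fraction=0.5):
        # for a linear score s = w.x the pixel gradient is just w, so adding a bit of the
        # target class's weights to the image directly pumps up that class's score
        w_target = W[target_idx].reshape(image.shape)
        return image + fraction * w_target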

    Fooled linear classifier: The starting image (left) is classified as a kit fox. That's incorrect, but then what can you expect from a linear classifier? However, if we add a small amount of "goldfish" weights to the image (top row, middle), suddenly the classifier is convinced that it's looking at one with high confidence. We can distort it with the school bus template instead if we wanted to. Similar figures (but on the MNIST digits dataset) can be seen in Figure 2 of Goodfellow et al.
    We can also start from random noise and achieve the same effect:


    Same process but starting with a random image.


    Of course, these examples are not as impactful as the ones that use a ConvNet because the ConvNet gives state of the art performance while a linear classifier barely gets to 3% accuracy, but it illustrates the point that even with a simple, shallow function it is still possible to play around with the input in imperceptible ways and get almost arbitrary results.


    Regularization. There is one subtle comment to make regarding regularization strength. In my experiments above, increasing the regularization strength gave nicer, smoother and more diffuse weights, but these generalized worse to validation data than some of my best classifiers that displayed more noisy patterns. For example, the nice and smooth templates I’ve shown only achieve 1.6% accuracy. My best model, which achieves 3.0% accuracy, has noisier weights (as seen in the middle column of the fooling images). Another model with very low regularization reaches 2.8% and its fooling images are virtually indistinguishable from the originals, yet it produces 100% confidences in the wrong class. In particular:


    • High regularization gives smoother templates, but at some point starts to work worse. However, it is more resistant to fooling. (The fooling images look noticeably different from their originals.)
    • Low regularization gives more noisy templates but seems to work better than all-smooth templates. It is less resistant to fooling.

    Intuitively, it seems that higher regularization leads to smaller weights, which means that one must change the image more dramatically to change the score by some amount. It’s not immediately obvious if and how this conclusion translates to deeper models.


    Linear classifier with lower regularization (which leads to more noisy class weights) is easier to fool (top). Higher regularization produces more diffuse filters and is harder to fool (bottom). That is, it's harder to achieve very confident wrong answers (however, with weights so small it is hard to achieve very confident correct answers too). To flip the label to a wrong class, more visually obvious perturbations are also needed. Somewhat paradoxically, the model with the noisy weights (top) works quite a bit better on validation data (2.6% vs. 1.4% accuracy).


    Toy Example

    We can understand this process in even more detail by condensing the problem to the smallest toy example that displays the problem. Suppose we train a binary logistic regression, where we define the probability of class 1 as P(y=1 | x; w, b) = σ(w^T x + b), where σ(z) = 1/(1 + e^(-z)) is the sigmoid function that squashes the class 1 score s = w^T x + b into the range between 0 and 1, where 0 is mapped to 0.5. This classifier hence decides that the class of the input is 1 if s > 0, or equivalently if the class 1 probability is more than 50% (i.e. σ(s) > 0.5). Suppose further that we had the following setup:


    x = [2, -1, 3, -2, 2, 2, 1, -4, 5, 1] // input
    w = [-1, -1, 1, -1, 1, -1, 1, 1, -1, 1] // weight vector
    

    If you do the dot product, you get -3. Hence, the probability of class 1 is 1/(1+e^(-(-3))) = 0.0474. In other words the classifier is 95% certain that this example is class 0. We’re now going to try to fool the classifier. That is, we want to find a tiny change to x in such a way that the score comes out much higher. Since the score is computed with a dot product (multiply corresponding elements in x and w then add it all up), with a little bit of thought it’s clear what this change should be: In every dimension where the weight is positive, we want to slightly increase the input (to get slightly more score). Conversely, in every dimension where the weight is negative, we want the input to be slightly lower (again, to get slightly more score). In other words, an adversarial xad might be:


    // xad = x + 0.5w gives:
    xad = [1.5, -1.5, 3.5, -2.5, 2.5, 1.5, 1.5, -3.5, 4.5, 1.5]
    

    Doing the dot product again we see that suddenly the score becomes 2. This is not surprising: there are 10 dimensions and we’ve tweaked the input by 0.5 in every dimension in such a way that we gain 0.5 in each one, adding up to a total of 5 additional score, raising it from -3 to 2. Now when we look at the probability of class 1 we get 1/(1+e^(-2)) = 0.88. That is, we tweaked the original x by a small amount and we improved the class 1 probability from 5% to 88%! Moreover, notice that in this case the input only had 10 dimensions, but an image might consist of many tens of thousands of dimensions, so you can afford to make tiny changes across all of them that all add up in concert in exactly the worst way to blow up the score of any class you wish.

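    The whole toy example fits in a few lines if you want to check the numbers (a direct transcription of the setup above, with the bias taken to be zero):

    import numpy as np

    x = np.array([2, -1, 3, -2, 2, 2, 1, -4, 5, 1], dtype=float)    # input
    w = np.array([-1, -1, 1, -1, 1, -1, 1, 1, -1, 1], dtype=float)  # weight vector

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    print(w.dot(x), sigmoid(w.dot(x)))          # -3.0, ~0.047: 95% sure this is class 0

    x_ad = x + 0.5 * w                          # nudge every dimension along the sign of its weight
    print(w.dot(x_ad), sigmoid(w.dot(x_ad)))    # 2.0, ~0.88: now class 1 with high confidence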

    Conclusions

    Several other related experiments can be found in Explaining and Harnessing Adversarial Examples by Goodfellow et al. This paper is required reading on this topic. It was the first to articulate and point out the linear-functions flaw, and it argued more generally that there is a tension between models that are easy to train (e.g. models that use linear functions) and models that resist adversarial perturbations.


    As closing words for this post, the takeaway is that ConvNets still work very well in practice. Unfortunately, it seems that their competence is relatively limited to a small region around the data manifold that contains natural-looking images and distributions, and that once we artificially push images away from this manifold by computing noise patterns with backpropagation, we stumble into parts of image space where all bets are off, and where the linear functions in the network induce large subspaces of fooling inputs.


    With wishful thinking, one might hope that ConvNets would produce all-diffuse probabilities in regions outside the training data, but there is nothing in an ordinary objective (e.g. mean cross-entropy loss) that explicitly enforces this constraint. Indeed, it seems that the class scores in these regions of space are all over the place, and worse, a straightforward attempt to patch this up by introducing a background class and iteratively adding fooling images as new background-class examples during training is not effective in mitigating the problem.


    It seems that to fix this problem we need to change our objectives, our forward functional forms, or even the way we optimize our models. However, as far as I know we haven’t found very good candidates for either. To be continued.


    Further Reading
