If you use the stitching_detailed sample provided with OpenCV, you can invoke it like this:

stitching_detailed --save_graph a.dot 1.png 2.png

The a.dot file then contains:

graph matches_graph{
"1.png" -- "2.png"[label="Nm=26, Ni=19, C=1.20253"];
}

// What DOT is: https://zh.wikipedia.org/wiki/DOT语言
// How to build stitching_detailed is covered in another post by this author.

So what does this output mean? There is a question about exactly this on Stack Overflow:
http://stackoverflow.com/questions/26364594/image-stitching-details-with-opencv
The question and its answer follow.
Image Stitching details with OpenCV
I am trying to get deep into stitching. I am using cv::detail.
I am trying to follow this example:
https://github.com/Itseez/opencv/blob/master/samples/cpp/stitching_detailed.cpp
I roughly understand the stitching pipeline.
There is a function matchesGraphAsString() which returns a graph. I am wondering how it even computes this graph. Further, what is the definition of the confidence interval in this case?
The output is in DOT format and a sample graph looks like
graph matches_graph{
"15.jpg" -- "13.jpg"[label="Nm=75, Ni=50, C=1.63934"];
"15.jpg" -- "12.jpg"[label="Nm=47, Ni=28, C=1.26697"];
"15.jpg" -- "14.jpg"[label="Nm=149, Ni=117, C=2.22011"];
"11.jpg" -- "13.jpg"[label="Nm=71, Ni=52, C=1.77474"];
"11.jpg" -- "9.jpg"[label="Nm=46, Ni=37, C=1.69725"];
"11.jpg" -- "10.jpg"[label="Nm=87, Ni=73, C=2.14076"];
"9.jpg" -- "8.jpg"[label="Nm=122, Ni=99, C=2.21973"];
}
What does label, Nm, and Ni mean here? The official document seems to be lacking these details.
This is a very interesting question indeed. As @hatboyzero pointed out, the meaning of the variables is reasonably straightforward:
- Nm is the number of matches (in the overlapping region, so obvious outliers have been removed already).
- Ni is the number of inliers after finding a homography with Ransac.
- C is the confidence that the two images are a match.
Background to matching
Building a panorama is done by finding interest points in all images and computing descriptors for them. These descriptors, like SIFT, SURF and ORB, were developed so that the same parts of an image could be detected. They are just a medium-dimensional vector (64 or 128 dimensions are typical). By computing the L2 or some other distance between two descriptors, matches can be found. How many matches in a pair of images are found is described by the term Nm.
Notice that so far, the matching has only been done through the appearance of image regions around interest points. Very typically, many of these matches are plain wrong. This can be because the descriptor looks the same (think: repetitive objects like window sills on a multi-window building, or leaves on a tree) or because the descriptor is just a bit too uninformative.
The common solution is to add geometric constraints: the image pair was taken from the same position with the same camera, therefore points that are close in one image must be close in the other image, too. More specifically, all the points must have undergone the same transformation. In the panorama case, where the camera was rotated around the nodal point of the camera-lens system, this transformation must have been a 2D homography.
Ransac is the gold standard algorithm to find the best transformation and all the matches that are consistent with this transformation. The number of these consistent matches is called Ni. Ransac works by randomly selecting, in this case, 4 matches (see paper sect 3.1) and fitting a homography to these four matches. Then it counts how many of all possible matches agree with this homography. Repeat 500 times (see paper) and at the end take the model that had the most inliers. Then re-compute the model with all inliers. The name of the algorithm comes from RANdom SAmple Consensus: RanSaC.
Confidence-Term
The question for me was about this mysterious confidence. I quickly found where it was calculated.
From stitching/sources/matches.cpp:
// These coeffs are from paper M. Brown and D. Lowe. "Automatic Panoramic Image Stitching
// using Invariant Features"
matches_info.confidence = matches_info.num_inliers / (8 + 0.3 * matches_info.matches.size());
// Set zero confidence to remove matches between too close images, as they don't provide
// additional information anyway. The threshold was set experimentally.
matches_info.confidence = matches_info.confidence > 3. ? 0. : matches_info.confidence;
The mentioned paper has in section 3.2 ("Probabilistic Model for Image Match Verification") some more details on what this means.
Reading this section, a few things stood out.
1. There are a lot of variables (mostly probabilities) in their model. These values are defined in the paper without any justification. Below is the key sentence:

Though in practice we have chosen values for p0, p1, p(m = 0), p(m = 1) and pmin, they could in principle be learnt from the data.

So, this is just a theoretical exercise as the parameters have been plucked out of thin air. Notice the could in principle be learnt.
2. The paper has in equation 13 the confidence calculation. If read correctly, it means that matches_info.confidence indicates a proper match between two images iff its value is above 1.
3. I don't see any justification for the removal of a match (setting its confidence to 0) when the confidence is above 3. It just means that there are very few outliers. I think the programmers assumed that a high number of matches that turn out to be outliers means the images overlap a great deal, but this isn't guaranteed by the algorithms behind this. (Simply, the matchings are based on the appearance of features.)