zoukankan      html  css  js  c++  java
  • 图像处理8 选择性搜索(selective search)

    介绍

    以下介绍和算法引用自选择性搜索(selective search)

    一、目标检测 VS 目标识别

    目标识别(objec recognition)是指明一幅输入图像中包含那类目标。其输入为一幅图像,输出是该图像中的目标属于哪个类别(class probability)。而目标检测(object detection)除了要告诉输入图像中包含了哪类目前外,还要框出该目标的具体位置(bounding boxes)。

    在目标检测时,为了定位到目标的具体位置,通常会把图像分成许多子块(sub-regions / patches),然后把子块作为输入,送到目标识别的模型中。分子块的最直接方法叫滑动窗口法(sliding window approach)。滑动窗口的方法就是按照子块的大小在整幅图像上穷举所有子图像块。这种方法产生的数据量想想都头大。和滑动窗口法相对的是另外一类基于区域(region proposal)的方法。selective search就是其中之一!

    二、selective search算法流程

    step0:生成区域集R,具体参见论文《Efficient Graph-Based Image Segmentation》

    step1:计算区域集R里每个相邻区域的相似度S={s1,s2,…} 
    step2:找出相似度最高的两个区域,将其合并为新集,添加进R 
    step3:从S中移除所有与step2中有关的子集 
    step4:计算新集与所有子集的相似度 
    step5:跳至step2,直至S为空

    三、相似度计算

    论文考虑了颜色、纹理、尺寸和空间交叠这4个参数。

    3.1、颜色相似度(color similarity)
    将色彩空间转为HSV,每个通道下以bins=25计算直方图,这样每个区域的颜色直方图有25*3=75个区间。 对直方图除以区域尺寸做归一化后使用下式计算相似度:

    3.2、纹理相似度(texture similarity)

    论文采用方差为1的高斯分布在8个方向做梯度统计,然后将统计结果(尺寸与区域大小一致)以bins=10计算直方图。直方图区间数为8*3*10=240(使用RGB色彩空间)。

    其中,是直方图中第个bin的值。

    3.3、尺寸相似度(size similarity)

    保证合并操作的尺度较为均匀,避免一个大区域陆续“吃掉”其他小区域。

    例:设有区域a-b-c-d-e-f-g-h。较好的合并方式是:ab-cd-ef-gh -> abcd-efgh -> abcdefgh。 不好的合并方法是:ab-c-d-e-f-g-h ->abcd-e-f-g-h ->abcdef-gh -> abcdefgh。

    3.4、交叠相似度(shape compatibility measure)

    3.5、最终的相似度

    实现

    使用了前面几篇提到的一些算法,运行速度慢

    # -*- coding: utf-8 -*-
    from __future__ import division
    from skimage.color import rgb2hsv
    from skimage.data import astronaut
    import numpy
    from _felzenszwalb_cy import _felzenszwalb_cython
    import matplotlib.patches as mpatches
    import matplotlib.pyplot as plt
    import math
    
    # "Selective Search for Object Recognition" by J.R.R. Uijlings et al.
    #
    #  - Modified version with LBP extractor for texture vectorization
    
    
    def _sim_colour(r1, r2):
        """
            calculate the sum of histogram intersection of colour
        """
        return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])
    
    
    def _sim_texture(r1, r2):
        """
            calculate the sum of histogram intersection of texture
        """
        return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])
    
    
    def _sim_size(r1, r2, imsize):
        """
            calculate the size similarity over the image
        """
        return 1.0 - (r1["size"] + r2["size"]) / imsize
    
    
    def _sim_fill(r1, r2, imsize):
        """
            calculate the fill similarity over the image
        """
        bbsize = (
            (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
            * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
        )
        return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize
    
    
    def _calc_sim(r1, r2, imsize):
        return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
                + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))
    
    
    def histogram(a, bins=10, range=None):
        """
        Compute the histogram of a set of data.
    
        """
        import numpy as numpy
        from numpy.core import linspace
        from numpy.core.numeric import (arange, asarray)
        # 转成一维数组
        a = asarray(a)
        a = a.ravel()
        mn, mx = [mi + 0.0 for mi in range]
        ntype = numpy.dtype(numpy.intp)
        n = numpy.zeros(bins, ntype)
        # 预计算直方图缩放因子
        norm = bins / (mx - mn)
    
        # 均分,计算边缘以进行潜在的校正
        bin_edges = linspace(mn, mx, bins + 1, endpoint=True)
    
        # 分块对于大数组可以降低运行内存,同时提高速度
        BLOCK = 65536
        for i in arange(0, len(a), BLOCK):
            tmp_a = a[i:i + BLOCK]
            tmp_a_data = tmp_a.astype(float)
            # 减去Range下限,乘以缩放因子,向下取整
            tmp_a = tmp_a_data - mn
            tmp_a *= norm
            indices = tmp_a.astype(numpy.intp)
            # 对indices标签分别计数,标签等于bins减一
            indices[indices == bins] -= 1
            n += numpy.bincount(indices, weights=None,
                                 minlength=bins).astype(ntype)
    
        return n, bin_edges
    
    
    def _calc_colour_hist(img):
        """
            calculate colour histogram for each region
    
            the size of output histogram will be BINS * COLOUR_CHANNELS(3)
    
            number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]
    
            extract HSV
        """
    
        BINS = 25
        hist = numpy.array([])
    
        for colour_channel in (0, 1, 2):
    
            # extracting one colour channel
            c = img[:, colour_channel]
    
            # calculate histogram for each colour and join to the result
            hist = numpy.concatenate(
            [hist] + [histogram(c, BINS, (0.0, 255.0))[0]])
    
        # L1 normalize
        hist = hist / len(img)
    
        return hist
    
    def circular_LBP(src, n_points, radius):
        height = src.shape[0]
        width = src.shape[1]
        dst = src.copy()
        src.astype(dtype=numpy.float32)
        dst.astype(dtype=numpy.float32)
    
        neighbours = numpy.zeros((1, n_points), dtype=numpy.uint8)
        lbp_value = numpy.zeros((1, n_points), dtype=numpy.uint8)
        for x in range(radius, width - radius - 1):
            for y in range(radius, height - radius - 1):
                lbp = 0.
                # 先计算共n_points个点对应的像素值,使用双线性插值法
                for n in range(n_points):
                    theta = float(2 * numpy.pi * n) / n_points
                    x_n = x + radius * numpy.cos(theta)
                    y_n = y - radius * numpy.sin(theta)
    
                    # 向下取整
                    x1 = int(math.floor(x_n))
                    y1 = int(math.floor(y_n))
                    # 向上取整
                    x2 = int(math.ceil(x_n))
                    y2 = int(math.ceil(y_n))
    
                    # 将坐标映射到0-1之间
                    tx = numpy.abs(x - x1)
                    ty = numpy.abs(y - y1)
    
                    # 根据0-1之间的x,y的权重计算公式计算权重
                    w1 = (1 - tx) * (1 - ty)
                    w2 = tx * (1 - ty)
                    w3 = (1 - tx) * ty
                    w4 = tx * ty
    
                    # 根据双线性插值公式计算第k个采样点的灰度值
                    neighbour = src[y1, x1] * w1 + src[y2, x1] * w2 + src[y1, x2] * w3 + src[y2, x2] * w4
    
                    neighbours[0, n] = neighbour
    
                center = src[y, x]
    
                for n in range(n_points):
                    if neighbours[0, n] > center:
                        lbp_value[0, n] = 1
                    else:
                        lbp_value[0, n] = 0
    
                for n in range(n_points):
                    lbp += lbp_value[0, n] * 2**n
    
                # 转换到0-255的灰度空间,比如n_points=16位时结果会超出这个范围,对该结果归一化
                dst[y, x] = int(lbp / (2**n_points-1) * 255)
    
        return dst
    
    def _calc_texture_gradient(img):
        """
            calculate texture gradient for entire image
    
            The original SelectiveSearch algorithm proposed Gaussian derivative
            for 8 orientations, but we use LBP instead.
    
            output will be [height(*)][width(*)]
        """
        ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))
    
        for colour_channel in (0, 1, 2):
            ret[:, :, colour_channel] = circular_LBP(
                img[:, :, colour_channel], 8, 1)
    
        return ret
    
    
    def _calc_texture_hist(img):
        """
            calculate texture histogram for each region
    
            calculate the histogram of gradient for each colours
            the size of output histogram will be
                BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
        """
        BINS = 10
    
        hist = numpy.array([])
    
        for colour_channel in (0, 1, 2):
    
            # mask by the colour channel
            fd = img[:, colour_channel]
    
            # calculate histogram for each orientation and concatenate them all
            # and join to the result
            hist = numpy.concatenate(
                [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])
    
        # L1 Normalize
        hist = hist / len(img)
    
        return hist
    
    
    def _merge_regions(r1, r2):
        new_size = r1["size"] + r2["size"]
        rt = {
            "min_x": min(r1["min_x"], r2["min_x"]),
            "min_y": min(r1["min_y"], r2["min_y"]),
            "max_x": max(r1["max_x"], r2["max_x"]),
            "max_y": max(r1["max_y"], r2["max_y"]),
            "size": new_size,
            "hist_c": (
                r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
            "hist_t": (
                r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
            "labels": r1["labels"] + r2["labels"]
        }
        return rt
    
    
    def selective_search(
            im_orig, scale=1.0, sigma=0.8, min_size=50):
    
        assert im_orig.shape[2] == 3, "3ch image is expected"
    
        # 加载图像,使用felzenszwalb算法获取候选区域,区域标签存储到数据的第四维
    
        im_mask = _felzenszwalb_cython(
            im_orig, scale=scale, sigma=sigma,
            min_size=min_size, kernel=9)
        img = numpy.append(
            im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
        img[:, :, 3] = im_mask
    
        if img is None:
            return None, {}
    
        imsize = img.shape[0] * img.shape[1]
    
        R = {}
        hsv = rgb2hsv(img[:, :, :3])
        # 第一步: 存储像素位置
        for y, i in enumerate(img):
            for x, (r, g, b, l) in enumerate(i):
                # 新区域
                if l not in R:
                    R[l] = {
                        "min_x": 0xffff, "min_y": 0xffff,
                        "max_x": 0, "max_y": 0, "labels": [l]}
    
                # bounding box
                if R[l]["min_x"] > x:
                    R[l]["min_x"] = x
                if R[l]["min_y"] > y:
                    R[l]["min_y"] = y
                if R[l]["max_x"] < x:
                    R[l]["max_x"] = x
                if R[l]["max_y"] < y:
                    R[l]["max_y"] = y
    
        # 第二步: 计算LBP纹理特征
        tex_grad = _calc_texture_gradient(img)
    
        # 第三步: 计算每个区域的颜色直方图
        for k, v in list(R.items()):
            masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
            R[k]["size"] = len(masked_pixels)  # /4
            R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
            R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])
    
        # 提取邻域信息
        def intersect(a, b):
            if (a["min_x"] < b["min_x"] < a["max_x"]
                    and a["min_y"] < b["min_y"] < a["max_y"]) or (
                a["min_x"] < b["max_x"] < a["max_x"]
                    and a["min_y"] < b["max_y"] < a["max_y"]) or (
                a["min_x"] < b["min_x"] < a["max_x"]
                    and a["min_y"] < b["max_y"] < a["max_y"]) or (
                a["min_x"] < b["max_x"] < a["max_x"]
                    and a["min_y"] < b["min_y"] < a["max_y"]):
                return True
            return False
    
        R_list = list(R.items())
        neighbours = []
        for cur, a in enumerate(R_list[:-1]):
            for b in R_list[cur + 1:]:
                if intersect(a[1], b[1]):
                    neighbours.append((a, b))
    
        # 计算初始相似度
        S = {}
        for (ai, ar), (bi, br) in neighbours:
            S[(ai, bi)] = _calc_sim(ar, br, imsize)
    
        # hierarchal search
        while S != {}:
    
            # get highest similarity
            i, j = sorted(S.items(), key=lambda i: i[1])[-1][0]
    
            # merge corresponding regions
            t = max(R.keys()) + 1.0
            R[t] = _merge_regions(R[i], R[j])
    
            # mark similarities for regions to be removed
            key_to_delete = []
            for k, v in list(S.items()):
                if (i in k) or (j in k):
                    key_to_delete.append(k)
    
            # remove old similarities of related regions
            for k in key_to_delete:
                del S[k]
    
            # calculate similarity set with the new region
            for k in [a for a in key_to_delete if a != (i, j)]:
                n = k[1] if k[0] in (i, j) else k[0]
                S[(t, n)] = _calc_sim(R[t], R[n], imsize)
    
        regions = []
        for k, r in list(R.items()):
            regions.append({
                'rect': (
                    r['min_x'], r['min_y'],
                    r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
                'size': r['size'],
                'labels': r['labels']
            })
    
        return img, regions
    
    
    if __name__=="__main__":
        img = astronaut()
    
        # perform selective search
        img_lbl, regions = selective_search(
            img, scale=500, sigma=0.9, min_size=10)
    
        candidates = set()
        for r in regions:
            # excluding same rectangle (with different segments)
            if r['rect'] in candidates:
                continue
            # excluding regions smaller than 2000 pixels
            if r['size'] < 2000:
                continue
            # distorted rects
            x, y, w, h = r['rect']
            if w / h > 1.2 or h / w > 1.2:
                continue
            candidates.add(r['rect'])
    
        # draw rectangles on the original image
        fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
        ax.imshow(img)
        for x, y, w, h in candidates:
            print(x, y, w, h)
            rect = mpatches.Rectangle(
                (x, y), w, h, fill=False, edgecolor='red', linewidth=1)
            ax.add_patch(rect)
    
        plt.show()
    
    """
    (46, 0, 353, 326)
    (366, 223, 93, 81)
    (365, 345, 129, 109)
    (32, 210, 207, 213)
    (10, 0, 295, 284)
    (0, 0, 305, 284)
    (152, 17, 147, 171)
    (32, 210, 207, 228)
    (111, 150, 400, 361)
    (0, 210, 356, 301)
    (42, 210, 197, 213)
    (0, 0, 305, 333)
    (0, 0, 511, 511)
    (111, 150, 194, 198)
    (134, 348, 75, 75)
    (46, 0, 353, 351)
    (111, 150, 221, 248)
    (214, 216, 297, 295)
    (364, 345, 130, 109)
    """
    _felzenszwalb_cython代码如下。

    import numpy as np
    import cv2
    
    def find_root(forest, n):
        """Find the root of node n.
        Given the example above, for any integer from 1 to 9, 1 is always returned
        """
        root = n
        while (forest[root] < root):
            root = forest[root]
        return root
    
    
    def set_root(forest, n, root):
        """
        Set all nodes on a path to point to new_root.
        Given the example above, given n=9, root=6, it would "reconnect" the tree.
        so forest[9] = 6 and forest[8] = 6
        The ultimate goal is that all tree nodes point to the real root,
        which is element 1 in this case.
        """
        while (forest[n] < n):
            j = forest[n]
            forest[n] = root
            n = j
        forest[n] = root
    
    def join_trees(forest, n, m):
        """Join two trees containing nodes n and m.
        If we imagine that in the example tree, the root 1 is not known, we
        rather have two disjoint trees with roots 2 and 6.
        Joining them would mean that all elements of both trees become connected
        to the element 2, so forest[9] == 2, forest[6] == 2 etc.
        However, when the relationship between 1 and 2 can still be discovered later.
        """
    
        if (n != m):
            root = find_root(forest, n)
            root_m = find_root(forest, m)
    
            if (root > root_m):
                root = root_m
    
            set_root(forest, n, root)
            set_root(forest, m, root)
    
    def _felzenszwalb_cython(image, scale=1, sigma=0.8,kernel=3,min_size=20):
        """Felzenszwalb's efficient graph based segmentation for
        single or multiple channels.
    
        Produces an oversegmentation of a single or multi-channel image
        using a fast, minimum spanning tree based clustering on the image grid.
        The number of produced segments as well as their size can only be
        controlled indirectly through ``scale``. Segment size within an image can
        vary greatly depending on local contrast.
    
        Parameters
        ----------
        image : (N, M, C) ndarray
            Input image.
        scale : float, optional (default 1)
            Sets the obervation level. Higher means larger clusters.
        sigma : float, optional (default 0.8)
            Width of Gaussian smoothing kernel used in preprocessing.
            Larger sigma gives smother segment boundaries.
        min_size : int, optional (default 20)
            Minimum component size. Enforced using postprocessing.
    
        Returns
        -------
        segment_mask : (N, M) ndarray
            Integer mask indicating segment labels.
        """
    
        # image = img_as_float(image)
        image = np.asanyarray(image, dtype=np.float) / 255
    
        # rescale scale to behave like in reference implementation
        scale = float(scale) / 255.
        # image = ndi.gaussian_filter(image, sigma=[sigma, sigma, 0])
        image =cv2.GaussianBlur(image, (kernel, kernel), sigma)
    
        # compute edge weights in 8 connectivity:
        down_cost = np.sqrt(np.sum((image[1:, :, :] - image[:-1, :, :])
            *(image[1:, :, :] - image[:-1, :, :]), axis=-1))
        right_cost = np.sqrt(np.sum((image[:, 1:, :] - image[:, :-1, :])
            *(image[:, 1:, :] - image[:, :-1, :]), axis=-1))
        dright_cost = np.sqrt(np.sum((image[1:, 1:, :] - image[:-1, :-1, :])
        *(image[1:, 1:, :] - image[:-1, :-1, :]), axis=-1))
        uright_cost = np.sqrt(np.sum((image[1:, :-1, :] - image[:-1, 1:, :])
            *(image[1:, :-1, :] - image[:-1, 1:, :]), axis=-1))
        costs = np.hstack([
            right_cost.ravel(), down_cost.ravel(), dright_cost.ravel(),
            uright_cost.ravel()]).astype(np.float)
    
        # compute edges between pixels:
        height, width = image.shape[:2]
        segments = np.arange(width * height, dtype=np.intp).reshape(height, width)
        down_edges = np.c_[segments[1:, :].ravel(), segments[:-1, :].ravel()]
        right_edges = np.c_[segments[:, 1:].ravel(), segments[:, :-1].ravel()]
        dright_edges = np.c_[segments[1:, 1:].ravel(), segments[:-1, :-1].ravel()]
        uright_edges = np.c_[segments[:-1, 1:].ravel(), segments[1:, :-1].ravel()]
        edges = np.vstack([right_edges, down_edges, dright_edges, uright_edges])
    
        # initialize data structures for segment size
        # and inner cost, then start greedy iteration over edges.
        edge_queue = np.argsort(costs)
        edges = np.ascontiguousarray(edges[edge_queue])
        costs = np.ascontiguousarray(costs[edge_queue])
        segments_p = np.arange(width * height, dtype=np.intp) #segments
    
        segment_size = np.ones(width * height, dtype=np.intp)
    
        # inner cost of segments
        cint = np.zeros(width * height)
        num_costs = costs.size
    
        for e in range(num_costs):
            seg0 = find_root(segments_p, edges[e][0])
            seg1 = find_root(segments_p, edges[e][1])
            if seg0 == seg1:
                continue
            inner_cost0 = cint[seg0] + scale / segment_size[seg0]
            inner_cost1 = cint[seg1] + scale / segment_size[seg1]
            if costs[e] < min(inner_cost0, inner_cost1):
                # update size and cost
                join_trees(segments_p, seg0, seg1)
                seg_new = find_root(segments_p, seg0)
                segment_size[seg_new] = segment_size[seg0] + segment_size[seg1]
                cint[seg_new] = costs[e]
    
        # postprocessing to remove small segments
        # edges = edges
        for e in range(num_costs):
            seg0 = find_root(segments_p, edges[e][0])
            seg1 = find_root(segments_p, edges[e][1])
            if seg0 == seg1:
                continue
            if segment_size[seg0] < min_size or segment_size[seg1] < min_size:
                join_trees(segments_p, seg0, seg1)
                seg_new = find_root(segments_p, seg0)
                segment_size[seg_new] = segment_size[seg0] + segment_size[seg1]
    
        # unravel the union find tree
        flat = segments_p.ravel()
        old = np.zeros_like(flat)
        while (old != flat).any():
            old = flat
            flat = flat[flat]
        flat = np.unique(flat, return_inverse=True)[1]
        return flat.reshape((height, width))
  • 相关阅读:
    封装、继承、多态的意义等。
    php中类与对象的区别和关系是什么?
    MySQL数据库中InnoDB和MyISAM两种数据引擎的差别
    什么是索引? 索引的定义与用法等。
    什么是正则表达式?
    字符串—strcpy
    排序
    虚函数
    优先队列(堆)
    因子个数_错排公式
  • 原文地址:https://www.cnblogs.com/qw12/p/9572193.html
Copyright © 2011-2022 走看看