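  • [Machine Vision] Text Extraction in Real-World Scenes

    Text Extraction from Phone-Captured Scenes

    Introduction

    This was a task my teacher handed me out of the blue. Turning it down would have been awkward, so I did it. Working through it gave me a new appreciation for morphological processing, which turned out to be far more powerful than I expected, and it also deepened my understanding and memory of the Sobel operator.
    The processing consists of two steps:

    1. Obtain the ROI and rectify it
    2. Extract the text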

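    ROI Extraction

    The rough idea is as follows. The original image contains a rectangular table frame, so if we can obtain the ROI bounded by that frame and then extract text inside the ROI, processing becomes much easier. The main problems are extracting edges along specific directions, fitting a rectangle, and transforming the key points.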

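    For direction-specific edge extraction, I apply the Sobel operator after Gaussian smoothing and median filtering. Edges in the x and y directions are extracted separately and then summed with 1:1 weights; the kernel size needs tuning along the way. The result is binarized, morphological dilation and erosion remove the noise points, and then contours are found.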
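
    To make the direction-specific step concrete, here is a minimal pure-Python sketch (no OpenCV) of 3x3 Sobel responses on a tiny grayscale grid. The helper names `sobel_at` and `blend` are hypothetical, and the rounding in `blend` only approximates an 8-bit weighted sum. It shows that a dark-to-bright edge and a bright-to-dark edge give x-responses of opposite sign, which is why the gradient should stay signed until the absolute value is taken.

```python
# 3x3 Sobel kernels: KX responds to horizontal intensity change (vertical edges),
# KY responds to vertical intensity change (horizontal edges).
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
KY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_at(img, r, c, kernel):
    """Signed Sobel response at interior pixel (r, c) of a 2D list."""
    return sum(kernel[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def blend(gx, gy, wx=0.5, wy=0.5):
    """Take absolute values, saturate each at 255, then mix with equal weights."""
    return min(255, int(wx * min(255, abs(gx)) + wy * min(255, abs(gy)) + 0.5))


# A dark-bright-dark vertical stripe: rising edge on the left, falling on the right.
img = [[10, 10, 200, 200, 10, 10] for _ in range(5)]

gx_rising = sobel_at(img, 2, 1, KX)   # window spans columns 0..2
gx_falling = sobel_at(img, 2, 4, KX)  # window spans columns 3..5
gy_any = sobel_at(img, 2, 1, KY)      # rows are identical, so no y-response

print(gx_rising, gx_falling, gy_any)                         # 760 -760 0
print(blend(gx_rising, gy_any), blend(gx_falling, gy_any))   # 128 128
```

    If the falling edge were clipped to 0 in an unsigned image before the absolute value, the right side of the stripe would vanish from the edge map.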

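    For rectangle fitting, the area of each contour is computed and only contours whose area exceeds a threshold are kept. Each remaining contour is approximated with a polygon to obtain a polygonal boundary. The candidate polygon resembles a rectangle but still contains stray points, so a simple algorithm picks out the rectangle's four vertices, which gives the fitted rectangle.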
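
    One way to pick four vertices out of a noisy hull is sketched below. This is a common sum/difference variant, not the exact routine in `getROI`, and `pick_corners` is a hypothetical name. It assumes image coordinates (y grows downward) and a rectangle rotated by well under 45 degrees; the sample points are made up.

```python
def pick_corners(points):
    """Return (top_left, top_right, bottom_right, bottom_left) from hull points.

    Image coordinates are assumed: x grows to the right, y grows downward.
    """
    top_left = min(points, key=lambda p: p[0] + p[1])      # smallest x + y
    bottom_right = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    top_right = max(points, key=lambda p: p[0] - p[1])     # largest x - y
    bottom_left = min(points, key=lambda p: p[0] - p[1])   # smallest x - y
    return top_left, top_right, bottom_right, bottom_left


# A slightly skewed quadrilateral plus two stray points lying near its edges.
hull = [(105, 52), (900, 40), (950, 610), (80, 640), (500, 45), (520, 625)]
print(pick_corners(hull))
# ((105, 52), (900, 40), (950, 610), (80, 640))
```

    The returned order matches the destination corners used in the post's code: top-left, top-right, bottom-right, bottom-left.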

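    For the key-point transform, the detected rectangle may appear flipped, rotated, or otherwise distorted, so we need to "straighten" it. We construct a transformation matrix from the four vertices (the code uses cv.getPerspectiveTransform, so this is a perspective transform rather than a strictly linear one) and map the rectangle onto a front-facing view.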
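
    For intuition about that matrix: a perspective transform is a 3x3 matrix with eight unknowns (the bottom-right entry is fixed to 1), so four point correspondences determine it. The sketch below solves the resulting 8x8 system in plain Python. `solve`, `homography`, and `apply_h` are hypothetical names and the source corners are made-up values; in practice `cv.getPerspectiveTransform` does this job.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """3x3 perspective matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(hm, point):
    """Map one (x, y) point through the matrix, dividing by the third coordinate."""
    x, y = point
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


src = [(105, 52), (900, 40), (950, 610), (80, 640)]  # detected corners (made up)
dst = [(0, 0), (1440, 0), (1440, 960), (0, 960)]     # upright 1440x960 view
H = homography(src, dst)
for p in src:
    u, v = apply_h(H, p)
    print(p, "->", (round(u, 3), round(v, 3)))
```

    The division by `w` is what makes the mapping projective: parallel lines in the photo need not stay parallel after the warp.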

    Applying the steps above to each contour yields the ROIs. Note that the parameters need to be re-tuned for different input images.

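    import copy

    import cv2 as cv
    import numpy as np

    # Output ROI size: w = 240*6 = 1440, h = 160*6 = 960.
    # Assumes a window named "Trackbar" with sliders "thre", "max_e" and "min_e"
    # already exists (its creation is not shown in the post).
    def getROI(frame):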
        while True:
            out_imgs = []
            src = copy.copy(frame)
            thre = cv.getTrackbarPos("thre","Trackbar")
            max_e = cv.getTrackbarPos("max_e","Trackbar")
            min_e = cv.getTrackbarPos("min_e","Trackbar")
            gray = cv.cvtColor(src,cv.COLOR_BGR2GRAY)
            gaussian = cv.GaussianBlur(gray,(3,3),0,0,cv.BORDER_DEFAULT)
            median = cv.medianBlur(gaussian,5)
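            # Signed 16-bit output keeps negative gradients, which CV_8U would clip
            # to 0 before convertScaleAbs gets a chance to take the absolute value.
            x = cv.Sobel(median,cv.CV_16S, 1, 0, ksize = 3)
            y = cv.Sobel(median,cv.CV_16S, 0, 1, ksize = 3)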
            absX = cv.convertScaleAbs(x)
            absY = cv.convertScaleAbs(y)
            sobel = cv.addWeighted(absX,0.5,absY,0.5,0)
            r,binary = cv.threshold(sobel,thre,255,cv.THRESH_BINARY)
            s = gray.shape
            element1 = cv.getStructuringElement(cv.MORPH_RECT,(1*2+1,2*2+1))
            element2 = cv.getStructuringElement(cv.MORPH_RECT, (min_e*2+1,max_e*2+1))
            dilate = cv.dilate(binary,element1,iterations =1)
            erode = cv.erode(dilate,element2,iterations = 1)
            dilate = cv.dilate(erode,element1,iterations =2)
            binary = dilate
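            # findContours returns 3 values in OpenCV 3.x and 2 in 4.x;
            # the contour list is second-to-last in both.
            contours = cv.findContours(binary,cv.RETR_EXTERNAL,cv.CHAIN_APPROX_SIMPLE)[-2]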
            for contour in contours:
                area = cv.contourArea(contour)
                if area > 50000:
                    appCurve = cv.approxPolyDP(contour,10,True)
                    hulls = cv.convexHull(appCurve)
                    i = 0
                    min_x_y = 9999999
                    max_x_y = 0
                    rect_point = [None,(0,0),None,(0,0)]
                    for hull in hulls:
                        point = (hull[0][0],hull[0][1])
                        x_y = point[0] + point[1]
                        if x_y > max_x_y:
                            max_x_y = x_y
                            rect_point[2] = point
                        if x_y < min_x_y:
                            min_x_y = x_y
                            rect_point[0] = point
                    p1 = (rect_point[2][0],rect_point[0][1])
                    p2 = (rect_point[0][0],rect_point[2][1])
                    for hull in hulls:
                        point = (hull[0][0],hull[0][1])
                        distance11 = abs(p1[0]-point[0]) + abs(p1[1]-point[1])
                        distance12 = abs(p1[0]-rect_point[1][0]) + abs(p1[1]-rect_point[1][1])
                        if distance11 < distance12:
                            rect_point[1] = point
                        distance21 = abs(p2[0]-point[0]) + abs(p2[1]-point[1])
                        distance22 = abs(p2[0]-rect_point[3][0]) + abs(p2[1]-rect_point[3][1])
                        if distance21 < distance22:
                            rect_point[3] = point
                    M = cv.getPerspectiveTransform(np.array(rect_point,dtype=np.float32),np.array([[0,0],[1440,0],[1440,960],[0,960]],dtype=np.float32))
                    out = cv.warpPerspective(src,M,(1440,960))
                    for p in rect_point:
                        # Plain Python ints: some OpenCV builds reject NumPy scalars here.
                        cv.circle(src,(int(p[0]),int(p[1])),20,(0,0,255),2)
                    #cv.imshow("out",out)
                    out_imgs.append(out)
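            # Shrink to one third for display; interpolation is passed by keyword
            # because the third positional argument of cv.resize is dst.
            binary = cv.resize(binary,(s[1]//3,s[0]//3),interpolation=cv.INTER_LINEAR)

            cv.imshow("binary",binary)
            src = cv.resize(src,(s[1]//3,s[0]//3),interpolation=cv.INTER_LINEAR)
            cv.imshow("frame",src)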
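            key = cv.waitKey(0)
            if key == 27:  # ESC: save every rectified ROI, then quit
                # The "image/" directory must already exist; cv.imwrite does not create it.
                for i in range(len(out_imgs)):
                    cv.imwrite("image/"+str(i)+".jpg",out_imgs[i])
                break
        # Close windows once, after the loop, so the "Trackbar" window survives
        # between iterations.
        cv.destroyAllWindows()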
    

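    Text Extraction

    The key to text extraction is finding the bounding boxes (bbox). The idea is to get edge features with the Canny operator, apply morphological dilation to deal with noise, find contours, and filter the contours by area. A bounding rectangle is fitted to each contour that passes the area threshold, and a second threshold on the rectangle's height discards the small rectangles fitted to noise points. That completes the extraction of the text regions.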
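
    The filtering logic on its own, stripped of OpenCV, might look like the sketch below. `polygon_area`, `bounding_rect`, and `text_boxes` are hypothetical helpers standing in for `cv.contourArea`, `cv.boundingRect`, and the loop in `process`. The thresholds and sample contours are illustrative, and the pixel-inclusive `+ 1` in the box size is an assumption about how integer bounding boxes are usually reported.

```python
def polygon_area(pts):
    """Shoelace formula: area enclosed by a closed polygon of (x, y) points."""
    n = len(pts)
    twice = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def bounding_rect(pts):
    """Axis-aligned box (x, y, w, h) covering all points, pixel-inclusive."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def text_boxes(contours, min_area=4, min_height=10):
    """Keep the boxes of contours that are both large enough and tall enough."""
    boxes = []
    for pts in contours:
        if polygon_area(pts) > min_area:
            rect = bounding_rect(pts)
            if rect[3] > min_height:
                boxes.append(rect)
    return boxes


contours = [
    [(10, 10), (40, 10), (40, 35), (10, 35)],    # character-sized blob: kept
    [(60, 12), (62, 12), (62, 13), (60, 13)],    # speck with area 2: fails area test
    [(70, 20), (120, 20), (120, 24), (70, 24)],  # thin line: fails height test
]
print(text_boxes(contours))
# [(10, 10, 31, 26)]
```

    The height test is what removes table rules and other long thin strokes that easily pass the area test.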

    # Assumes the "Trackbar" window also has sliders "thre1", "thre2" and "height".
    def process(ROI):
        while True:
            thre1 =cv.getTrackbarPos("thre1","Trackbar")
            thre2 =cv.getTrackbarPos("thre2","Trackbar")
            max_e = cv.getTrackbarPos("max_e","Trackbar")
            min_e = cv.getTrackbarPos("min_e","Trackbar")
            height = cv.getTrackbarPos("height","Trackbar")
            roi = copy.copy(ROI)
            gray = cv.cvtColor(roi,cv.COLOR_BGR2GRAY)
            gaussian = cv.GaussianBlur(gray,(3,3),0,0,cv.BORDER_DEFAULT)
            median = cv.medianBlur(gaussian,3)
            edges = cv.Canny(median,thre1,thre2)
            element = cv.getStructuringElement(cv.MORPH_RECT,(min_e*2+1,max_e*2+1))
            dilate = cv.dilate(edges,element,iterations = 1)
            # [-2] picks the contour list under both OpenCV 3.x and 4.x return shapes.
            contours = cv.findContours(dilate,cv.RETR_LIST,cv.CHAIN_APPROX_SIMPLE)[-2]
            #cv.drawContours(roi,contours,-1,(0,255,255),2)
            
            for contour in contours:
                area = cv.contourArea(contour)
                if area > 4:
                    rect = cv.boundingRect(contour)
                    if rect[3]>height:
                        cv.rectangle(roi,(rect[0],rect[1]),(rect[0]+rect[2],rect[1]+rect[3]),(0,255,255),2)
            cv.imshow("roi",roi)
            cv.imshow("dilate",dilate)
            cv.imshow("edges",edges)
            key = cv.waitKey(10)
            if key == 27:
                break
        cv.destroyAllWindows()
    

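    Results

    Original image

    bbox extraction:

    My teacher said the other images can't be posted for privacy reasons, so these two will have to do.

    Known Issues

    1. Heavy parameter tuning (results vary with lighting and other conditions)
    2. For gray text only the outline is extracted, which makes it hard to recognize
    3. ROI extraction on images of different scales needs re-tuning (arguably a case of 1)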
  • Original post: https://www.cnblogs.com/aoru45/p/10305946.html