
Recognizing image CAPTCHAs with Python

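Everything in this article is the author's original work, and writing it took effort. If you repost it, please credit the source: https://www.cnblogs.com/temari/p/13563429.html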

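I. CAPTCHA types

Websites commonly use the following kinds of CAPTCHA:

1. Image CAPTCHAs: typically English letters, digits, Chinese characters, or arithmetic problems.

2. Behavioral CAPTCHAs: typically slider puzzles, click-the-text, click-the-icon, and reasoning puzzles.

3. SMS verification codes

4. Voice verification codes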

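II. Handling CAPTCHAs in web automation

When testing login or registration flows, we often run into a CAPTCHA. The following approaches can be used to deal with it:

1. Remove the CAPTCHA

This is the simplest approach: ask the developers to comment out the CAPTCHA code. It is suitable for a test environment, but doing the same in production is risky.

2. Set a universal CAPTCHA

To keep the system secure, the CAPTCHA can stay in place while the program keeps a "back door" in the form of a universal code. Whenever the universal code is entered, the check passes.

3. Bypass login with cookies

Adding cookies to the browser can bypass the login CAPTCHA. For example, with a "remember password" feature the user is logged in by default, so the CAPTCHA never comes up.

4. CAPTCHA recognition

Many third-party CAPTCHA recognition platforms are on the market. Their tools can handle any conventional CAPTCHA type, although with most of them the recognition rate rarely reaches 100%. Searching Baidu for "验证码识别" (CAPTCHA recognition) turns up several such platforms.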
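Approach 3 can be sketched with Selenium's `add_cookie`. This is only a minimal sketch, not code from the original article: the cookie name `sessionid`, its value, and the helper `login_with_cookies` are hypothetical, and real cookie values would have to be copied from a session that was logged in by hand. `FakeDriver` merely stands in for `webdriver.Chrome()` so that the snippet runs without a browser.

```python
def login_with_cookies(driver, url, cookies):
    """Open url, inject saved cookies, then reload so the site sees them."""
    # Selenium only accepts cookies for the domain that is currently loaded,
    # so the page has to be opened once before add_cookie is called.
    driver.get(url)
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.refresh()


class FakeDriver:
    """Stand-in that records calls; swap in webdriver.Chrome() for real use."""

    def __init__(self):
        self.calls = []

    def get(self, url):
        self.calls.append(("get", url))

    def add_cookie(self, cookie):
        self.calls.append(("add_cookie", cookie["name"]))

    def refresh(self):
        self.calls.append(("refresh",))


# Hypothetical cookie copied from a browser session that is already logged in.
saved_cookies = [{"name": "sessionid", "value": "xxxx"}]
driver = FakeDriver()
login_with_cookies(driver, "http://www.chaojiying.com/user/login/", saved_cookies)
print(driver.calls)
```

Whether this actually skips the login page depends on how the site under test manages its sessions.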

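III. Image CAPTCHA recognition: a worked example

3.1 Setting up the web automation test environment

1. Set up Python and PyCharm

2. Install chromedriver

3. Install Selenium

4. Install the pillow module (an image-processing library)

3.2 CAPTCHA platform

1. Register on a CAPTCHA platform: this example uses the Chaojiying (超级鹰) website.

2. Download the Python sample code

Save the code locally.

3. Buy points

The platform charges a different price for each CAPTCHA type and deducts points on every call, so the account has to be topped up before use. I overlooked this the first time I ran the code: several recognition calls came back empty, and it took a long time of debugging to find the cause. Chaojiying grants 1000 free points the first time a WeChat account is linked, which is friendly and plenty for practice.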
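An empty result like the one described above is easier to diagnose if the response is checked before it is used. The sketch below assumes that the platform's JSON response carries `err_no` and `err_str` fields next to the `pic_str` field read by the code in this article. Those two field names and the helper `extract_code` are assumptions, and the sample dictionaries are made up for illustration.

```python
def extract_code(result):
    """Return the recognized text, or raise with the platform's error message."""
    # Assumption: err_no == 0 means success; any other value is a failure,
    # for example an exhausted point balance.
    if result.get("err_no", 0) != 0:
        raise RuntimeError("recognition failed: %s" % result.get("err_str"))
    code = result.get("pic_str", "")
    if not code:
        raise RuntimeError("recognition returned an empty string")
    return code


# Made-up responses for illustration only.
ok = {"err_no": 0, "err_str": "OK", "pic_str": "7261"}
failed = {"err_no": -1, "err_str": "no points left", "pic_str": ""}

print(extract_code(ok))
try:
    extract_code(failed)
except RuntimeError as exc:
    print(exc)
```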

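3.3 Recognition approach

1. Take a screenshot of the login page and save it as a.png

2. Locate the CAPTCHA image element with XPath

3. Get the coordinates of the CAPTCHA image element on the screen

4. Use those coordinates to crop the CAPTCHA out of the screenshot a.png and save it as b.png

5. Use a third-party recognition service to read the content of the CAPTCHA image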
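As an alternative to steps 1 to 4 (this is not the approach the article uses), Selenium's `WebElement.screenshot` saves an image of a single element directly, which avoids the coordinate arithmetic. A minimal sketch follows: the helper `save_captcha` is hypothetical, and the two fake classes only stand in for a real browser session so that the snippet runs on its own.

```python
def save_captcha(driver, xpath, path):
    """Locate the CAPTCHA image by XPath and save just that element as a PNG."""
    # "xpath" is the locator strategy string that selenium's By.XPATH holds.
    element = driver.find_element("xpath", xpath)
    return element.screenshot(path)


class FakeElement:
    """Stand-in for a selenium WebElement; records where it was asked to save."""

    def __init__(self):
        self.saved_to = None

    def screenshot(self, filename):
        self.saved_to = filename
        return True


class FakeDriver:
    """Stand-in for webdriver.Chrome(); always returns the same fake element."""

    def __init__(self):
        self.element = FakeElement()
        self.last_query = None

    def find_element(self, by, value):
        self.last_query = (by, value)
        return self.element


driver = FakeDriver()
saved = save_captcha(driver, "//div[@class='login_form']/form/div[1]/img", "code.png")
print(saved, driver.element.saved_to)
```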

3.4 Functions involved

1. Screenshot function

def get_screenshot_as_file(self, filename):
    """
    Saves a screenshot of the current window to a PNG image file. Returns
    False if there is any IOError, else returns True. Use full paths in
    your filename.

    :Args:
     - filename: The full path you wish to save your screenshot to. This
       should end with a `.png` extension.

    :Usage:
        driver.get_screenshot_as_file('/Screenshots/foo.png')
    """

2. Function for getting the image coordinates

(1) Definition

@property
def rect(self):
    """A dictionary with the size and location of the element."""
    if self._w3c:
        return self._execute(Command.GET_ELEMENT_RECT)['value']
    else:
        rect = self.size.copy()
        rect.update(self.location)
        return rect

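(2) Purpose

The rect property returns the width and height of an element's bounding rectangle together with the x and y coordinates of its top-left corner. Taking the Chaojiying login page as an example: the origin is the top-left corner of the screen, x increases to the right, and y increases downward. Reading the rect property of the CAPTCHA element returns the image's height, width, and top-left coordinates.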

3. Image cropping function

(1) Definition

def crop(self, box=None):
    """
    Returns a rectangular region from this image. The box is a
    4-tuple defining the left, upper, right, and lower pixel
    coordinate. See :ref:`coordinate-system`.

    Note: Prior to Pillow 3.4.0, this was a lazy operation.

    :param box: The crop rectangle, as a (left, upper, right, lower)-tuple.
    :rtype: :py:class:`~PIL.Image.Image`
    :returns: An :py:class:`~PIL.Image.Image` object.
    """

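(2) Purpose

Crops a rectangular region. The argument gives the left, upper, right, and lower coordinates of the region within the whole image. For the Chaojiying CAPTCHA, (left, upper, right, lower) = (x, y, x+width, y+height). As I understand it, the crop is the region enclosed by four straight lines: x1 = x, x2 = x+width, y1 = y, and y2 = y+height.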
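The conversion from a rect dictionary to a crop box can be written as a small helper. This is a sketch: `crop_box` and its `scale` parameter are hypothetical names, the sample rect is made up, and rounding to whole pixels is a choice made here rather than something the original code does.

```python
def crop_box(rect, scale=1.0):
    """Convert a Selenium rect dict into a (left, upper, right, lower) box."""
    left = rect["x"] * scale
    upper = rect["y"] * scale
    right = (rect["x"] + rect["width"]) * scale
    lower = (rect["y"] + rect["height"]) * scale
    # Round to whole pixels, since the box addresses pixels in the screenshot.
    return tuple(int(round(v)) for v in (left, upper, right, lower))


# Made-up rect for illustration; a real one comes from element.rect.
rect = {"x": 100, "y": 200, "width": 80, "height": 30}
print(crop_box(rect))             # no display scaling
print(crop_box(rect, scale=1.5))  # display scaled to 150%
```

The resulting tuple can be passed straight to `Image.crop`.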

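4. Third-party recognition function

Put the module downloaded from the platform, chaojiying.py in this example, in the same directory as the CAPTCHA script VerificationCode.py.

5. Pricing

Chaojiying can recognize English letters, digits, Chinese characters, coordinate-based, selection-based, and other types of CAPTCHA. Each type costs a different number of points, and the type code has to be passed when calling the recognition function, so look up the type code before making the call.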

3.5 Complete code

VerificationCode.py

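"""
Purpose: CAPTCHA recognition
Author: 柠檬草不孤单
Date: 2020/08/06 20:25
Approach: take a screenshot of the login page, locate the CAPTCHA image element
with XPath, read its width, height and on-screen position from the rect property,
compute the left, top, right and bottom coordinates, crop the CAPTCHA out of the
screenshot and save it, then pass the cropped image to a third-party recognition
service to read its content.
Notes:
1. Account for display scaling when computing coordinates: at 150% scaling the
   coordinates must be multiplied by 1.5.
2. The third-party platform account must hold enough points to cover the
   official per-request price of the CAPTCHA type.
"""
from time import sleep

from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By

from chaojiying import Chaojiying_Client

# Display scaling factor: 1.5 for a display scaled to 150%, 1 if scaling is off
SCALE = 1.5

driver = webdriver.Chrome()
# Implicit wait
driver.implicitly_wait(10)
driver.get("http://www.chaojiying.com/user/login/")
# Maximize the window
driver.maximize_window()
# login = driver.find_element(By.XPATH, "/html/body/div[2]/div/ul/li[7]/a").click()
driver.find_element(By.XPATH, "//div[@class='login_form']/form/p[1]/input").clear()
# Enter the username (send_keys returns None, so there is nothing to assign)
driver.find_element(By.XPATH, "//div[@class='login_form']/form/p[1]/input").send_keys("xxxx")
# Enter the password
driver.find_element(By.XPATH, "//div[@class='login_form']/form/p[2]/input").send_keys("xxxx")

# Take a screenshot of the page
driver.get_screenshot_as_file("register.png")
VerficationCode = driver.find_element(By.XPATH, "//div[@class='login_form']/form/div[1]/img")
# rect is a dict holding the CAPTCHA's width, height, and top-left coordinates
location = VerficationCode.rect
print(location)
# Compute the four edges of the CAPTCHA image in screenshot pixels
left = location['x'] * SCALE
top = location['y'] * SCALE
bottom = (location['y'] + location['height']) * SCALE
right = (location['x'] + location['width']) * SCALE
# CAPTCHA position: left, top, right, bottom
CodeLocation = (left, top, right, bottom)
print(CodeLocation)
# Open the screenshot
Page = Image.open("register.png")
# Crop the CAPTCHA out of the screenshot and save it
image = Page.crop(CodeLocation)
image.save("code.png")
# Call the recognition service
# User center >> Software ID: generate an ID and use it as the third argument
chaojiying = Chaojiying_Client('xxxx', 'xxxx', '907004')
# Read back the cropped image saved above (same relative path as image.save)
with open("code.png", 'rb') as f:
    im = f.read()
# See official site >> pricing for the CAPTCHA type code
result = chaojiying.PostPic(im, 1004)
print(result)
print(result.get('pic_str'))
sleep(5)
driver.find_element(By.XPATH, "//div[@class='login_form']/form/p[3]/input").send_keys(result.get('pic_str'))
driver.find_element(By.XPATH, "//div[@class='login_form']/form/p[4]/input").click()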

chaojiying.py

#!/usr/bin/env python
# coding:utf-8

import requests
from hashlib import md5

class Chaojiying_Client(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        password =  password.encode('utf8')
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: CAPTCHA type code, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: ID of the image whose recognition result is being reported as wrong
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()

if __name__ == '__main__':
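    # User center >> Software ID: generate an ID and use it in place of 96001
    chaojiying = Chaojiying_Client('your Chaojiying username', 'your Chaojiying password', '96001')
    # Replace a.jpg with the path to a local image file; on Windows the path sometimes needs //
    im = open('a.jpg', 'rb').read()
    # 1902 is the CAPTCHA type code (official site >> pricing); Python 3.4+ needs () after print
    print(chaojiying.PostPic(im, 1902))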

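3.6 Notes

1. When computing the CAPTCHA image coordinates, take display scaling into account: if the display is scaled to 150%, the coordinates must be multiplied by 1.5.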
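Rather than hard-coding 1.5, the factor can often be read from the browser itself. The sketch below assumes the screenshot is rendered at the page's `window.devicePixelRatio`, which is typically the case for Chrome but is not guaranteed. `detect_scale` is a hypothetical helper, and `FakeDriver` only stands in for a real `webdriver.Chrome()` so that the snippet runs without a browser.

```python
def detect_scale(driver):
    """Ask the page for its device pixel ratio (typically 1.5 at 150% scaling)."""
    ratio = driver.execute_script("return window.devicePixelRatio")
    return float(ratio)


class FakeDriver:
    """Stand-in that answers like a browser on a display scaled to 150%."""

    def execute_script(self, script):
        # A real driver would evaluate the JavaScript in the page.
        return 1.5


scale = detect_scale(FakeDriver())
print(scale)
```

With a real driver, the returned value would replace the literal 1.5 in the coordinate calculation.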

