  • CAPTCHA recognition with Python on Windows

    References:

    http://oatest.dragonbravo.com/Authenticate/SignIn?returnUrl=%2f

    http://drops.wooyun.org/tips/6313

    http://blog.csdn.net/nwpulei/article/details/8457738

    http://www.pythonclub.org/project/captcha/python-pil

    http://blog.csdn.net/csapr1987/article/details/7728315  (creating QR code images)

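    Installing the Python CAPTCHA recognition libraries

    1. Install PIL, the Python Imaging Library, for image processing.

    Download: http://www.pythonware.com/products/pil/

    2. Install pytesseract, a Python wrapper for Google's Tesseract-OCR engine. The Tesseract engine itself must also be installed and reachable as "tesseract".

    Run the command prompt as administrator.

    cd C:\Python27\Scripts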

    pip install pytesseract
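
The pytesseract description quoted later in this post says the `tesseract` command must be invokable, or else the `tesseract_cmd` variable has to be changed. The following is a minimal sketch of checking for the executable before running OCR. The helper name `find_tesseract` and the example install path in the comments are hypothetical, and `shutil.which` assumes Python 3.3 or later.

```python
import os
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the first usable Tesseract executable path, or None."""
    for candidate in candidates:
        # An explicit file path is accepted if it exists on disk
        if os.path.isfile(candidate):
            return candidate
        # Otherwise try to resolve the name through PATH
        found = shutil.which(candidate)
        if found:
            return found
    return None


# Usage (requires pytesseract; the second path is only a hypothetical example):
#   import pytesseract
#   path = find_tesseract(("tesseract", r"C:\Program Files\Tesseract-OCR\tesseract.exe"))
#   if path:
#       pytesseract.pytesseract.tesseract_cmd = path
```

If the function returns None, Tesseract is most likely not installed or not on PATH, and `image_to_string` would fail when it tries to launch it.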

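    Recognizing monochrome CAPTCHAs without interference

    CAPTCHAs that are a single solid color with no interference are fairly easy to recognize. The code is as follows:

    import os
    import pytesseract
    from PIL import Image  # with the original PIL package, `import Image` also works

    # Change to the directory that holds the CAPTCHA image
    os.chdir(r'C:\Users\Administrator\Downloads\picture')
    image = Image.open('verifycode.jpg')
    # Run Tesseract OCR on the image and get the recognized text back
    vcode = pytesseract.image_to_string(image)
    print(vcode)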
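
OCR output for a CAPTCHA can contain surrounding whitespace, line breaks, or stray punctuation. The sketch below strips such characters, assuming the CAPTCHA consists only of ASCII letters and digits. That assumption and the helper name `clean_captcha_text` are not from the original post.

```python
import re


def clean_captcha_text(raw, allowed=r"A-Za-z0-9"):
    """Remove every character that is not in the allowed character class."""
    return re.sub("[^" + allowed + "]", "", raw)


# Example with a made-up OCR result containing spaces, a dash, and a newline
print(clean_captcha_text(" a B-3\n"))
```

The result of `pytesseract.image_to_string(image)` could be passed through this function before it is compared or submitted.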

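    Recognizing color CAPTCHAs with interference

    1. Remove noise with a median filter. This type of CAPTCHA contains noise pixels, so the first step is denoising.
    2. Enhance the image contrast. Median filtering fades many of the noise pixels, but converting directly to black and white would make them stand out again, so this extra step boosts the contrast first.
    3. Convert to black and white. Binarization sets pixels below the threshold to 0 (black) and pixels above it to 1 (white), turning the image into pure black and white.

    The code is as follows: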

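    import os
    from PIL import Image, ImageEnhance, ImageFilter

    os.chdir(r'C:\Users\Administrator\Downloads\picture')
    # GIFs usually open in palette mode ('P'), which filters reject, so convert first
    image = Image.open('vcode.gif').convert('RGB')
    # Step 1: median filter to remove noise pixels
    images = image.filter(ImageFilter.MedianFilter())
    # Step 2: boost the contrast by a factor of 2
    enhancer = ImageEnhance.Contrast(images)
    images = enhancer.enhance(2)
    # Step 3: convert to a 1-bit black-and-white image
    images = images.convert('1')
    images.show()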
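
To make the three steps concrete, here is a pure-Python sketch on a tiny grayscale grid stored as nested lists. It is an illustration under simplifying assumptions, not how PIL implements these operations: edges are handled by clamping, contrast is a simple linear stretch around the mean gray level, and the threshold of 128 is arbitrary. All function names are hypothetical.

```python
def median_filter(pixels, size=3):
    """Replace each pixel with the median of its size x size neighborhood."""
    height, width = len(pixels), len(pixels[0])
    radius = size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so edge pixels reuse their nearest neighbors
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    window.append(pixels[ny][nx])
            window.sort()
            row.append(window[len(window) // 2])
        result.append(row)
    return result


def boost_contrast(pixels, factor=2.0):
    """Push each pixel away from the mean gray level, clamped to 0..255."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [[int(min(255, max(0, mean + factor * (p - mean)))) for p in row]
            for row in pixels]


def binarize(pixels, threshold=128):
    """Map pixels below the threshold to 0 (black) and the rest to 1 (white)."""
    return [[0 if p < threshold else 1 for p in row] for row in pixels]


# A white 5x5 patch with a single dark noise pixel in the middle
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0

denoised = median_filter(noisy)
cleaned = binarize(boost_contrast(denoised))

print(binarize(noisy)[2])  # thresholding alone keeps the noise pixel as 0
print(cleaned[2])          # after median filtering the row is all 1
```

One difference worth knowing: when converting grayscale or RGB images, `convert('1')` applies Floyd-Steinberg dithering by default, so its result is not a plain threshold like `binarize` above. If a hard threshold is wanted, one option is to convert to mode 'L' and apply `point()` with a threshold function before converting to '1'.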

    [Figure: mind map of CAPTCHA image processing]

    For reference, the pytesseract package description:

    Metadata-Version: 1.1
    Name: pytesseract
    Version: 0.1.6
    Summary: Python-tesseract is a python wrapper for google's Tesseract-OCR
    Home-page: https://github.com/madmaze/python-tesseract
    Author: Matthias Lee
    Author-email: pytesseract@madmaze.net
    License: GPLv3
    Description: Python-tesseract is an optical character recognition (OCR) tool for python.
            That is, it will recognize and "read" the text embedded in images.
           
            Python-tesseract is a wrapper for google's Tesseract-OCR
            ( http://code.google.com/p/tesseract-ocr/ ).  It is also useful as a
            stand-alone invocation script to tesseract, as it can read all image types
            supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff,
            and others, whereas tesseract-ocr by default only supports tiff and bmp.
            Additionally, if used as a script, Python-tesseract will print the recognized
            text in stead of writing it to a file. Support for confidence estimates and
            bounding box data is planned for future releases.
           
           
            USAGE:
            ```
             > try:
             >     import Image
             > except ImportError:
             >     from PIL import Image
             > import pytesseract
             > print(pytesseract.image_to_string(Image.open('test.png')))
             > print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
            ```
           
            INSTALLATION:
           
            Prerequisites:
            * Python-tesseract requires python 2.5 or later or python 3.
            * You will need the Python Imaging Library (PIL).  Under Debian/Ubuntu, this is
              the package "python-imaging" or "python3-imaging" for python3.
            * Install google tesseract-ocr from http://code.google.com/p/tesseract-ocr/ .
              You must be able to invoke the tesseract command as "tesseract". If this
              isn't the case, for example because tesseract isn't in your PATH, you will
              have to change the "tesseract_cmd" variable at the top of 'tesseract.py'.
              Under Debian/Ubuntu you can use the package "tesseract-ocr".
             
            Installing via pip:  
            See the [pytesseract package page](https://pypi.python.org/pypi/pytesseract)  
            ```
            $> sudo pip install pytesseract  
            ```
           
            Installing from source:  
            ```
            $> git clone git@github.com:madmaze/pytesseract.git  
            $> sudo python setup.py install 
            ```
           
            LICENSE:
            Python-tesseract is released under the GPL v3.
           
            CONTRIBUTERS:
            - Originally written by [Samuel Hoffstaetter](https://github.com/hoffstaetter)
            - [Juarez Bochi](https://github.com/jbochi)
            - [Matthias Lee](https://github.com/madmaze)
            - [Lars Kistner](https://github.com/Sr4l)
    Keywords: python-tesseract OCR Python
    Platform: UNKNOWN
    Classifier: Programming Language :: Python
    Classifier: Programming Language :: Python :: 2
    Classifier: Programming Language :: Python :: 3

  • Original article: https://www.cnblogs.com/sophia194910/p/4936144.html