  • How to obtain image captchas in a web crawler

    1. Using a page screenshot

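    import requests
    import time
    
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from PIL import Image
    
    
    def part_screenshot(driver):
        # Save a screenshot of the current page and open it with PIL
        driver.save_screenshot("hello1.png")
        return Image.open("hello1.png")
    
    
    def get_image(driver):  # Locate the captcha element, then crop it out of the page screenshot
        # find_element_by_xpath was removed in Selenium 4; find_element(By.XPATH, ...) replaces it
        img = driver.find_element(By.XPATH, '//*[@id="u1"]/a[2]')
        time.sleep(2)  # Give the page time to finish rendering before reading the position
        location = img.location
        print(location, 111)
        size = img.size
        left = location['x']
        top = location['y']
        right = left + size['width']
        bottom = top + size['height']
        page_snap_obj = part_screenshot(driver)
        image_obj = page_snap_obj.crop((left, top, right, bottom))
        return image_obj  # The cropped result is the captcha image
    
    
    if __name__ == '__main__':
        driver = webdriver.Chrome()
        driver.get("https://www.baidu.com")
        print(driver.title)  # Print the page title
        b = get_image(driver)
        b.save("1.png")
        print(b)
        driver.quit()  # Always quit! Otherwise leftover browser processes remain.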
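
The crop above assumes that element coordinates and screenshot pixels use the same scale. On high-DPI displays the screenshot may be larger than the page layout by the device pixel ratio, in which case the crop box misses the captcha. The following is a minimal sketch of one way to compensate; the helper name `crop_box` and its `scale` parameter are illustrative assumptions, not part of the original code. The ratio could be read with `driver.execute_script("return window.devicePixelRatio")`.

```python
def crop_box(location, size, scale=1.0):
    """Return (left, top, right, bottom) in screenshot pixels.

    location and size are dicts shaped like Selenium's element.location
    and element.size. scale is the assumed device pixel ratio
    (hypothetical parameter; 1.0 on standard displays).
    """
    left = location['x'] * scale
    top = location['y'] * scale
    right = (location['x'] + size['width']) * scale
    bottom = (location['y'] + size['height']) * scale
    # PIL's crop expects integer pixel coordinates
    return tuple(int(round(v)) for v in (left, top, right, bottom))
```

The resulting tuple can be passed straight to `page_snap_obj.crop(...)`. Selenium also offers `WebElement.screenshot(filename)`, which captures a single element directly and may make the manual crop unnecessary.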

    2. Using cookies to obtain the image captcha

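    How it works: when the browser requests the captcha image, some sites store the captcha's identifier in a cookie. When the user submits the login form, only the captcha value has to be sent; the backend uses the cookie to verify that the captcha identifier is correct.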
    import requests
    
    
    class Yun00Da1ma():
        def __init__(self):
            self.session = requests.session()
            self.headers = {
                "Referer": "http://www.×××.com/",
                "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
            }
    
        def get_picture(self):
            url = "http://www.×××.com/index/captcha"
            # When the image is requested, the server returns it and attaches the captcha info to the session
            resp = self.session.get(url, headers=self.headers)
            with open("dama.png", "wb") as f:
                f.write(resp.content)
            # check_img is not defined in the original post; it is assumed to be
            # a captcha-recognition helper that returns the recognized text
            img_result = check_img("dama.png")
            url1 = "http://www.×××.com/index/login?username=×××&password=×××&utype=ajh&vcode=" + img_result
            resp2 = self.session.get(url1, headers=self.headers)  # Simulate the login
            # print(resp2.content.decode("gbk", "ignore"), 11)
    
        def get_user(self):  # Reuse the session to open the user page directly and check whether the login succeeded
            url = "http://www.×××.com/user"
            resp2 = self.session.get(url, headers=self.headers)
    
    
    if __name__ == '__main__':
        yun00da1ma = Yun00Da1ma()
        yun00da1ma.get_picture()
        yun00da1ma.get_user()
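
To illustrate why the same session has to be used for both the captcha request and the login request, here is a toy in-memory simulation of the mechanism described above. It is only a sketch: `CaptchaServer`, the `captcha_id` cookie name, and the random text are all hypothetical stand-ins, not how any particular site works.

```python
import secrets


class CaptchaServer:
    """Toy stand-in for a site's backend (hypothetical, for illustration only)."""

    def __init__(self):
        self._answers = {}  # captcha id -> expected text

    def get_captcha(self):
        """Issue a captcha: return (cookies to set, text drawn in the image)."""
        captcha_id = secrets.token_hex(8)
        text = secrets.token_hex(2)  # stands in for the characters in the image
        self._answers[captcha_id] = text
        return {'captcha_id': captcha_id}, text

    def login(self, cookies, vcode):
        """Accept the login only if vcode matches the captcha named by the cookie."""
        expected = self._answers.pop(cookies.get('captcha_id'), None)
        return expected is not None and vcode == expected


server = CaptchaServer()

# A client that keeps the cookie between requests, as requests.session() does
cookies, text = server.get_captcha()
same_session_ok = server.login(cookies, text)

# A client that drops the cookie: the server cannot tell which captcha was answered
_, text2 = server.get_captcha()
no_cookie_ok = server.login({}, text2)

print(same_session_ok, no_cookie_ok)  # → True False
```

In the real code, `self.session` plays the role of the client that keeps the cookie, which is why both requests go through it instead of calling `requests.get` twice.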
  • Original article: https://www.cnblogs.com/xuehaiwuya0000/p/11509435.html