  • Slider CAPTCHA verification

    selenium + Chrome + Firefox + webdriver: pitfalls I ran into

    Pitfall 1: error when starting webdriver on Linux

    Test code:

      #!/usr/bin/python
      # -*- coding: utf-8 -*-

      from selenium import webdriver

      driver = webdriver.Firefox()
      driver.get("https://www.baidu.com")

    Running it fails with the following error:

      Traceback (most recent call last):
        File "maimai_web.py", line 14, in <module>
          driver = webdriver.Firefox()
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 152, in __init__
          keep_alive=True)
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
          self.start_session(desired_capabilities, browser_profile)
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
          response = self.execute(Command.NEW_SESSION, parameters)
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
          self.error_handler.check_response(response)
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
          raise exception_class(message, screen, stacktrace)
      selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1

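    Solution: the server has no graphical display for Firefox to open, so start a virtual one with pyvirtualdisplay (it relies on Xvfb being installed) before creating the driver: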

      #!/usr/bin/python
      # -*- coding: utf-8 -*-

      from pyvirtualdisplay import Display
      from selenium import webdriver

      # Start an invisible virtual X display for Firefox to render into
      display = Display(visible=0, size=(1920, 1080))
      display.start()
      driver = webdriver.Firefox()
      driver.get("https://www.baidu.com")

      # Clean up: close the browser first, then the virtual display
      driver.quit()
      display.stop()

    Result:

    It runs fine. Done!
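The fix above always starts a virtual display. If the same script also has to run on a desktop machine, the display can be started only when it is missing. A minimal sketch, assuming the error comes from `DISPLAY` being unset on a headless Linux server; `needs_virtual_display` and `start_display_if_needed` are hypothetical helpers, and only `pyvirtualdisplay.Display` with the arguments from the code above is taken from the article.

```python
import os
import sys
from typing import Mapping, Optional


def needs_virtual_display(env: Optional[Mapping[str, str]] = None,
                          platform: Optional[str] = None) -> bool:
    """Hypothetical helper: True on Linux when no X display is configured.

    Assumption: "Process unexpectedly closed with status 1" comes from
    Firefox finding no display (DISPLAY unset) on a headless server.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    return platform.startswith("linux") and not env.get("DISPLAY")


def start_display_if_needed():
    """Start a pyvirtualdisplay Display only when one is needed.

    Returns the started Display (stop it with .stop() after driver.quit())
    or None when a real display is already available.
    """
    if not needs_virtual_display():
        return None
    from pyvirtualdisplay import Display  # lazy import; needs Xvfb installed
    display = Display(visible=0, size=(1920, 1080))
    display.start()
    return display
```

Call `display = start_display_if_needed()` before `webdriver.Firefox()`, and after `driver.quit()` run `if display: display.stop()`. On a machine with a real display the helper does nothing.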

    Pitfall 2: error when instantiating webdriver

    When webdriver is created from multiple threads, this error occasionally appears: selenium.common.exceptions.WebDriverException: Message: connection refused

      Exception in thread Thread-2:
      Traceback (most recent call last):
        File "/usr/local/python3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
          self.run()
        File "/usr/local/python3.6/lib/python3.6/threading.py", line 864, in run
          self._target(*self._args, **self._kwargs)
        File "maimai_tran_account_driver.py", line 591, in debug
          t = TrainAccount(count,lock)
        File "maimai_tran_account_driver.py", line 32, in __init__
          self.chrome = webdriver.Firefox()
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 152, in __init__
          keep_alive=True)
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
          self.start_session(desired_capabilities, browser_profile)
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
          response = self.execute(Command.NEW_SESSION, parameters)
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
          self.error_handler.check_response(response)
        File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
          raise exception_class(message, screen, stacktrace)
      selenium.common.exceptions.WebDriverException: Message: connection refused

    Check geckodriver.log for the detailed error message.
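The article stops at reading the log. One common mitigation, offered here as an assumption rather than the author's fix, is to let only one thread start a browser at a time and to retry a few times when start-up fails. A minimal sketch; `create_driver_with_retry` and its parameters are hypothetical. In real code `factory` would be `webdriver.Firefox` and `retry_on` would be `(WebDriverException,)`.

```python
import threading
import time

# One lock shared by all worker threads so only one browser starts at a time.
_START_LOCK = threading.Lock()


def create_driver_with_retry(factory, attempts=3, delay=2.0, retry_on=(Exception,)):
    """Call factory() under a shared lock, retrying on the given exceptions.

    Hypothetical helper: re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        with _START_LOCK:
            try:
                return factory()
            except retry_on as exc:
                last_error = exc
        if attempt < attempts:
            time.sleep(delay)  # back off outside the lock so other threads can proceed
    raise last_error
```

Serializing only the start-up keeps the threads parallel for the actual crawling; the lock is held just while `factory()` runs.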

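    Pitfall 3: the automated browser is caught by anti-crawling checks

    The cause: pages opened through webdriver run a JavaScript check for a webdriver field, and when that field is detected the visit is treated as a crawler. The countermeasure is as follows.

    Tool: use mitmproxy as a proxy and replace "webdriver" in the traffic with another field (the snippet rewrites the matching JavaScript response).

    Part of the code (a mitmproxy script, run with mitmdump -s <script>.py):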

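      from mitmproxy import http


      def response(flow: http.HTTPFlow) -> None:
          # Rewrite only the site's bundled JS file that contains the webdriver check
          if "/_next/static/js/common_pdd" in flow.request.url:
              flow.response.text = flow.response.text.replace("webdriver", "userAgent")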
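The substitution can also live in a plain function, so it can be checked without running mitmproxy. A minimal sketch: `rewrite_js` and `TARGET_PATH` are hypothetical names; the URL fragment and the replacement of `webdriver` with `userAgent` come from the snippet above, and the sample script used to exercise it is made up (the article does not show the page's real check).

```python
TARGET_PATH = "/_next/static/js/common_pdd"  # URL fragment from the original snippet


def rewrite_js(url: str, body: str) -> str:
    """Return body with every "webdriver" replaced by "userAgent" when url matches.

    Other responses are returned unchanged. str.replace is a plain substring
    replacement, so longer identifiers containing "webdriver" change too.
    """
    if TARGET_PATH in url:
        return body.replace("webdriver", "userAgent")
    return body
```

Inside the hook the call becomes `flow.response.text = rewrite_js(flow.request.url, flow.response.text)`.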

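    Pitfall 4: slider CAPTCHA verification fails

    With the same code, the CAPTCHA passed under chromedriver, but under Firefox it reported failure even when the slider reached the correct position. The cause turned out to be that Firefox moved the slider too slowly, so it was recognized as a bot. The fix is to increase the sliding speed. Part of the slider-verification code follows: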
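The article does not say exactly how the speed was raised. Assuming the drag time grows with the number of `move_by_offset` steps (the author's `move_to_gap` calls `perform()` once per step), one option is to merge consecutive offsets into fewer, larger moves while keeping the total displacement unchanged. A minimal sketch; `coalesce_tracks` and its `group` parameter are hypothetical.

```python
from typing import List


def coalesce_tracks(tracks: List[int], group: int = 2) -> List[int]:
    """Merge every `group` consecutive offsets into one move.

    Hypothetical helper: fewer, larger moves mean fewer perform() calls,
    so the drag finishes sooner; the summed displacement is unchanged.
    """
    if group < 1:
        raise ValueError("group must be >= 1")
    merged = [sum(tracks[i:i + group]) for i in range(0, len(tracks), group)]
    # Drop zero-length moves, which would only add round-trips.
    return [dx for dx in merged if dx != 0]
```

For example, `tracks['forward_tracks'] = coalesce_tracks(tracks['forward_tracks'], group=2)` halves the number of forward moves. Larger groups make the motion faster but also jerkier, so `group` is a tuning knob rather than a fixed answer.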

      import random
      import time

      from PIL import Image
      from selenium.webdriver.common.action_chains import ActionChains
      from selenium.webdriver.common.by import By


      # Method of the author's crawler class: self.driver, self.logger,
      # self.account and self.dama2 are attributes of that class.
      def crack_geetest(self, max_retry=10):
          driver = self.driver
          l = self.logger
          l.info("process handle geetest captcha...")

          def get_position():
              """
              Get the position of the CAPTCHA image on the page.
              :return: (top, bottom, left, right)
              """
              img = driver.find_element(By.XPATH, '//div[@class="geetest_canvas_img geetest_absolute"]')
              time.sleep(2)
              location = img.location
              size = img.size
              top, bottom = location['y'], location['y'] + size['height']
              left, right = location['x'], location['x'] + size['width']
              return (top, bottom, left, right)

          def get_geetest_image(name):
              """
              Get the CAPTCHA image.
              :return: (image object, saved file name)
              """
              full_img_path = './zhilian_screenshot_{}.png'.format(self.account['user_id'])
              driver.save_screenshot(filename=full_img_path)
              image = Image.open(fp=full_img_path, mode='r')
              top, bottom, left, right = get_position()
              print('CAPTCHA position (page): ({},{},{},{})'.format(left, top, right, bottom))
              # The screenshot covers the viewport only, so shift the box up by the scroll offset
              t = driver.execute_script('var q=document.documentElement.scrollTop; return q;')
              print('CAPTCHA position (viewport): ({},{},{},{})'.format(left, top - int(t), right, bottom - int(t)))
              print('p--->>>', t)
              captcha = image.crop((left, top - int(t), right, bottom - int(t)))
              captcha_file_name = './zhilian_captcha_{}_{}.png'.format(self.account['user_id'], name)
              captcha.save(captcha_file_name)
              return captcha, captcha_file_name

          def get_slider():
              """
              Get the slider handle.
              :return: slider element
              """
              slider = driver.find_element(By.XPATH, '//div[@class="geetest_slider_button"]')
              return slider

          def get_gap(captcha_file_name):
              """
              Get the horizontal offset of the gap.
              :param captcha_file_name: path of the cropped CAPTCHA image
              :return: x coordinate of the gap reported by the CAPTCHA-solving service
              """
              # self.dama2 is a third-party CAPTCHA-solving client; 6137 is the type code the author passes
              res = self.dama2.decode_captcha(6137, captcha_file_name)
              print(res)
              # ('b800b4f6-0d9a-40e2-a972-d87c91582b46', [(176, 101)])
              return int(res[1][0][0])

          def calculate_tracks(distance):
              def generate_rand(n, sum_v):  # randomly split sum_v into n integers (returned negated)
                  Vector = [random.randint(1, 3) for _ in range(n)]
                  Vector = [int(i / sum(Vector) * sum_v) for i in Vector]
                  if sum(Vector) < sum_v:
                      res = sum_v - sum(Vector)
                      for i in range(res):
                          Vector[random.randint(0, n - 1)] += 1
                  return [0 - i for i in Vector]

              back_dis = random.randint(16, 26)
              distance += back_dis  # overshoot a little first, then slide back at the end
              v = 0
              t = 0.2
              forward_tracks = []

              current = 0
              mid = distance * 3 / 5
              while current < distance:
                  # accelerate over the first 3/5 of the distance, then decelerate
                  if current < mid:
                      a = 2
                  else:
                      a = -3

                  s = v * t + 0.5 * a * (t ** 2)
                  v = v + a * t
                  current += s
                  forward_tracks.append(round(s))
                  if v <= 0:
                      break  # came to rest just short of distance; stop instead of looping forever

              # slide back to the exact position
              back_tracks = generate_rand(15, back_dis)  # sums to -back_dis
              return {'forward_tracks': forward_tracks, 'back_tracks': back_tracks}

          def move_to_gap(slider, tracks):
              """
              Drag the slider to the gap.
              :param slider: slider element
              :param tracks: dict with 'forward_tracks' and 'back_tracks' offsets
              :return:
              """
              ActionChains(driver).click_and_hold(slider).perform()

              # move forward (past the gap)
              for i in tracks['forward_tracks']:
                  ActionChains(driver).move_by_offset(i, 0).perform()

              # move back
              time.sleep(0.5)
              for i in tracks['back_tracks']:
                  ActionChains(driver).move_by_offset(i, 0).perform()

              # small jitter around the final position
              # time.sleep(0.3)
              random_sc = random.randint(3, 8)
              ActionChains(driver).move_by_offset(0 - random_sc, 0).perform()
              time.sleep(0.5)
              ActionChains(driver).move_by_offset(random_sc, 0).perform()

              # release
              time.sleep(0.5)
              ActionChains(driver).release().perform()

          def crack(retry=0):
              # Before this: enter username and password, click the verify button
              # get the CAPTCHA image
              print('get_geetest_image')
              captcha_obj, captcha_file_name = get_geetest_image('2')
              gap = get_gap(captcha_file_name)
              l.info('Gap position: {}'.format(gap))
              print('Gap position: {}'.format(gap))
              # subtract the slider's initial offset
              BORDER = 29
              gap -= BORDER
              # compute the movement track
              track = calculate_tracks(gap)
              l.info('Slide track: {}'.format(track))
              print('Slide track: {}'.format(track))
              # drag the slider
              slider = get_slider()
              move_to_gap(slider, track)
              driver.save_screenshot('./zhilian_capresult_{}_{}.png'.format(self.account['user_id'], retry))
              time.sleep(3)
              result = driver.find_element(By.XPATH, '//div[@class="geetest_result_title"]').get_attribute('textContent')
              l.info(result)
              print(result)
              return result

          retry = 1
          while True:
              l.info(f'{retry}/{max_retry} crack geetest.')
              if retry == max_retry:
                  l.info("max retry reached, return False")
                  return False
              success = crack(retry)
              # Success: Geetest's result text contains "秒的速度超过" ("...seconds, faster than ..."),
              # or the browser has already left the lagou login page
              if '秒的速度超过' in success or 'passport.lagou.com/login/login' not in driver.current_url:
                  l.info("crack succeeded!")
                  print("crack succeeded!")
                  return True
              elif '拖动滑块将悬浮图像正确拼合' in success:
                  # "Drag the slider to fit the floating image": wrong position, refresh and retry
                  retry += 1
                  l.info("crack failed, retry:{}/{}".format(retry, max_retry))
                  driver.find_element(By.XPATH, '//a[@class="geetest_refresh_1"]').click()
                  time.sleep(5)
                  continue
              else:
                  time.sleep(5)
                  retry += 1
                  l.info("crack failed, retry:{}/{}".format(retry, max_retry))
                  continue
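To inspect the trajectory without a browser, the track logic of `calculate_tracks` can be pulled out into a standalone function. A sketch under these assumptions: the constants (acceleration 2 then -3, switch at 3/5 of the distance, `t = 0.2`, an overshoot of 16 to 26 px split into 15 backward steps) are copied from the code above; the seeded `random.Random`, the names `slide_tracks` and `split_back`, and the final net-offset check are additions for illustration.

```python
import random
from typing import Dict, List, Optional


def split_back(n: int, total: int, rng: random.Random) -> List[int]:
    """Split `total` pixels into n non-negative integer steps, returned negated."""
    weights = [rng.randint(1, 3) for _ in range(n)]
    w_sum = sum(weights)
    steps = [int(w / w_sum * total) for w in weights]
    for _ in range(total - sum(steps)):  # hand out the truncation remainder
        steps[rng.randint(0, n - 1)] += 1
    return [-s for s in steps]


def slide_tracks(distance: int, seed: Optional[int] = None) -> Dict[str, List[int]]:
    rng = random.Random(seed)
    back_dis = rng.randint(16, 26)
    target = distance + back_dis          # overshoot, then slide back
    v, t, current = 0.0, 0.2, 0.0
    mid = target * 3 / 5
    forward: List[int] = []
    while current < target:
        a = 2 if current < mid else -3    # accelerate, then decelerate
        s = v * t + 0.5 * a * t ** 2
        v += a * t
        current += s
        forward.append(round(s))
        if v <= 0:                        # came to rest early; avoid looping forever
            break
    return {"forward_tracks": forward, "back_tracks": split_back(15, back_dis, rng)}


tracks = slide_tracks(120, seed=1)
net = sum(tracks["forward_tracks"]) + sum(tracks["back_tracks"])
print("steps:", len(tracks["forward_tracks"]), "net offset:", net, "target:", 120)
```

Because each forward step is rounded to whole pixels, the net offset can differ from the target by a few pixels; printing it like this is a quick way to see how large that drift is before tuning `BORDER` or the constants.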

    Source: https://blog.csdn.net/wenq_yang/article/details/81258932

  • Original article: https://www.cnblogs.com/alex-13/p/12019764.html