  • Python + Selenium + Chrome: using a proxy with username/password authentication

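    Mimvp Proxy (米扑代理), which describes itself as a world-leading proxy brand, has focused on the proxy business for nearly ten years. It offers open, private, and dedicated proxies, with a free trial.

    Mimvp Proxy website: https://proxy.mimvp.com

    The examples in this article were written specifically for Mimvp Proxy's private, dedicated, and open proxies.

    They cover three authorization types for http and https proxies: no password, whitelisted IP, and username/password.

    The extension package (xpi) used in the examples can be downloaded from the Mimvp Proxy website or from Mimvp's official GitHub.

    The complete code is given below. All of it has been verified to run; see the comments for details.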

    Environment used to run the examples in this article:

    MacBook Pro  MacOS High Sierra Version 10.13.4

    Google Chrome  Version 63.0.3239.84 (Official Build) (64-bit)

    Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 12:39:47) 

    $ pip list | grep selenium
    selenium (3.4.2)

     

    chromedriver download: http://chromedriver.storage.googleapis.com/index.html

     

    Python + Selenium + Chrome

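    Error message: WebDriverException: 'chromedriver' executable needs to be in PATH

    Solution:

    a. Download ChromeDriver. For other browsers, see the official documentation.

    b. Copy the chromedriver file into the Google Chrome program directory, or into a directory on your PATH:

    cp chromedriver /usr/local/bin/

    The expected location on each operating system is described in the official wiki.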
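    Before launching the browser, it can help to check where chromedriver actually lives. The following is a minimal sketch, assuming Python 3 (for shutil.which). The helper name find_chromedriver and the idea of passing fallback paths are illustrative and not part of the original code:

```python
import os
import shutil


def find_chromedriver(extra_candidates=()):
    """Return a usable chromedriver path, or None if none is found.

    Looks on PATH first, then in any extra candidate paths supplied
    by the caller (hypothetical fallbacks; adjust to your system).
    """
    found = shutil.which("chromedriver")
    if found:
        return found
    for candidate in extra_candidates:
        # Accept only existing files that are executable.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

    If the function returns None, the WebDriverException above is the likely outcome, so the path should be fixed before creating the webdriver object.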

    In the Python code, pass the chromedriver path when creating the webdriver object.

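    Example 1: macOS, chromedriver in the Chrome program directory

    from selenium import webdriver

    url = 'https://proxy.mimvp.com/ip.php'      # test URL used later in this article
    chromedriver = "/Applications/Google Chrome.app/Contents/MacOS/chromedriver" 
    browser = webdriver.Chrome(executable_path=chromedriver)        # launch Chrome
    browser.get(url)     
    content = browser.page_source
    print("content: " + str(content))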
    

    Example 2: macOS, chromedriver in a directory on PATH

    from selenium import webdriver
    from pyvirtualdisplay import Display

    def spider_url_chrome(url):
        browser = None
        display = None
        try:
            display = Display(visible=0, size=(800, 600))   # virtual display, so no window is shown
            display.start()
            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(executable_path=chromedriver)        # launch Chrome
            browser.get(url)     
            content = browser.page_source
            print("content: " + str(content))
        finally:
            # always release the browser and the virtual display
            if browser: browser.quit()
            if display: display.stop()

    Selenium + chromedriver with a proxy: no password, or whitelisted IP

    ## webdriver + chrome + proxy + whiteip (no password, or whitelisted-IP authorization)
    ## Mimvp Proxy: https://proxy.mimvp.com
    def spider_url_chrome_by_whiteip(url):
        browser = None
        display = None
        
        ## Whitelisted IPs are managed in the Mimvp Proxy member center: https://proxy.mimvp.com/usercenter/userinfo.php?p=whiteip
        mimvp_proxy = { 
                        'ip'            : '140.143.62.84',      # ip
                        'port_https'    : 62288,                # http, https
                        'port_socks'    : 62287,                # socks5
                        'username'      : 'mimvp-user',
                        'password'      : 'mimvp-pass'
                      }
        
        try:
            display = Display(visible=0, size=(800, 600))
            display.start()
            
            chrome_options = webdriver.ChromeOptions()      # equivalent to: chrome_options = Options()
            proxy_https_argument = '--proxy-server=http://{ip}:{port}'.format(ip=mimvp_proxy['ip'], port=mimvp_proxy['port_https'])     # http, https (no password or whitelisted IP: works)
            chrome_options.add_argument(proxy_https_argument)
    #         proxy_socks_argument = '--proxy-server=socks5://{ip}:{port}'.format(ip=mimvp_proxy['ip'], port=mimvp_proxy['port_socks'])   # socks5 (no password or whitelisted IP: fails)
    #         chrome_options.add_argument(proxy_socks_argument)
            
            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)        # launch Chrome
            browser.get(url)     
            content = browser.page_source
            print("content: " + str(content))
        finally:
            if browser: browser.quit()
            if display: display.stop()
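
    The --proxy-server argument used above can be built by a small helper. This is a minimal sketch. The function name build_proxy_argument is illustrative, and it covers only the unauthenticated (or whitelisted-IP) case:

```python
def build_proxy_argument(ip, port, scheme="http"):
    """Build Chrome's --proxy-server command-line argument."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % port)
    return "--proxy-server={scheme}://{ip}:{port}".format(
        scheme=scheme, ip=ip, port=port)


print(build_proxy_argument("140.143.62.84", 62288))
```

    The returned string can be passed directly to chrome_options.add_argument(), as in the function above.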

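    Selenium + chromedriver with a proxy: http/https with username and password

    This example uses Mimvp Proxy's username/password authorization.

    To obtain the username and password, go to Mimvp Proxy - Member Center - Whitelist IP.

    1. Create a zip package named proxy.zip containing the following two files, background.js and manifest.json.

    1) background.js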

    var config = {
        mode: "fixed_servers",
        rules: {
          singleProxy: {
            scheme: "http",
            host: "140.143.62.84",
            port: 19480
          },
          bypassList: ["mimvp.com"]
        }
      };
    
    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
    
    function callbackFn(details) {
        return {
            authCredentials: {
                username: "mimvp-user",
                password: "mimvp-pass"
            }
        };
    }
    
    chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
    );

    Note: in the configuration above, replace the proxy host, port, username, and password with your Mimvp Proxy ip:port and your authorization username and password.

    2) manifest.json

    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }

    Note: the configuration above needs no changes; copy it as is.
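
    The packaging step can also be done from Python rather than by hand. This is a minimal sketch, assuming both files sit in one directory. The function name build_proxy_zip is illustrative:

```python
import os
import zipfile


def build_proxy_zip(src_dir, zip_path="proxy.zip"):
    """Pack background.js and manifest.json from src_dir into zip_path."""
    with zipfile.ZipFile(zip_path, mode="w") as zf:
        for name in ("manifest.json", "background.js"):
            # Store each file at the archive root, not under src_dir/.
            zf.write(os.path.join(src_dir, name), arcname=name)
    return zip_path
```

    Writing each file with an explicit arcname keeps manifest.json at the top level of the archive, which is where the extension loader looks for it.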

    2. Add proxy.zip to Chrome as an extension

    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    
    from selenium import webdriver
    from selenium.webdriver.common.proxy import *
    from selenium.webdriver.chrome.options import Options
    from pyvirtualdisplay import Display
    # from xvfbwrapper import Xvfb
    
    
    def spider_url_chrome_by_https(url):
        browser = None
        display = None
        try:
            display = Display(visible=0, size=(800, 600))
            display.start()
            
            chrome_options = Options()
            chrome_options.add_extension("proxy.zip")
            
            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)        # launch Chrome
            browser.get(url)     
            content = browser.page_source
            print("content: " + str(content))
        finally:
            if browser: browser.quit()
            if display: display.stop()
    
    
    if __name__ == '__main__':
        url = 'https://ip.cn'
        url = 'https://mimvp.com/'
        url = 'https://proxy.mimvp.com/ip.php'
    
        # http, https with username/password authorization: works
        spider_url_chrome_by_https(url)

    3. Run it; the page reports the proxy's IP, so the proxy is working

    content: <html xmlns="http://www.w3.org/1999/xhtml"><head></head><body>140.143.62.84</body></html>
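
    This check can be automated by pulling the IP out of the page source and comparing it with the proxy's IP. This is a minimal sketch, assuming the test page returns the caller's IP in its body, as in the output above. The helper name extract_ip is illustrative:

```python
import re


def extract_ip(page_source):
    """Return the first IPv4-looking string in page_source, or None."""
    m = re.search(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", page_source)
    return m.group(0) if m else None


page = ('<html xmlns="http://www.w3.org/1999/xhtml"><head></head>'
        '<body>140.143.62.84</body></html>')
# True when the page saw the request coming from the proxy.
print(extract_ip(page) == "140.143.62.84")
```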

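    Using an HTTP proxy with username/password authentication in Selenium + Chrome Driver (improved version)

    By default, Chrome's --proxy-server="http://ip:port" argument does not accept a username and password.

    As a result, "Selenium + Chrome Driver" cannot use an HTTP proxy that requires HTTP Basic Authentication.

    One workaround is IP-based authentication. Mimvp Proxy's whitelist-IP authorization is exactly this; see Mimvp Proxy - Member Center - Whitelist IP.

    In China, however, most users connect over ADSL, where the ISP reassigns the IP address dynamically, so binding authentication to a fixed IP is often not possible.

    A way to make Chrome authenticate to the HTTP proxy automatically with a username and password is therefore needed.

    A Stack Overflow answer shares a good approach that uses a Chrome extension to supply the proxy credentials automatically.

    Reference: how-to-override-basic-authentication-in-selenium2-with-java-using-chrome-driver

    Building on that idea, Mimvp Proxy's engineers automated the creation of the Chrome extension in Python:

    given a proxy in the form "username:password@ip:port", the script generates a Chrome proxy extension,

    which can then be installed in "Selenium + Chrome Driver" to configure the proxy.

    The code is as follows:

    1. Create the template folder Chrome-proxy-helper

    The folder contains two files, background.js and manifest.json. Create them in order: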

    1) Create the template folder

    Chrome-proxy-helper

    2) Create background.js

    vim Chrome-proxy-helper/background.js

    var config = {
        mode: "fixed_servers",
        rules: {
          singleProxy: {
            scheme: "http",
            host: "mimvp_proxy_host",
            port: parseInt(mimvp_proxy_port)
          },
          bypassList: ["mimvp.com"]
        }
      };
    
    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
    
    function callbackFn(details) {
        return {
            authCredentials: {
                username: "mimvp_username",
                password: "mimvp_password"
            }
        };
    }
    
    chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
    );

    3) Create manifest.json

    vim Chrome-proxy-helper/manifest.json

    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }

    2. Create the zip-packaging function

    In the Python script, create a function that builds the zip:

    import os, re, time, zipfile
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from pyvirtualdisplay import Display
    
    def get_chrome_proxy_extension(proxy):
        """Return the path of a Chrome proxy extension configured for the given proxy (with username/password auth)
            proxy - the proxy to use, in the form username:password@ip:port
        """
        
        # Reference template for the Chrome proxy extension: https://github.com/RobinDev/Selenium-Chrome-HTTP-Private-Proxy
        CHROME_PROXY_HELPER_DIR = 'Chrome-proxy-helper'     # template directory, placed next to this script
        
        # Directory for the generated extension files, normally next to this script
        # Generated zip path: chrome-proxy-extensions/mimvp-user_mimvp-pass@140.143.62.84_19480.zip
        CUSTOM_CHROME_PROXY_EXTENSIONS_DIR = 'chrome-proxy-extensions'  
    
        # \d must be escaped: the original pattern had lost its backslashes and never matched
        m = re.compile(r'([^:]+):([^@]+)@([\d.]+):(\d+)').search(proxy)
        if m:
            # Extract the individual proxy parameters
            username = m.groups()[0]
            password = m.groups()[1]
            ip = m.groups()[2]
            port = m.groups()[3]
            # Create a customized Chrome proxy extension (zip file)
            if not os.path.exists(CUSTOM_CHROME_PROXY_EXTENSIONS_DIR):
                os.mkdir(CUSTOM_CHROME_PROXY_EXTENSIONS_DIR)
            extension_file_path = os.path.join(CUSTOM_CHROME_PROXY_EXTENSIONS_DIR, '{}.zip'.format(proxy.replace(':', '_')))
            
            # Build the extension only if it does not exist yet
            if not os.path.exists(extension_file_path):
                zf = zipfile.ZipFile(extension_file_path, mode='w')
                zf.write(os.path.join(CHROME_PROXY_HELPER_DIR, 'manifest.json'), 'manifest.json')
                # Substitute the proxy parameters into the template
                with open(os.path.join(CHROME_PROXY_HELPER_DIR, 'background.js')) as f:
                    background_content = f.read()
                background_content = background_content.replace('mimvp_proxy_host', ip)
                background_content = background_content.replace('mimvp_proxy_port', port)
                background_content = background_content.replace('mimvp_username', username)
                background_content = background_content.replace('mimvp_password', password)
                zf.writestr('background.js', background_content)
                zf.close()
            return extension_file_path
        else:
            raise Exception('Invalid proxy format. Should be username:password@ip:port')
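
    One caveat of plain string replacement is that a password containing a double quote or a backslash would produce invalid JavaScript. The following is a minimal sketch of an alternative, assuming background.js is generated entirely in Python instead of from the template file. The helper name render_background_js and the inline template are illustrative, and json.dumps is used only to produce safely quoted JavaScript string literals:

```python
import json

BACKGROUND_JS = """var config = {
    mode: "fixed_servers",
    rules: {
      singleProxy: {scheme: "http", host: %(host)s, port: %(port)d},
      bypassList: ["mimvp.com"]
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %(username)s, password: %(password)s}};
    },
    {urls: ["<all_urls>"]},
    ['blocking']
);
"""


def render_background_js(host, port, username, password):
    """Fill the template; json.dumps quotes and escapes each string."""
    return BACKGROUND_JS % {
        "host": json.dumps(host),
        "port": int(port),
        "username": json.dumps(username),
        "password": json.dumps(password),
    }
```

    The rendered text could then be passed to zf.writestr('background.js', ...) in place of the replaced template content.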

    3. Write the Python function that uses the proxy

    ## webdriver + chrome + proxy + https (https with username/password authorization, zip built automatically)
    ## Mimvp Proxy: https://proxy.mimvp.com
    def spider_url_chrome_by_https2(url):
        browser = None
        display = None
        try:
            display = Display(visible=0, size=(800, 600))
            display.start()
            
            proxy = 'mimvp-guest:welcome2mimvp@140.143.62.84:19480'
            chrome_options = webdriver.ChromeOptions()      # equivalent to: chrome_options = Options()
            chrome_options.add_extension(get_chrome_proxy_extension(proxy))
            
            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)        # launch Chrome
            browser.get(url)     
            content = browser.page_source
            print("content: " + str(content))
        finally:
            if browser: browser.quit()
            if display: display.stop()
    
    
    
    if __name__ == '__main__':
        url = 'https://ip.cn'
        url = 'https://mimvp.com/'
        url = 'https://proxy.mimvp.com/ip.php'
    
        # http, https with username/password authorization: works
        spider_url_chrome_by_https2(url)

    4. Run it; the page reports the proxy's IP, so the proxy is working

    content: <html xmlns="http://www.w3.org/1999/xhtml"><head></head><body>140.143.62.84</body></html>

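    5. Summary

    With a template and a script that builds the zip automatically, the proxy can be chosen dynamically at run time, which makes it easy to use Mimvp Proxy flexibly.

    Selenium + chromedriver with a proxy: socks5 is not supported (it did not work in Mimvp's tests)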

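    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    ## webdriver + chrome + proxy + socks (socks with username/password authorization)
    ## Mimvp Proxy: https://proxy.mimvp.com
    def spider_url_chrome_by_socks(url):
        browser = None
        display = None
        
        ## Whitelisted IPs are managed in the Mimvp Proxy member center: https://proxy.mimvp.com/usercenter/userinfo.php?p=whiteip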
        mimvp_proxy = { 
                        'ip'            : '140.143.62.84',      # ip
                        'port_https'    : 62288,                # http, https
                        'port_socks'    : 62289,                # socks5
                        'username'      : 'mimvp-user',
                        'password'      : 'mimvp-pass'
                      }
        
        try:
            display = Display(visible=0, size=(800, 600))
            display.start()
            
            capabilities = dict(DesiredCapabilities.CHROME)
            capabilities['proxy'] = {
                                        'proxyType'    : 'MANUAL',
    #                                     'httpProxy'    : mimvp_proxy['ip'] + ":" + str(mimvp_proxy['port_https']),
    #                                     'sslProxy'     : mimvp_proxy['ip'] + ":" + str(mimvp_proxy['port_https']),
                                        'socksProxy'   : mimvp_proxy['ip'] + ":" + str(mimvp_proxy['port_socks']),
                                        'ftpProxy'     : mimvp_proxy['ip'] + ":" + str(mimvp_proxy['port_https']),
                                        'noProxy'      : 'localhost,127.0.0.1',
                                        'class'        : "org.openqa.selenium.Proxy",
                                        'autodetect'   : False
                                    }
            
            capabilities['proxy']['socksUsername'] = mimvp_proxy['username']
            capabilities['proxy']['socksPassword'] = mimvp_proxy['password']
            
            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(chromedriver, desired_capabilities=capabilities)
            browser.get(url)     
            content = browser.page_source
            print("content: " + str(content))
        finally:
            if browser: browser.quit()
            if display: display.stop()

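    For the complete proxy examples, see Mimvp Proxy's usage examples:

    https://proxy.mimvp.com/demo2.php  (Selenium Python)

    For more proxy examples, see Mimvp Proxy's official GitHub:

    https://github.com/mimvp/mimvp-proxy-demo

    All proxy IPs tested in this article come from Mimvp Proxy:

    https://proxy.mimvp.com

    Additional notes:

    Chrome-proxy-helper has an official version: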

    https://github.com/sunboy-2050/Chrome-proxy-helper

    Introduction

    By default, Chrome uses the system proxy settings (the IE proxy settings on Windows), but sometimes we want to set a proxy ONLY for Chrome, not for the whole system.

    The Chrome proxy helper extension uses Chrome's native proxy API to set the proxy. It supports the socks5, socks4, http, and https protocols as well as pac scripts. Fast and simple.

    Features

    • support socks4, socks5, http, https proxy settings
    • support pac proxy settings
    • support bypass list
    • support online pac script
    • support customer proxy rules
    • support proxy authentication
    • support extension settings synchronize
  • Original article: https://www.cnblogs.com/ithomer/p/9327052.html