  • Downloader middleware: random IP proxy and random User-Agent

    1. Enable the proxy and User-Agent middlewares in settings.py

    # Register the downloader middlewares
    DOWNLOADER_MIDDLEWARES = {
       # Random User-Agent
       'douban.middlewares.DoubanUserAgent': 100,
       # Random proxy
       'douban.middlewares.DoubanProxy': 200,
    }
    
    # Candidate values used by the middlewares above
    # User-Agent list
    User_Agent_lists = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.87 Safari/537.36 OPR/37.0.2178.32',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36',
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586',
        'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 BIDUBrowser/8.3 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36 Core/1.47.277.400 QQBrowser/9.4.7658.400',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 UBrowser/5.6.12150.8 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.154 Safari/537.36 LBBROWSER',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36 TheWorld 7',
        'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36',
    ]
    
    # Proxy list
    PROXIES = [
        {"ip_port": "210.16.189.75:888", "user_passwd": "fa319:fa319"},
        # ... ...
        # {"ip_port": "ip:port", "user_passwd": ""},     # For a proxy without credentials, leave user_passwd empty
    ]
    
    # Comment out the User-Agent that was configured earlier
    # DEFAULT_REQUEST_HEADERS = {
    #   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    #   # 'Accept-Language': 'en',
    #   'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36',
    # }
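
    Because every PROXIES entry is a plain dict, a misspelled key (such as "ip_prot" instead of "ip_port") only shows up later as a KeyError inside the middleware. The following is a minimal sketch of a sanity check that could be run against the list before crawling. The helper name validate_proxies and the REQUIRED_KEYS constant are hypothetical and not part of the original article or of Scrapy.

```python
# Hypothetical helper: checks that each proxy entry has the keys the
# middleware in this article reads ("ip_port" and "user_passwd").
REQUIRED_KEYS = {"ip_port", "user_passwd"}


def validate_proxies(proxies):
    """Return a list of problem descriptions; an empty list means no problems found."""
    problems = []
    for index, entry in enumerate(proxies):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {index}: missing key(s) {sorted(missing)}")
            continue
        # Split on the last colon so the port is the final component.
        host, sep, port = entry["ip_port"].rpartition(":")
        if not sep or not host or not port.isdigit():
            problems.append(f"entry {index}: ip_port should look like 'host:port'")
    return problems


good = [{"ip_port": "210.16.189.75:888", "user_passwd": "fa319:fa319"}]
bad = [{"ip_prot": "10.0.0.1:8080", "user_passwd": ""}]  # note the misspelled key

print(validate_proxies(good))
print(validate_proxies(bad))
```

    The check only looks at the shape of each entry; it does not test whether a proxy is reachable.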
    

    2. Edit the middlewares.py middleware file

    import random
    import base64
    from scrapy import signals
    from .settings import User_Agent_lists
    from .settings import PROXIES
    
    class DoubanUserAgent(object):
        # process_request is the method Scrapy calls for each outgoing request
        def process_request(self, request, spider):
            # Pick a random User-Agent from the list
            useragent = random.choice(User_Agent_lists)
    
            # Set it on the request unless a User-Agent header is already present
            request.headers.setdefault("User-Agent", useragent)
    
    
    class DoubanProxy(object):
        # process_request is the method Scrapy calls for each outgoing request
        def process_request(self, request, spider):
            proxy = random.choice(PROXIES)
    
            # The proxy address has to be written into the request's meta
            request.meta["proxy"] = "http://" + proxy['ip_port']
    
            # settings.py leaves user_passwd as "" for proxies without credentials,
            # so test for an empty value instead of comparing against None
            if proxy.get('user_passwd'):
                # Base64-encode the "user:password" pair
                base64_userpasswd = base64.b64encode(proxy["user_passwd"].encode("utf-8")).decode()
    
                # Proxy user name and password
                request.headers["Proxy-Authorization"] = 'Basic ' + base64_userpasswd
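
    The proxy middleware above does two things: it builds the proxy URL from "ip_port", and, when credentials are present, it builds an HTTP Basic value for the Proxy-Authorization header. The sketch below isolates that logic in a plain function so it can be tried without Scrapy. The function name build_proxy_settings is hypothetical. Recent Scrapy releases may handle proxy credentials differently (for example, by reading them from the proxy URL), so it is worth checking the documentation of the installed version.

```python
import base64


def build_proxy_settings(proxy):
    """Return (proxy_url, auth_header) for one PROXIES entry.

    auth_header is None when the entry has no credentials.
    """
    proxy_url = "http://" + proxy["ip_port"]
    credentials = proxy.get("user_passwd")
    if not credentials:
        # Covers both a missing key and the empty string used in settings.py.
        return proxy_url, None
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return proxy_url, "Basic " + token


url, header = build_proxy_settings(
    {"ip_port": "210.16.189.75:888", "user_passwd": "fa319:fa319"}
)
print(url)
print(header)

anon_url, anon_header = build_proxy_settings(
    {"ip_port": "210.16.189.75:888", "user_passwd": ""}
)
print(anon_url, anon_header)
```

    Keeping this logic in a function of its own makes the empty-credentials case easy to test separately from the crawler.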
    
  • Original article: https://www.cnblogs.com/baolin2200/p/8603567.html