  • Implementing random request headers and proxy IPs with Scrapy downloader middleware

    1. Setting a random request header

    import random

    class UAMiddleWare(object):
        # Pool of User-Agent strings to rotate through
        UA_LIST = [
            'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
            'Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11',
            'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)',
        ]

        def process_request(self, request, spider):
            # Pick a User-Agent at random for every outgoing request
            user_agent = random.choice(self.UA_LIST)
            request.headers['User-Agent'] = user_agent
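
    A downloader middleware only takes effect once it is enabled in the project settings. A minimal sketch of that step, assuming the class lives in a module called myproject.middlewares (a hypothetical path; substitute your own) and using 543 as an arbitrary priority:

```python
# settings.py (sketch; "myproject.middlewares" is a hypothetical module path)
DOWNLOADER_MIDDLEWARES = {
    # Scrapy calls process_request() in ascending order of these numbers.
    "myproject.middlewares.UAMiddleWare": 543,
}
```

    The other middleware classes in this post would be registered the same way, each under its own key.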
    

    2. Setting a random proxy IP (open proxies)

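    import random

    class IPMiddleWare(object):
        # Placeholder addresses only (the last one is not even a valid IPv4 address); replace with live proxies
        PROXIES = ['http://121.123.32.1:8080', 'http://122.21.32.2:8000', 'http://221.32.123.321:8080']

        def process_request(self, request, spider):
            # Route each request through a randomly chosen proxy
            proxy = random.choice(self.PROXIES)
            request.meta['proxy'] = proxy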
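
    The rotation can be checked without starting a crawl. The sketch below repeats the middleware with placeholder addresses from the documentation range 192.0.2.0/24 and drives it with FakeRequest, a hypothetical stand-in that mimics only the meta attribute of a Scrapy request:

```python
import random


class IPMiddleWare(object):
    # Placeholder proxies (192.0.2.0/24 is reserved for documentation).
    PROXIES = ['http://192.0.2.1:8080', 'http://192.0.2.2:8000', 'http://192.0.2.3:8080']

    def process_request(self, request, spider):
        request.meta['proxy'] = random.choice(self.PROXIES)


class FakeRequest(object):
    """Hypothetical stand-in for scrapy.Request; only `meta` is modelled."""

    def __init__(self):
        self.meta = {}


def proxies_used(n):
    """Run n fake requests through the middleware and collect the proxies chosen."""
    middleware = IPMiddleWare()
    chosen = []
    for _ in range(n):
        request = FakeRequest()
        middleware.process_request(request, spider=None)
        chosen.append(request.meta['proxy'])
    return chosen


if __name__ == '__main__':
    print(proxies_used(5))
```

    Every entry printed should come from PROXIES; which ones appear varies from run to run.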
    

    3. Setting a dedicated (private) proxy

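    import base64

    class IPduxiang(object):
        def process_request(self, request, spider):
            # Address of the dedicated proxy; the scheme is included, as with the open proxies
            proxy = 'http://123.32.12.3:16861'
            account_password = 'qishuai@juan-juan.com:12342332'
            request.meta['proxy'] = proxy
            # base64.b64encode() takes bytes, so encode the "user:password" string first
            b64_password = base64.b64encode(account_password.encode('utf-8'))
            # Header value is 'Basic ' (with a trailing space) followed by the token decoded back to str
            request.headers['Proxy-Authorization'] = 'Basic ' + b64_password.decode('utf-8')
    # Unlike the open proxy pool, a dedicated proxy needs the username and password
    # Base64-encoded (an encoding, not encryption) and sent in the Proxy-Authorization request header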
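
    The header value has the form "Basic", a single space, then the Base64 encoding of username:password. A small sketch of that construction, where basic_proxy_auth and decode_basic are hypothetical helper names and user / pass are placeholder credentials:

```python
import base64


def basic_proxy_auth(username, password):
    """Return a Proxy-Authorization value for HTTP Basic authentication."""
    credentials = '{}:{}'.format(username, password)
    token = base64.b64encode(credentials.encode('utf-8')).decode('ascii')
    # The scheme name and the token must be separated by a single space.
    return 'Basic ' + token


def decode_basic(header_value):
    """Reverse the encoding to recover the scheme and the credentials."""
    scheme, token = header_value.split(' ', 1)
    return scheme, base64.b64decode(token).decode('utf-8')


if __name__ == '__main__':
    value = basic_proxy_auth('user', 'pass')
    print(value)
    print(decode_basic(value))
```

    Because the value decodes this easily, Base64 here hides nothing; the credentials are only as private as the connection to the proxy.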
    
  • Original article: https://www.cnblogs.com/FuckSpider/p/11534701.html