zoukankan      html  css  js  c++  java
  • 爬虫代理

    使用代理

    1.创建代理文件proxies.py在项目目录

    import random
    import base64
    import six
    
    def to_bytes(text, encoding=None, errors='strict'):
        if isinstance(text, bytes):
            return text
        if not isinstance(text, six.string_types):
            raise TypeError('to_bytes must receive a unicode, str or bytes '
                            'object, got %s' % type(text).__name__)
        if encoding is None:
            encoding = 'utf-8'
        return text.encode(encoding, errors)
    
    
    class ProxyMiddleware(object):
        def process_request(self, request, spider):
            PROXIES = [
                {'ip_port': '111.11.228.75:80', 'user_pass': ''},
                {'ip_port': '120.198.243.22:80', 'user_pass': ''},
                {'ip_port': '111.8.60.9:8123', 'user_pass': ''},
                {'ip_port': '101.71.27.120:80', 'user_pass': ''},
                {'ip_port': '122.96.59.104:80', 'user_pass': ''},
                {'ip_port': '122.224.249.122:8088', 'user_pass': ''},
            ]
    
            proxy = random.choice(PROXIES)
            if proxy['user_pass'] is not None:
                request.meta['proxy'] = to_bytes("http://%s" % proxy['ip_port'])
                encoded_user_pass = base64.encodestring(to_bytes(proxy['user_pass']))
                request.headers['Proxy-Authorization'] = to_bytes('Basic ' + encoded_user_pass)
    
            else:
                request.meta['proxy'] = to_bytes("http://%s" % proxy['ip_port'])
    

    2.在settings.py中指定代理文件

    DOWNLOADER_MIDDLEWARES = {
       'sp1.proxies.ProxyMiddleware': 600
    }
    
  • 相关阅读:
    CSS 权威指南 CSS实战手册 第四版(阅读笔记)
    iframe交互(一)父页面自动高度
    连接微服务
    学习SQLYog
    sourceTree的安装以及破解
    sql 根据子级ID获取所有父级
    新手Python入门安装(一)
    C# 真正完美的 汉字转拼音
    供应链相关的书和博客
    网易跟帖为什么火
  • 原文地址:https://www.cnblogs.com/baolin2200/p/8563626.html
Copyright © 2011-2022 走看看