  • Some notes on Scrapy

    I  Adding a proxy to Scrapy

      1 Built-in proxy: os.environ.

        The format is fixed; not recommended.

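    import os
    # Set these before the crawl starts; the built-in proxy support reads them from the environment
    os.environ['http_proxy'] = "http://root:woshiniba@192.168.11.11:9999/"
    os.environ['https_proxy'] = "http://192.168.11.11:9999/"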
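
To see what the built-in approach picks up, here is a minimal sketch using only the standard library. It assumes, as far as I know is the case, that Scrapy's built-in proxy support relies on urllib's environment lookup; the variable names `env_proxies` and `http_parts` are only for illustration.

```python
import os
from urllib.parse import urlsplit
from urllib.request import getproxies_environment

# Same values as in the snippet above
os.environ['http_proxy'] = "http://root:woshiniba@192.168.11.11:9999/"
os.environ['https_proxy'] = "http://192.168.11.11:9999/"

# Maps scheme -> proxy URL for every *_proxy variable found in the environment
env_proxies = getproxies_environment()

# Credentials, host and port are all packed into the URL itself
http_parts = urlsplit(env_proxies['http'])
print(http_parts.username, http_parts.password, http_parts.hostname, http_parts.port)
print(env_proxies['https'])
```

There is exactly one proxy per scheme for the whole process, which is presumably why this approach is described as fixed and not recommended.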

      2 Custom proxy: implemented with a downloader middleware

                    import base64
                    import random


                    def to_bytes(text, encoding=None, errors='strict'):
                        """Return text as bytes, encoding str input as UTF-8 by default."""
                        if isinstance(text, bytes):
                            return text
                        if not isinstance(text, str):
                            raise TypeError('to_bytes must receive a str or bytes '
                                            'object, got %s' % type(text).__name__)
                        if encoding is None:
                            encoding = 'utf-8'
                        return text.encode(encoding, errors)


                    class ProxyMiddleware(object):
                        def process_request(self, request, spider):
                            # Candidate proxies; 'user_pass' is "user:password", or '' if no auth is needed
                            PROXIES = [
                                {'ip_port': '111.11.228.75:80', 'user_pass': ''},
                                {'ip_port': '120.198.243.22:80', 'user_pass': ''},
                                {'ip_port': '111.8.60.9:8123', 'user_pass': ''},
                                {'ip_port': '101.71.27.120:80', 'user_pass': ''},
                                {'ip_port': '122.96.59.104:80', 'user_pass': ''},
                                {'ip_port': '122.224.249.122:8088', 'user_pass': ''},
                            ]
                            # Pick a random proxy for each request
                            proxy = random.choice(PROXIES)
                            request.meta['proxy'] = "http://%s" % proxy['ip_port']
                            # Send credentials only when the chosen entry actually has them
                            if proxy['user_pass']:
                                # b64encode replaces base64.encodestring, which was removed in Python 3.9
                                encoded_user_pass = base64.b64encode(to_bytes(proxy['user_pass']))
                                request.headers['Proxy-Authorization'] = b'Basic ' + encoded_user_pass
    
                    
        
                    # settings.py: enable the middleware (adjust the dotted path to your own project)
                    DOWNLOADER_MIDDLEWARES = {
                       'sp1.proxy.ProxyMiddleware': 666,
                    }
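
The selection and header logic of the middleware can be tried out without Scrapy. This is a minimal sketch that assumes the same dict layout as PROXIES above. The helper name `pick_proxy` and the second proxy address are hypothetical, introduced only for illustration, and are not part of Scrapy or the original post.

```python
import base64
import random

# Hypothetical proxy list for illustration; the second address is a placeholder
PROXIES = [
    {'ip_port': '192.168.11.11:9999', 'user_pass': 'root:woshiniba'},
    {'ip_port': '192.168.11.12:9999', 'user_pass': ''},
]


def pick_proxy(proxies, rng=random):
    """Return (proxy_url, auth_header); auth_header is None without credentials."""
    proxy = rng.choice(proxies)
    proxy_url = "http://%s" % proxy['ip_port']
    auth_header = None
    if proxy['user_pass']:
        # HTTP Basic: base64 of "user:password", prefixed with the scheme name
        token = base64.b64encode(proxy['user_pass'].encode('utf-8'))
        auth_header = b'Basic ' + token
    return proxy_url, auth_header


# Show the result for each entry by offering a one-element list
for entry in PROXIES:
    print(pick_proxy([entry]))
```

Inside process_request, the first value would go into request.meta['proxy'] and the second, when it is not None, into request.headers['Proxy-Authorization'].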
  • Original article: https://www.cnblogs.com/654321cc/p/8955915.html