  • Scrapy-Proxies: a random IP proxy plugin for Scrapy

    Installation:

    pip install scrapy_proxies

    GitHub:   https://github.com/aivarsk/scrapy-proxies

    Add the following to the Scrapy project's settings.py:

    # Retry many times since proxies often fail
    RETRY_TIMES = 10
    # Retry on most error codes since proxies fail for different reasons
    RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]
    
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
        'scrapy_proxies.RandomProxy': 100,
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    }
    
    # Proxy list containing entries like
    # http://host1:port
    # http://username:password@host2:port
    # http://host3:port
    # Path to the file that holds the proxy IP list
    PROXY_LIST = '/path/to/proxy/list.txt'
    
    # Proxy mode
    # 0 = Every request gets a different proxy
    # 1 = Take only one proxy from the list and assign it to every request
    # 2 = Use a custom proxy defined in the settings
    PROXY_MODE = 0
    
    # If you use mode 2, uncomment the line below:
    #CUSTOM_PROXY = "http://host1:port"
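
To make the three PROXY_MODE values concrete, here is a minimal sketch of the selection behaviour that the comments above describe. It is not the plugin's source code: the function `pick_proxy` and its parameters are hypothetical names used only for illustration.

```python
import random


def pick_proxy(proxies, mode, chosen=None, custom_proxy=None):
    """Return the proxy URL to use for the next request (illustrative only)."""
    if mode == 0:
        # Mode 0: draw a random proxy from the list for every request.
        return random.choice(proxies)
    if mode == 1:
        # Mode 1: draw once, then keep reusing that proxy.
        # The caller passes the earlier pick back in as `chosen`.
        return chosen if chosen is not None else random.choice(proxies)
    if mode == 2:
        # Mode 2: ignore the list and use the proxy given in the settings.
        if not custom_proxy:
            raise ValueError("mode 2 needs a custom proxy")
        return custom_proxy
    raise ValueError(f"unknown proxy mode: {mode}")


if __name__ == "__main__":
    # Placeholder hosts, not real proxies.
    sample = ["http://host1:8080", "http://host2:8080", "http://host3:8080"]
    print(pick_proxy(sample, 0))
    fixed = pick_proxy(sample, 1)
    print(pick_proxy(sample, 1, chosen=fixed) == fixed)
    print(pick_proxy(sample, 2, custom_proxy="http://host1:8080"))
```
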
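    Usage:

    Save the proxy IP list you scraped earlier with Python to the path that PROXY_LIST points to.
    Of the PROXY_MODE values, 0 is probably the most commonly used; if one IP is particularly stable, use mode 2 instead.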
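
As a rough sketch of the first step, the helper below writes proxies to a text file, one URL per line, in the `http://host:port` and `http://username:password@host:port` forms shown in the settings comments. The function names, the file name `list.txt`, and the sample hosts are assumptions for illustration; they do not come from the plugin.

```python
from pathlib import Path


def format_proxy(host, port, username=None, password=None, scheme="http"):
    """Build one proxy-list line, e.g. http://user:pass@host:port."""
    # Credentials are included only when both parts are given.
    auth = f"{username}:{password}@" if username and password else ""
    return f"{scheme}://{auth}{host}:{port}"


def write_proxy_list(path, proxies):
    """Write one proxy URL per line, dropping duplicates but keeping order."""
    unique = list(dict.fromkeys(proxies))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


if __name__ == "__main__":
    # Placeholder entries standing in for proxies collected by your own scraper.
    scraped = [
        format_proxy("host1", 8080),
        format_proxy("host2", 3128, username="username", password="password"),
        format_proxy("host1", 8080),  # duplicate, written only once
    ]
    count = write_proxy_list("list.txt", scraped)
    print(f"wrote {count} proxies")
```

PROXY_LIST in settings.py would then be set to the path of the file written here.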
  • Original article: https://www.cnblogs.com/knighterrant/p/10810261.html