  • Randomly rotating the User-Agent in Scrapy

    A page listing user-agent strings:

    https://fake-useragent.herokuapp.com/browsers/0.1.6

    Using the fake-useragent module

    GitHub repository for the module: https://github.com/hellysmile/fake-useragent

    Installation:

    pip install fake-useragent

    Usage:

    from fake_useragent import UserAgent
    ua = UserAgent()
    
    ua.ie
    # Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);
    ua.msie
    # Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)
    ua['Internet Explorer']
    # Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)
    ua.opera
    # Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11
    ua.chrome
    # Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2
    ua.google
    # Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
    ua['google chrome']
    # Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
    ua.firefox
    # Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1
    ua.ff
    # Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1
    ua.safari
    # Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25
    
    # and the best one: a random user agent weighted by real-world browser usage statistics
    ua.random

    Switching the user agent automatically in Scrapy:

    Edit the middlewares.py file

    from fake_useragent import UserAgent

    # Add a new downloader middleware class
    class RandomUserAgentMidddlware(object):
        # Sets a randomly chosen User-Agent on outgoing requests
        def __init__(self, crawler):
            super(RandomUserAgentMidddlware, self).__init__()
            self.ua = UserAgent()
            # Read which UserAgent attribute to use from the project settings
            self.ua_type = crawler.settings.get('RANDOM_UA_TYPE', 'random')

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def process_request(self, request, spider):
            # Look up the attribute named by RANDOM_UA_TYPE (e.g. ua.random)
            def get_ua():
                return getattr(self.ua, self.ua_type)

            # setdefault leaves a User-Agent already set on the request untouched
            request.headers.setdefault('User-Agent', get_ua())
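
The two steps inside process_request (attribute lookup by name, then setdefault on the headers) can be tried without Scrapy or network access. The following is only a minimal sketch: StubUserAgent, StubRequest and apply_user_agent are hypothetical stand-ins invented for illustration, and a plain dict is assumed to behave like Scrapy's headers object for this purpose.

```python
import random


class StubUserAgent:
    """Hypothetical stand-in for fake_useragent.UserAgent (no network needed)."""
    chrome = "StubChrome/1.0"
    firefox = "StubFirefox/1.0"

    @property
    def random(self):
        # The real library draws from browser usage data; this stub just
        # picks one of its two fixed strings.
        return random.choice([self.chrome, self.firefox])


class StubRequest:
    """Hypothetical stand-in for a Scrapy request: it only carries headers."""
    def __init__(self, headers=None):
        self.headers = dict(headers or {})


def apply_user_agent(request, ua, ua_type):
    # Same logic as process_request: resolve the attribute by name, then set
    # the header only when the request does not already carry one.
    request.headers.setdefault("User-Agent", getattr(ua, ua_type))
    return request


ua = StubUserAgent()
plain = apply_user_agent(StubRequest(), ua, "chrome")
preset = apply_user_agent(StubRequest({"User-Agent": "MyBot/0.1"}), ua, "random")
```

Here plain receives the stub Chrome string, while preset keeps the User-Agent it already had. If every request should get a fresh value regardless, assigning the header directly instead of using setdefault is one option.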

    The settings file (settings.py)

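    DOWNLOADER_MIDDLEWARES = {
        # Register the custom middleware class
        'ArticleSpider.middlewares.RandomUserAgentMidddlware': 543,
        # Disable Scrapy's built-in UserAgentMiddleware by setting its priority to None
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    }

    # Which UserAgent attribute the middleware reads (e.g. "random", "chrome", "firefox")
    RANDOM_UA_TYPE = "random"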
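
Because the middleware passes RANDOM_UA_TYPE straight to getattr, a misspelled value would only surface when the first request is processed. One possible guard, sketched here under assumptions, is to check the value when the setting is read. KNOWN_UA_TYPES and resolve_ua_type are hypothetical names; the set contains only the attribute names shown in the usage listing above, and the installed fake-useragent version may accept a different set. A plain dict stands in for crawler.settings.

```python
# Attribute names copied from the usage listing; an assumption, not the
# library's authoritative list.
KNOWN_UA_TYPES = {
    "ie", "msie", "opera", "chrome", "google",
    "firefox", "ff", "safari", "random",
}


def resolve_ua_type(settings, default="random"):
    """Return the configured RANDOM_UA_TYPE, or `default` if it is unknown."""
    ua_type = settings.get("RANDOM_UA_TYPE", default)
    return ua_type if ua_type in KNOWN_UA_TYPES else default


# Usage with plain dicts standing in for the Scrapy settings object
configured = resolve_ua_type({"RANDOM_UA_TYPE": "chrome"})
misspelled = resolve_ua_type({"RANDOM_UA_TYPE": "chorme"})
```

Falling back silently keeps the crawl running; raising an error instead would make the misconfiguration more visible, so which behavior fits better depends on the project.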
  • Original article: https://www.cnblogs.com/trunkslisa/p/9841658.html