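  • Setting a User-Agent (UA) for a crawler

    Introduction to the User Agent

    What a User Agent is

      A special string sent as the User-Agent HTTP request header

    What the User Agent does

      It lets the server identify the client's operating system and version, CPU type, browser and version, rendering engine, browser language, browser plugins, and so on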
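To make the point above concrete, here is a minimal sketch of how a server might read platform and browser details out of a UA string. The helper name `parse_user_agent` and its regular expressions are assumptions made for illustration; production servers usually rely on a dedicated UA-parsing library rather than hand-written patterns.

```python
import re


def parse_user_agent(ua):
    """Extract a rough platform string and Chrome version from a UA string.

    Hypothetical helper for illustration only; it handles Chrome-style
    UA strings and nothing else.
    """
    # The first parenthesised group usually carries the platform details.
    platform = re.search(r"\(([^)]*)\)", ua)
    # Chrome advertises itself as "Chrome/<version>".
    chrome = re.search(r"Chrome/([\d.]+)", ua)
    return {
        "platform": platform.group(1) if platform else None,
        "chrome_version": chrome.group(1) if chrome else None,
    }


ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
      "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1")
info = parse_user_agent(ua)
print(info)
```

Because the server learns all of this from a single header, a crawler that always sends the same UA is easy to single out, which is the motivation for rotating it.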

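    How to view the browser's UA

      Open the browser's developer tools and evaluate navigator.userAgent in the console, or inspect the User-Agent request header in the Network panel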

    How to view a Scrapy crawler's UA

      Run scrapy shell <URL>, then inspect request.headers in the shell

    Setting a random UA in Scrapy

    Write a UserAgentMiddleware class

    Build a user-agent pool

      Before each request is sent, pick one entry at random from the pool and set it as the request's User-Agent

    import random


    class UserAgentMiddleware(object):
        """Downloader middleware that gives each request a random User-Agent."""

        def __init__(self):
            # Pool of candidate User-Agent strings. Adjacent string literals
            # are concatenated, so every two lines form one entry.
            self.user_agent_list = [
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
                "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
                "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
                "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
                "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
                "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
                "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
                "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
                "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
                "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
                "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
                "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
                "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
                "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
                "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
                "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
                "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
                "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
                "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
                "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
                "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"
            ]

        def process_request(self, request, spider):
            # Pick one entry at random and set it before the request is sent.
            user_agent = random.choice(self.user_agent_list)
            request.headers['User-Agent'] = user_agent
            # Log the chosen value so the rotation can be checked in the crawl log.
            spider.logger.debug('user_agent: %s', user_agent)
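The selection step in process_request can be tried out without a running crawl. The sketch below is an assumption-laden stand-in, not Scrapy code: `FakeRequest` and `assign_random_user_agent` are hypothetical names, the pool is shortened to two of the strings listed above, and the random generator is seeded only so that repeated runs behave the same.

```python
import random
from collections import Counter

# Shortened pool; both strings are taken from the middleware's list.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
    "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
]


class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: a URL plus a headers dict."""

    def __init__(self, url):
        self.url = url
        self.headers = {}


def assign_random_user_agent(request, pool, rng=random):
    """Mirror the middleware's process_request step on a stand-in request."""
    request.headers["User-Agent"] = rng.choice(pool)
    return request


rng = random.Random(0)  # seeded for repeatability only
sent = [
    assign_random_user_agent(
        FakeRequest("https://example.com/page/%d" % i), USER_AGENT_POOL, rng
    )
    for i in range(100)
]

# Count how often each pool entry was used across the 100 requests.
counts = Counter(r.headers["User-Agent"] for r in sent)
for ua, n in counts.items():
    print(n, ua)
```

Every request ends up with a UA drawn from the pool, and over many requests the choices spread across the entries, which is the behaviour the middleware relies on.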

     

    Configure the corresponding settings in settings.py

      Register the middleware under DOWNLOADER_MIDDLEWARES

     

    DOWNLOADER_MIDDLEWARES = {
        # 'sohu.middlewares.SohuDownloaderMiddleware': 543,
        'sohu.middlewares.UserAgentMiddleware': 300,
    }
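The number assigned to each middleware is its priority: Scrapy merges the project's DOWNLOADER_MIDDLEWARES with its built-in defaults and calls process_request in ascending order. The sketch below imitates that ordering in plain Python. It assumes, as in recent Scrapy releases, that the built-in scrapy.downloadermiddlewares.useragent.UserAgentMiddleware sits at priority 500; `BUILTIN_EXCERPT` is a deliberately partial, hypothetical excerpt of the defaults, and `request_order` is an illustrative helper, not a Scrapy function.

```python
# Assumed, partial excerpt of Scrapy's built-in defaults (illustration only).
BUILTIN_EXCERPT = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
}

# The project setting from settings.py.
PROJECT = {
    "sohu.middlewares.UserAgentMiddleware": 300,
}


def request_order(base, project):
    """Return middleware paths in the order process_request would run."""
    merged = {**base, **project}
    # A value of None disables a middleware.
    enabled = {path: prio for path, prio in merged.items() if prio is not None}
    # Lower numbers run first for process_request.
    return sorted(enabled, key=enabled.get)


order = request_order(BUILTIN_EXCERPT, PROJECT)
print(order)

# Optionally switch the built-in middleware off altogether.
order_disabled = request_order(
    BUILTIN_EXCERPT,
    {**PROJECT,
     "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None},
)
print(order_disabled)
```

With priority 300 the custom middleware runs before the built-in one. As far as I know the built-in middleware only fills in a User-Agent when none is set, so the random value should survive either way; disabling it with None simply makes that explicit.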

     

  • Original article: https://www.cnblogs.com/JinZL/p/11746815.html