zoukankan      html  css  js  c++  java
  • UA池和代理池

    二.UA池:User-Agent池

    - 作用:尽可能多的将scrapy工程中的请求伪装成不同类型的浏览器身份。

    - 操作流程:

        1.在下载中间件中拦截请求

        2.将拦截到的请求的请求头信息中的UA进行篡改伪装

        3.在配置文件中开启下载中间件

    代码展示:

    #导包
    from scrapy.contrib.downloadermiddleware.useragent import UserAgentMiddleware
    import random
    #UA池代码的编写(单独给UA池封装一个下载中间件的一个类)
    class RandomUserAgent(UserAgentMiddleware):
    
        def process_request(self, request, spider):
            #从列表中随机抽选出一个ua值
            ua = random.choice(user_agent_list)
            #ua值进行当前拦截到请求的ua的写入操作
            request.headers.setdefault('User-Agent',ua)
    
    
    user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
            "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
            "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
            "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
            "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
            "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
            "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
            "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
            "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
            "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
            "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"
    ]
    

    三.代理池

    - 作用:尽可能多的将scrapy工程中的请求的IP设置成不同的。

    - 操作流程:

        1.在下载中间件中拦截请求

        2.将拦截到的请求的IP修改成某一代理IP

        3.在配置文件中开启下载中间件

    代码展示:

    #批量对拦截到的请求进行ip更换
    #单独封装下载中间件类
    class Proxy(object):
        def process_request(self, request, spider):
            #对拦截到请求的url进行判断(协议头到底是http还是https)
            #request.url返回值:http://www.xxx.com
            h = request.url.split(':')[0]  #请求的协议头
            if h == 'https':
                ip = random.choice(PROXY_https)
                request.meta['proxy'] = 'https://'+ip
            else:
                ip = random.choice(PROXY_http)
                request.meta['proxy'] = 'http://' + ip
    
    #可被选用的代理IP
    PROXY_http = [
        '153.180.102.104:80',
        '195.208.131.189:56055',
    ]
    PROXY_https = [
        '120.83.49.90:9000',
        '95.189.112.214:35508',
    ]
  • 相关阅读:
    tar命令,vi编辑器
    Linux命令、权限
    Color Transfer between Images code实现
    利用Eclipse使用Java OpenCV(Using OpenCV Java with Eclipse)
    Matrix Factorization SVD 矩阵分解
    ZOJ Problem Set
    Machine Learning
    ZOJ Problem Set
    ZOJ Problem Set
    ZOJ Problem Set
  • 原文地址:https://www.cnblogs.com/plyc/p/14506058.html
Copyright © 2011-2022 走看看