  • UA spoofing in the Scrapy framework

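    Example: searching Baidu for "ip" shows your own machine's IP. The middlewares below rotate the User-Agent header (UA spoofing) and route each request through a proxy, so the result page reports the proxy's IP instead of your own.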

    Spider code:

    import scrapy


    class UatestSpider(scrapy.Spider):
        name = 'UATest'
        # allowed_domains = ['www.xxx.com']
        start_urls = ['https://www.baidu.com/s?wd=ip']

        def parse(self, response):
            # Save the result page so the reported IP can be inspected.
            with open('./ip.html', 'w', encoding='utf-8') as fp:
                fp.write(response.text)
            print('over!!!')

    Middleware code (middlewares.py):

    import random

    from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

    # Pool of User-Agent strings; one is picked at random for each request.
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
        "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
        "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
        "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
        "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
        "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
        "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
        "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    ]


    class UAPool(UserAgentMiddleware):
        def process_request(self, request, spider):
            # Overwrite the User-Agent header with a random entry from the pool.
            ua = random.choice(user_agent_list)
            request.headers['User-Agent'] = ua
            print(request.headers['User-Agent'])


    # Sample proxy addresses, grouped by the scheme of the requests they serve.
    proxy_http = ['125.27.10.150:56292', '114.34.168.157:46160']
    proxy_https = ['1.20.101.81:35454', '113.78.254.156:9000']


    class UapoolDownloaderMiddleware(object):
        # request is the intercepted request object
        # spider is the spider instance
        def process_request(self, request, spider):
            # Pick a proxy whose scheme matches the scheme of the request URL.
            if request.url.split(':')[0] == 'https':
                request.meta['proxy'] = 'https://' + random.choice(proxy_https)
            else:
                request.meta['proxy'] = 'http://' + random.choice(proxy_http)
            print(request.meta['proxy'])
            return None
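    The proxy middleware chooses its pool by checking the scheme of the request URL. The sketch below isolates that selection step so it can be tried without running a crawl. The helper name pick_proxy and the upper-case constant names are hypothetical and not part of the original middleware; the proxy addresses are the sample values from the code above and are unlikely to still be live.

```python
import random
from urllib.parse import urlsplit

# Sample proxy pools copied from the middleware above. Free proxies like
# these expire quickly, so treat them as placeholders.
PROXY_HTTP = ['125.27.10.150:56292', '114.34.168.157:46160']
PROXY_HTTPS = ['1.20.101.81:35454', '113.78.254.156:9000']


def pick_proxy(url, rng=random):
    """Return a proxy URL whose scheme matches the scheme of `url`.

    Hypothetical helper: it mirrors the if/else branch in
    UapoolDownloaderMiddleware.process_request, but parses the URL with
    urlsplit instead of splitting on ':'.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme == 'https':
        return 'https://' + rng.choice(PROXY_HTTPS)
    return 'http://' + rng.choice(PROXY_HTTP)


if __name__ == '__main__':
    print(pick_proxy('https://www.baidu.com/s?wd=ip'))
    print(pick_proxy('http://example.com/'))
```

    Inside the middleware, the same helper could be used as request.meta['proxy'] = pick_proxy(request.url).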

    Note: in settings.py, uncomment the DOWNLOADER_MIDDLEWARES setting and register the middleware classes you wrote.
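    As a rough sketch, the relevant part of settings.py might look like the following. The package name UAPool is an assumption (the original post does not show the project name), and the priority numbers are illustrative; substitute your own project's module path.

```python
# settings.py (fragment)
# "UAPool" is an assumed project package name; replace it with your own.
DOWNLOADER_MIDDLEWARES = {
    # Custom middleware that sets a random User-Agent on each request.
    'UAPool.middlewares.UAPool': 543,
    # Custom middleware that sets request.meta['proxy'] on each request.
    'UAPool.middlewares.UapoolDownloaderMiddleware': 544,
    # Optional: disable Scrapy's built-in User-Agent middleware.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```

    With the middlewares enabled, each request should print the chosen User-Agent and proxy to the console, and the saved ip.html can be opened to check which IP Baidu reports.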

  • Original article: https://www.cnblogs.com/duanhaoxin/p/10138809.html