  • Using proxies in Python

    Here is a site for checking which IP address your requests appear to come from: http://ip.filefab.com/index.php

    Scrapy: how random request headers and a proxy IP are applied

    import random
    
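    # Add a downloader middleware
    class User_AgentMiddleware(object):
        def __init__(self):
            self.user_agent = [
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
                "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
                "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
                "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
                "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            ]

        # Modify the request before Scrapy downloads it
        def process_request(self, request, spider):
            # Set the proxy; no distinction is made here between HTTP and HTTPS proxies
            request.meta['proxy'] = 'http://119.42.70.216:8080'
            # Set a random User-Agent header
            ua = random.choice(self.user_agent)
            request.headers['User-Agent'] = ua
            # Possible outcomes of process_request:
            #   None (the default, as here): processing continues with the next step
            #   Response: returned directly; the remaining process_request and
            #     process_exception methods and the download itself are skipped
            #     (process_response is still called on that response)
            #   Request: the returned request is rescheduled and goes through
            #     process_request again
            #   raise IgnoreRequest: process_exception is called; if nothing handles
            #     it, the request's errback is called

        # Check the status of the downloaded page
        def process_response(self, request, response, spider):
            print(response.headers)
            if response.status != 200:
                # Reschedule the request. dont_filter=True keeps the duplicate
                # filter from silently dropping it; a page that never returns
                # 200 will be retried indefinitely.
                return request.replace(dont_filter=True)
            else:
                return response
            # process_response must return a value or raise:
            #   Request: the returned request is rescheduled and goes through
            #     process_request again
            #   Response: the page is passed on to the next step
            #   raise IgnoreRequest: the request's errback is called; if the
            #     exception is not handled there, it is ignored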
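A middleware like the one above only takes effect once it is registered in the project's settings. The following is a minimal sketch of that registration; the module path `myproject.middlewares` is a hypothetical name, not something given in the original post, and the priority 543 is just a commonly used example value.

```python
# settings.py (sketch). "myproject.middlewares" is a hypothetical module path;
# replace it with wherever User_AgentMiddleware actually lives in your project.
DOWNLOADER_MIDDLEWARES = {
    # Lower numbers run their process_request earlier. 543 places this
    # middleware before Scrapy's built-in HttpProxyMiddleware (750),
    # so request.meta['proxy'] is already set when that middleware runs.
    "myproject.middlewares.User_AgentMiddleware": 543,
}
```

Because this priority is higher than that of the built-in UserAgentMiddleware (500), the random User-Agent is written after the default one and therefore replaces it.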

    Adding a proxy with requests

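    import requests
    from lxml import etree

    # First check whether the target URL is http or https: requests looks up
    # the proxy by the scheme of the URL being fetched, so that decides which
    # key the proxies dict needs.
    proxies = {
        "http": "http://113.71.211.184:8197",
    }

    # A timeout keeps the call from hanging if the proxy is dead
    response = requests.get('http://ip.filefab.com/index.php', proxies=proxies, timeout=10)
    doc = etree.HTML(response.text)
    # Print the IP address that the test page reports for this request
    print(doc.xpath('.//h1[@id="ipd"]/span/text()'))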
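The proxies dict above has only an "http" key, so it would not apply to an https URL. The sketch below shows one way to build a dict that covers both schemes and to check which entry a given URL would use. `build_proxies` and `proxy_for` are hypothetical helper names, and the lookup is a simplification: requests also honors keys such as `all` and proxy environment variables, which are ignored here.

```python
from urllib.parse import urlsplit


def build_proxies(proxy_url, schemes=("http", "https")):
    """Return a requests-style proxies dict that sends each scheme via proxy_url."""
    if not proxy_url.startswith(("http://", "https://")):
        raise ValueError("proxy_url must start with http:// or https://")
    return {scheme: proxy_url for scheme in schemes}


def proxy_for(url, proxies):
    """Return the proxy entry matching the scheme of url, or None if there is none."""
    return proxies.get(urlsplit(url).scheme)


proxies = build_proxies("http://113.71.211.184:8197")

# Usage with requests (not executed here, since it needs network access):
#   requests.get("http://ip.filefab.com/index.php", proxies=proxies, timeout=10)
```

With a dict that has only the "http" key, `proxy_for("https://example.com", {"http": "http://113.71.211.184:8197"})` returns `None`, which is why the scheme of the target URL matters when choosing the keys.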
  • Original article: https://www.cnblogs.com/yijian001/p/9015977.html