  • Python crawlers: setting an IP proxy, the User-Agent, request headers, and a POST payload

    1. GET: how to add an IP proxy to a crawler and set the request headers

    import urllib.request
    import urllib.parse
    import random
    import time
    from fake_useragent import UserAgent
    ua = UserAgent()
    url = "http://www.baidu.com"
    ########################################################
    '''
    Set an IP proxy
    iplist = ['127.0.0.1:80']   # replace with proxies you have found yourself
    proxy_support = urllib.request.ProxyHandler({'http': random.choice(iplist)})  # 'https' also works, if your proxy supports it
    opener = urllib.request.build_opener(proxy_support)
    '''
    ########################################################
    '''No IP proxy'''
    opener = urllib.request.build_opener()
    
    '''Press F12 in the browser, look at the request headers, and copy them here; you do not necessarily need all of them'''
    opener.addheaders = [('Host', 'www.baidu.com'),  # must match the host in url
                         ('User-Agent', ua.random),
                         # urllib does not decompress responses, so ask for an uncompressed body
                         ('Accept-Encoding', 'identity'),
                         ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
                         ('Accept-Language', 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2'),
                         ('Connection', 'keep-alive'),
                         ('Upgrade-Insecure-Requests', '1'),  # header values should be strings
                         ('Cookie', '__gads=ID=138080209be66bf8:T=1592037395:S=ALNI_Ma-g9wHmfxFL4GCy9veAjJrJRsNmg; Hm_lvt_dd4738b5fb302cb062ef19107df5d2e4=1592449208,1592471447,1592471736,1594001802; uid=rBADnV7m04mi8wRJK3xYAg=='),
                        ]
    urllib.request.install_opener(opener)
    while True:
        try:
            response = urllib.request.urlopen(url)
            break
        except Exception as e:
            print("Error: " + str(e))
            time.sleep(3)
    html = response.read().decode("utf-8")
    print(html)

    2. POST: adding a payload (illustrative only). Only the code from urllib.request.install_opener(opener) onward needs to change

    urllib.request.install_opener(opener)
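    # data = {}        # if the page submits a payload whose content is empty, you must still pass data = {}; otherwise the page data cannot be fetched
    # Placeholder values: put your payload parameters into the dict in this form.
    # Each key may appear only once; a repeated key silently overwrites the earlier value.
    data = {'_csrf': 'put your',
            'collection-name': 'payload parameters',
            'description': 'in this form',
            }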
    data = urllib.parse.urlencode(data).encode('utf-8')
    req = urllib.request.Request(url,data)
    while True:
        try:
            response = urllib.request.urlopen(req)
            break
        except Exception as e:
            print("Error: " + str(e))
            time.sleep(3)
    html = response.read().decode("utf-8")
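
    To see what the payload encoding step produces without touching the network, the following sketch builds the request object and inspects it. The helper name build_post_request and the field values are hypothetical; only urlencode, Request, and get_method come from the standard library.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields, headers=None):
    """Return a Request whose body is `fields` encoded as a URL-encoded form.

    `build_post_request` is a hypothetical helper name used for illustration.
    """
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # Passing any non-None data (even b"") makes urllib issue a POST.
    return urllib.request.Request(url, data=body, headers=headers or {})


req = build_post_request(
    "http://www.baidu.com",
    {"collection-name": "demo", "description": "two words"},
    headers={"User-Agent": "demo-agent"},
)
print(req.get_method())  # POST
print(req.data)          # b'collection-name=demo&description=two+words'

# An empty payload still yields a POST, which matches the data = {} note above.
empty = build_post_request("http://www.baidu.com", {})
print(empty.get_method(), empty.data)  # POST b''
```

    Headers passed to Request apply to that one request only, whereas opener.addheaders applies to every request made through the opener. When the request is actually sent, urllib adds a Content-Type of application/x-www-form-urlencoded if none was set.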
  • Original article: https://www.cnblogs.com/zrzm/p/13332371.html