
Python web scraping: handling request headers and network timeouts

1. Handling request headers

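Sometimes a request to a server, whether GET or POST, fails with a 403 error, which means the server has refused access. A common cause is that the server rejects clients that do not look like a browser (by default, requests identifies itself as python-requests). In that case we can send request headers that imitate a browser, which often gets past this kind of anti-scraping check.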

import requests
# URL of the page to scrape
url = 'https://www.baidu.com/'
# Request headers imitating a browser's User-Agent
headers = {'User-Agent': 'OW64; rv:59.0) Gecko/20100101 Firefox/59.0'}
# Send the request with the custom headers
response = requests.get(url, headers=headers)
# Print the page source as raw bytes
print(response.content)

Result:

b'<!DOCTYPE html><!--STATUS OK-->


    
    
                            <html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><meta name="description" content="\xe5\x85\xa8\xe7\x90\x83\xe6\x9c\x80\xe5\xa4\xa7\xe7\x9a\x84\xe4\xb8\xad\xe6\x96\x87\xe6\x90\x9c\xe7\xb4\xa2\xe5\xbc\x95\xe6\x93\x8e\xe3\x80\x81\xe8\x87\xb4\xe5\x8a\x9b\xe4\xba\x8e\xe8\xae\xa9\xe7\xbd\x91\xe6\xb0\x91\xe6\x9b\xb4\xe4\xbe\xbf\xe6\x8d\xb7\xe5\x9c\xb0\xe8\x8e\xb7\xe5\x8f\x96\xe4\xbf\xa1\xe6\x81\xaf\xef\xbc\x8c\xe6\x89\xbe\xe5\x88\xb0\xe6\x89\x80\xe6\xb1\x82\xe3\x80\x82\xe7\x99\xbe\xe5\xba\xa6\xe8\xb6\x85\xe8\xbf\x87\xe5\x8d\x83\xe4\xba\xbf\xe7\x9a\x84\xe4\xb8\xad\xe6\x96\x87\xe7\xbd\x91\xe9\xa1\xb5\xe6\x95\xb0\xe6\x8d\xae\xe5\xba\x93\xef\xbc\x8c\xe5\x8f\xaf\xe4\xbb\xa5\xe7\x9e\xac\xe9\x97\xb4\xe6\x89\xbe\xe5\x88\xb0\xe7\x9b\xb8\xe5\x85\xb3\xe7\x9a\x84\xe6\x90\x9c\xe7\xb4\xa2\xe7\xbb\x93\xe6\x9e\x9c\xe3\x80\x82"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg"><link rel="dns-prefetch" href="//dss0.bdstatic.com"/><link rel="dns-prefetch" href="//dss1.bdstatic.com"/><link rel="dns-prefetch" href="//ss1.bdstatic.com"/><link rel="dns-prefetch" href="//sp0.baidu.com"/><link rel="dns-prefetch" href="//sp1.baidu.com"/><link rel="dns-prefetch" href="//sp2.baidu.com"/>
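The same browser-like headers can also be attached once per session instead of on every call. The sketch below is a minimal, offline illustration, assuming `requests` is installed: `Session.prepare_request()` builds the request exactly as it would be sent, so you can inspect the outgoing headers without contacting the server. The full User-Agent string and the `Referer` value are example values chosen for illustration, not something the target site is known to require.

```python
import requests

# Example browser-like headers (illustrative values only).
BROWSER_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; WOW64; rv:59.0) '
                   'Gecko/20100101 Firefox/59.0'),
}

# Without custom headers, requests announces itself as python-requests/<version>.
default_ua = requests.utils.default_user_agent()

session = requests.Session()
# Headers stored on the session are sent with every request made through it.
session.headers.update(BROWSER_HEADERS)

# Per-request headers are merged on top of the session defaults.
req = requests.Request('GET', 'https://www.baidu.com/',
                       headers={'Referer': 'https://www.baidu.com/'})
prepared = session.prepare_request(req)  # builds the request; nothing is sent

print('default:', default_ua)
print('sent:   ', prepared.headers['User-Agent'])
print('referer:', prepared.headers['Referer'])
```

Calling `session.send(prepared)`, or simply `session.get(url)`, would then send the request with these headers.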
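The `\x..` sequences in the output are UTF-8 bytes: `response.content` is the raw `bytes` body, and Python prints non-ASCII bytes as escapes. The page declares `charset=utf-8` in its `<meta>` tag, so decoding with that codec recovers the Chinese text. A minimal sketch in plain Python (no network), using the `title` attribute from the output above; with a live response you would call `response.content.decode('utf-8')`, or read `response.text`, which decodes using `response.encoding` as inferred by requests from the response headers.

```python
# Bytes copied from the title attribute in the output above.
raw = b'title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2"'

# The page declares charset=utf-8, so decode the bytes with that codec.
text = raw.decode('utf-8')
print(text)  # → title="百度搜索"
```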

2. Handling network timeouts

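When a web page does not respond for a long time, the request is treated as timed out and the page cannot be opened. Note that requests itself never gives up on its own: unless you pass a timeout argument, a call can wait indefinitely. The code below uses a deliberately short timeout to provoke a timeout.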

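import requests
# Send the request 50 times in a loop
for _ in range(50):
    # Catch any exception raised by the request
    try:
        # Set the timeout to 0.5 seconds
        response = requests.get('https://www.baidu.com/', timeout=0.5)
        # Print the status code
        print(response.status_code)
    # Catch the exception
    except Exception as e:
        # Print the exception message
        print('Exception: ' + str(e))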


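In the code above, 50 requests are sent in a loop with a timeout of 0.5 seconds. If the server does not respond within 0.5 seconds, the request is treated as timed out, an exception is raised, and the program prints the exception message to the console. Note that requests applies this value separately to establishing the connection and to each wait for data from the server; it is not a limit on the total download time.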
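To see which exception a timeout actually produces, without depending on how fast Baidu answers, you can point requests at a local socket that accepts the connection but never replies. This is a sketch for experimentation, assuming `requests` is installed and connections to 127.0.0.1 are allowed; the helper name `demo_read_timeout` is hypothetical, and the listener is only a stand-in for a slow server. It also uses the tuple form `timeout=(connect, read)`, which sets the two limits separately.

```python
import socket

import requests


def demo_read_timeout(read_timeout=0.5):
    """Hypothetical helper: request a silent local server, return the exception name."""
    # Listening socket that never accepts or replies. The OS still completes
    # the TCP handshake from the listen backlog, so the connect step succeeds
    # and the client then waits for response bytes that never arrive.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables
    try:
        # timeout=(connect, read): 3 s to connect, read_timeout s to wait for data
        session.get(f'http://127.0.0.1:{port}/', timeout=(3, read_timeout))
        return 'no timeout'
    except requests.exceptions.ConnectTimeout:
        return 'ConnectTimeout'
    except requests.exceptions.ReadTimeout:
        return 'ReadTimeout'
    finally:
        session.close()
        server.close()


result = demo_read_timeout()
print(result)
```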

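requests also provides exception classes for common network errors. The example below uses three of them: ReadTimeout (the server did not send data within the read timeout), HTTPError (raised by response.raise_for_status() for 4xx and 5xx status codes; requests.get() itself does not raise it), and RequestException, the base class of all requests exceptions. Because ReadTimeout and HTTPError are subclasses of RequestException, their except clauses must come before the catch-all one: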

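import requests
# Import three exception classes from requests.exceptions
from requests.exceptions import ReadTimeout, HTTPError, RequestException
# Send the request 50 times in a loop
for _ in range(50):
    try:
        # Set the timeout to 0.5 seconds
        response = requests.get('https://www.baidu.com/', timeout=0.5)
        # Raise HTTPError for 4xx/5xx status codes
        response.raise_for_status()
        # Print the status code
        print(response.status_code)
    # Read timeout: the server sent no data within 0.5 seconds
    except ReadTimeout:
        print('timeout')
    # HTTP error, raised by raise_for_status()
    except HTTPError:
        print('httperror')
    # Any other requests error (connection failure, connect timeout, ...)
    except RequestException:
        print('reqerror')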

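Two details of the exception example can be checked offline: the class hierarchy that dictates the order of the `except` clauses, and the fact that `HTTPError` only appears when `raise_for_status()` is called. The sketch below assumes `requests` is installed; `check_status` is a hypothetical helper, and the `Response` objects are filled in by hand purely for illustration (normally requests creates them for you).

```python
import requests
from requests.exceptions import (
    ConnectTimeout, HTTPError, ReadTimeout, RequestException, Timeout,
)

# All requests exceptions derive from RequestException, so a handler for it
# placed first would swallow the more specific ones.
assert issubclass(ReadTimeout, Timeout) and issubclass(Timeout, RequestException)
assert issubclass(HTTPError, RequestException)
# A connect timeout is a Timeout but not a ReadTimeout, so the example above
# reports it through its RequestException handler ('reqerror').
assert issubclass(ConnectTimeout, Timeout)
assert not issubclass(ConnectTimeout, ReadTimeout)


def check_status(response):
    """Hypothetical helper: label a response the way the example's handlers would."""
    try:
        response.raise_for_status()  # raises HTTPError only for 4xx/5xx codes
    except HTTPError as exc:
        return f'httperror: {exc}'
    return f'ok: {response.status_code}'


# Offline illustration: Response objects filled in by hand instead of real requests.
forbidden = requests.Response()
forbidden.status_code = 403
forbidden.reason = 'Forbidden'
forbidden.url = 'https://www.baidu.com/'

success = requests.Response()
success.status_code = 200

print(check_status(forbidden))
print(check_status(success))
```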


Original article: https://www.cnblogs.com/xiao02fang/p/12927267.html
