  • Python "HTTP Error 403: Forbidden"

    Problem:

    Running the following code

    import urllib.request

    def set_IPlsit():
        url = 'https://www.whatismyip.com/'
        # urlopen() sends the request with urllib's default headers
        response = urllib.request.urlopen(url)
        html = response.read().decode('utf-8')

    raises the following exception:

    C:\Users\54353\AppData\Local\Programs\Python\Python36\python.exe "C:/Users/54353/PycharmProjects/untitled/爬虫/图片 - 某网站.py"
    Traceback (most recent call last):
      File "C:/Users/54353/PycharmProjects/untitled/爬虫/图片 - 某网站.py", line 100, in <module>
        ip = set_IPlsit2()
      File "C:/Users/54353/PycharmProjects/untitled/爬虫/图片 - 某网站.py", line 95, in set_IPlsit2
        response = ure.urlopen(url)
      File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
        return opener.open(url, data, timeout)
      File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
        response = meth(req, response)
      File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
        return self._call_chain(*args)
      File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
        result = func(*args)
      File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

    Process finished with exit code 1

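    Analysis:

    This exception occurs because, when a URL is opened with urllib.request.urlopen, the server receives only a bare request for the page. It gets no information about the browser, operating system, or hardware platform behind the request, and requests that lack this information are often abnormal traffic such as crawlers.

    To block such traffic, some websites check the User-Agent in the request headers. If the User-Agent looks abnormal or is missing, the request is rejected.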
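
    To see what a server receives from a plain urlopen call, it can help to inspect the headers urllib attaches by default. The sketch below is an illustration, not part of the original post. It builds a default opener and reads its addheaders list. In CPython's urllib this list normally holds a single User-agent entry of the form Python-urllib/<version>, which a site can easily recognize as a script rather than a browser. The variable names are chosen here for illustration.

```python
import urllib.request

# build_opener() creates an OpenerDirector with urllib's default handlers.
opener = urllib.request.build_opener()

# addheaders lists the headers the opener attaches to each request it sends.
default_headers = dict(opener.addheaders)
default_user_agent = default_headers.get("User-agent", "")

# Typically something like "Python-urllib/3.x"; the exact version varies.
print(default_user_agent)
```

    No network access is needed for this check, so it is a cheap first step when diagnosing a 403.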

    Solution:

    Add a User-Agent header to the request, as follows

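    import urllib.request

    # Send a browser-like User-Agent header with the request
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    req = urllib.request.Request(url=chaper_url, headers=headers)  # chaper_url is the URL of the page to fetch
    urllib.request.urlopen(req).read()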
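
    The fix above can be wrapped in small helpers so the header is set in one place and a remaining 403 does not crash the script. This is a minimal sketch under stated assumptions: build_request, fetch, and BROWSER_UA are hypothetical names introduced here, the User-Agent string is the one from the snippet above, and the URL is the one from the problem statement. A browser-like User-Agent is not guaranteed to be accepted, since a site may also check other headers or block by IP address.

```python
import urllib.error
import urllib.request

# Browser-like User-Agent string, copied from the article's solution snippet.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) "
              "Gecko/20100101 Firefox/23.0")


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries the given User-Agent header."""
    return urllib.request.Request(url=url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Fetch url as text; return None if the server answers with an HTTP error."""
    req = build_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A 403 here means the server refused the request despite the header.
        print("HTTP error", err.code, err.reason)
        return None


# Building the request does not touch the network, so it is safe to inspect.
req = build_request("https://www.whatismyip.com/")
print(req.get_header("User-agent"))
```

    urllib stores header names in capitalized form, which is why the lookup uses "User-agent" rather than "User-Agent". Calling fetch(url) performs the actual download.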
  • Original article: https://www.cnblogs.com/scios/p/8639209.html