  • Web Scraping with the requests Module

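    Python 3 has two commonly used modules for sending HTTP requests: urllib and requests. Because requests is simpler and more convenient to use than urllib, this article covers only the requests module.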

    Installation:

    pip install requests
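A quick way to confirm that the installation worked is to import the module and print its version. This is only a minimal sketch; the version string printed depends on your environment, and the name installed_version is purely illustrative.

```python
import requests

# __version__ holds the installed release string. The sample output in this
# article reports python-requests/2.25.1; yours may differ.
installed_version = requests.__version__
print(installed_version)
```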

    GET requests:

      GET is one of the most common HTTP request methods. This section looks at how to build GET requests with the requests module.

    Start with the simplest possible GET request. The url is the request target; httpbin.org detects that the client sent a GET request and returns the corresponding request information.

    import requests

    url = 'http://httpbin.org/get'
    response = requests.get(url=url)
    print(response.text)

    The output is:
    {
      "args": {},
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.25.1",
        "X-Amzn-Trace-Id": "Root=1-6069d800-43b4f5da49eb42f770c9dc90"
      },
      "origin": "113.118.77.36",
      "url": "http://httpbin.org/get"
    }

    To attach extra information to a GET request, pass it through the params argument:

    import requests
    
    url = 'http://httpbin.org/get'
    params = {
        'name':'germey',
        'age':22
    }
    
    response = requests.get(url=url,params=params)
    print(response.text)
    
    The output is:
    {
      "args": {
        "age": "22", 
        "name": "germey"
      }, 
      "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Host": "httpbin.org", 
        "User-Agent": "python-requests/2.25.1", 
        "X-Amzn-Trace-Id": "Root=1-6069d97d-10bea5df63cec72311101582"
      }, 
      "origin": "113.118.77.36", 
      "url": "http://httpbin.org/get?name=germey&age=22"
    }
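To see how params ends up in the final URL without sending anything over the network, the request can be prepared locally. The sketch below uses requests.Request and its prepare() method, which are part of the library but are not used elsewhere in this article; the variable name prepared is just illustrative.

```python
import requests

url = 'http://httpbin.org/get'
params = {'name': 'germey', 'age': 22}

# Build the request object and prepare it; nothing is sent at this point.
prepared = requests.Request('GET', url, params=params).prepare()

# The params dict is URL-encoded and appended as the query string.
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22
```

The printed URL matches the "url" field that httpbin echoes back in the output above.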

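    If the response body is JSON, call the response's json() method to parse it. If it is binary data, read the content attribute; content is a property that holds the raw bytes, not a method, so it takes no parentheses.

    response.json()
    response.content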
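The difference between text, content, and json() is easier to see side by side. The sketch below is a demo built on assumptions rather than part of the original article: it starts a throwaway HTTP server on 127.0.0.1 so that it runs without internet access. The JsonHandler class and its payload are made up for illustration, and a Session with trust_env disabled is used only to keep proxy settings from interfering with the local call.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class JsonHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a small JSON body."""

    def do_GET(self):
        body = json.dumps({'name': 'germey', 'age': 22}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick any free port.
server = HTTPServer(('127.0.0.1', 0), JsonHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment
response = session.get(f'http://127.0.0.1:{server.server_port}/', timeout=5)
server.shutdown()
server.server_close()

print(type(response.text))     # <class 'str'>: the decoded body
print(type(response.content))  # <class 'bytes'>: the raw body
print(response.json())         # {'name': 'germey', 'age': 22}
```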

    POST requests:

      With basic GET requests covered, the other common request type is POST. Sending a POST request with requests is just as simple.

    import requests
    
    url = 'http://httpbin.org/post'
    data = {
        'name':'germey',
        'age':22
    }
    page_text = requests.post(url=url,data=data)
    print(page_text.text)
    
    The output is:
    {
      "args": {}, 
      "data": "", 
      "files": {}, 
      "form": {
        "age": "22", 
        "name": "germey"
      }, 
      "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Content-Length": "18", 
        "Content-Type": "application/x-www-form-urlencoded", 
        "Host": "httpbin.org", 
        "User-Agent": "python-requests/2.25.1", 
        "X-Amzn-Trace-Id": "Root=1-6069dbb2-20c01c0a048c8d0c239cdf28"
      }, 
      "json": null, 
      "origin": "113.118.77.36", 
      "url": "http://httpbin.org/post"
    }
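The "form" field and the Content-Type header in the output above follow from passing the dictionary through data. A minimal sketch, prepared locally so that nothing is sent: it contrasts data= with the json= argument, which the original example does not use and which is shown here only for comparison. The names payload, form_request, and json_request are illustrative.

```python
import json

import requests

url = 'http://httpbin.org/post'
payload = {'name': 'germey', 'age': 22}

# data= form-encodes the dictionary, which is why httpbin reports it under "form".
form_request = requests.Request('POST', url, data=payload).prepare()
print(form_request.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_request.body)                     # name=germey&age=22

# json= serializes the dictionary as a JSON document instead.
json_request = requests.Request('POST', url, json=payload).prepare()
print(json_request.headers['Content-Type'])  # application/json
print(json.loads(json_request.body))         # {'name': 'germey', 'age': 22}
```

The form-encoded body is 18 characters long, which agrees with the "Content-Length": "18" header in the httpbin output.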

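    In practice, requests usually need a headers argument that sets a browser-like User-Agent (UA spoofing); otherwise some sites will reject the request.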

    import requests
    
    header = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15'
    }
    
    url = 'https://www.baidu.com'
    response = requests.get(url, headers=header)
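Without a custom header, requests identifies itself as python-requests/<version>, which is what the earlier httpbin output shows. The sketch below compares that default with the custom value without contacting any site. requests.utils.default_user_agent() and Session.prepare_request() are library calls not used elsewhere in this article, and the httpbin URL is reused only as a placeholder.

```python
import requests

header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) '
                  'AppleWebKit/605.1.15 (KHTML, like Gecko) '
                  'Version/13.1.1 Safari/605.1.15'
}

# The User-Agent that requests sends when none is supplied.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # python-requests/<installed version>

# Prepare (but do not send) a request to inspect the headers that would go out.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request('GET', 'http://httpbin.org/get', headers=header)
)
print(prepared.headers['User-Agent'])  # the Safari string from `header`
```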
  • Original post: https://www.cnblogs.com/Pynu/p/14617518.html