  • Learning Python Web Scraping (2): the requests library

    I. The urllib library

    1. About urllib

    urllib is Python's built-in HTTP request library. It consists of:

         urllib.request       the request module
         urllib.error         the exception-handling module
         urllib.parse         the URL parsing module
         urllib.robotparser   the robots.txt parsing module
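
    A minimal sketch of these four modules in use (httpbin.org is used here only as an example target, not taken from the original post):

    from urllib import request, error, parse, robotparser
    
    # urllib.parse: build a query string onto a base URL
    url = "http://httpbin.org/get?" + parse.urlencode({"name": "zhaofan", "age": 23})
    
    try:
        # urllib.request: send the HTTP request
        with request.urlopen(url) as resp:
            print(resp.status)
            print(resp.read().decode("utf-8"))
    except error.URLError as exc:
        # urllib.error: one place to catch failed requests
        print("request failed:", exc)
    
    # urllib.robotparser: fetch and parse the site's robots.txt
    rp = robotparser.RobotFileParser("http://httpbin.org/robots.txt")
    rp.read()
    print(rp.can_fetch("*", "http://httpbin.org/get"))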

    II. The Requests library

    1. Basic usage

    import requests
    
    url = "http://httpbin.org/get"   # any reachable URL works here
    response = requests.get(url)
    
    print(type(response))            # <class 'requests.models.Response'>
    
    print(response.status_code)      # HTTP status code, e.g. 200
    print(response.cookies)          # cookies set by the server
    
    print(response.text)             # body decoded to str using a guessed encoding
    
    print(response.content)          # raw body as bytes
    print(response.content.decode("utf-8"))  # bytes decoded explicitly

    Note:

    In many cases response.text comes back garbled, so response.content is often used instead: it returns the body as raw bytes, which can then be decoded to UTF-8 with decode().

    The garbling can also be avoided like this:

    response = requests.get(url)
    
    response.encoding = 'utf-8'   # tell requests how to decode the body
    print(response.text)
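
    If the page's character set is not known in advance, another option (an addition here, not from the original post) is to let requests guess the encoding from the body before reading text:

    import requests
    
    response = requests.get("http://www.baidu.com")
    # apparent_encoding is detected from the response body itself
    response.encoding = response.apparent_encoding
    print(response.text)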

    2. Sending requests

    • GET requests

      (1) Basic GET request

      (2) GET request with query parameters

          get?key=val

    response = requests.get("http://httpbin.org/get?name=zhaofan&age=23")
    
    print(response.text)

          Passing parameters with the params keyword argument:

    data = {
                "name": "zhaofan",
                "age": 22
    }
    
    response = requests.get("http://httpbin.org/get",params=data)
    print(response.url)
    print(response.text)
    •    Parsing JSON: response.json() runs json.loads() internally, so the two give the same result.
    import json
    import requests
    
    response = requests.get("http://httpbin.org/get")
    
    print(response.json())
    
    print(json.loads(response.text))
    •   Adding headers: some sites (such as Zhihu) cannot be accessed with a plain requests call by default.

    Type chrome://version into Chrome's address bar to see your user agent string, then add it to the request headers:

    import requests
    headers = {
                     "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
    }    
    
    response = requests.get("https://www.zhihu.com",headers=headers)
    
    print(response.text)
    • POST requests

    Pass form data through the data parameter:

    import requests
    data = {
              "name": "zhaofan",
              "age": 23
    }
    
    response = requests.post("http://httpbin.org/post",data=data)
    
    print(response.text)
    • The response object

    Many attributes can be read from the response:

    import requests
    
    response = requests.get("http://www.baidu.com")
    
    print(response.status_code)   # HTTP status code
    print(response.headers)       # response headers (dict-like)
    print(response.cookies)       # cookies set by the server
    print(response.url)           # final URL after any redirects
    print(response.history)       # list of redirect responses followed

    Checking the status code

    requests exposes named status codes via requests.codes, for example:

    202: accepted

    404: not_found
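
    A minimal sketch of a status-code check using those names (requests.codes.accepted is 202, requests.codes.not_found is 404):

    import requests
    
    response = requests.get("http://httpbin.org/get")
    
    # compare against named codes instead of bare integers
    if response.status_code == requests.codes.ok:
        print("ok")
    elif response.status_code == requests.codes.not_found:
        print("not found")
    
    # or raise requests.exceptions.HTTPError for any 4xx/5xx status
    response.raise_for_status()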
