
Learning Python Web Scraping (2): The requests Library

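I. The urllib library

1. About urllib

urllib is Python's built-in HTTP request library.

It consists of: urllib.request, the request module

urllib.error, the exception-handling module

urllib.parse, the URL-parsing module

urllib.robotparser, the robots.txt-parsing module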
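Two of these modules can be tried without any network access. The sketch below uses urllib.parse to build and take apart a URL, and urllib.robotparser to evaluate a small set of robots.txt rules. The rules and the example.com URLs are made up for illustration only.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.robotparser import RobotFileParser

# urllib.parse: build a query string, then split the URL back into parts.
query = urlencode({"name": "zhaofan", "age": 23})
url = "http://httpbin.org/get?" + query
parts = urlparse(url)
params = parse_qs(parts.query)

print(url)
print(parts.netloc)
print(params)

# urllib.robotparser: feed rules in directly instead of downloading robots.txt.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /private/",
])
allowed_public = rules.can_fetch("*", "http://example.com/index.html")
allowed_private = rules.can_fetch("*", "http://example.com/private/page.html")
print(allowed_public, allowed_private)
```

In a real crawler you would normally call set_url() and read() on the parser so that it fetches the site's actual robots.txt.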

II. The Requests library

1. Basic usage

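import requests

url = "http://httpbin.org/get"  # example target; substitute the page you want to fetch
response = requests.get(url)

print(type(response))          # <class 'requests.models.Response'>

print(response.status_code)    # HTTP status code, e.g. 200
print(response.cookies)        # cookies set by the server

print(response.text)           # body decoded to str

print(response.content)        # body as raw bytes
print(response.content.decode("utf-8"))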

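Note:

response.text often comes out garbled, because requests has to guess the encoding of the page. A common workaround is to use response.content, which returns the raw bytes, and convert it yourself with decode("utf-8").

Alternatively, you can avoid the garbling by setting the encoding explicitly before reading response.text: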

response = requests.get(url)

response.encoding = 'utf-8'
print(response.text)
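To see why the garbling happens, it helps to decode the same bytes two ways. The snippet below is a standalone illustration that does not call requests. The sample string is invented, and ISO-8859-1 stands in for a wrongly guessed encoding (the requests documentation describes falling back to it for text responses whose headers name no charset).

```python
# Bytes a server might send for a UTF-8 page (made-up sample text).
raw = "你好，requests".encode("utf-8")

# Decoding with the wrong charset does not fail; it silently yields mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the right charset recovers the original text.
correct = raw.decode("utf-8")

print(garbled)
print(correct)
```

Setting response.encoding = 'utf-8' has the same effect as the second decode: it tells requests which charset to use when building response.text.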

2. Requests

GET requests

(1) Basic GET request

(2) GET request with parameters

Directly in the URL, as get?key=val:

response = requests.get("http://httpbin.org/get?name=zhaofan&age=23")

print(response.text)

Passing parameters with the params keyword:

data = {
    "name": "zhaofan",
    "age": 22
}

response = requests.get("http://httpbin.org/get",params=data)
print(response.url)
print(response.text)

Parsing JSON: response.json() essentially runs json.loads() on the response body, so the two calls below give the same result.

import json
import requests

response = requests.get("http://httpbin.org/get")

print(response.json())

print(json.loads(response.text))
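The parsing step itself can be tried offline with the standard json module. This is a minimal sketch: the body string is a trimmed-down, invented stand-in for what httpbin.org/get returns, not a captured response.

```python
import json

# Made-up stand-in for the text of an httpbin.org/get response.
body = '{"args": {"name": "zhaofan", "age": "23"}, "url": "http://httpbin.org/get"}'

parsed = json.loads(body)  # str -> dict
print(type(parsed).__name__)
print(parsed["args"]["name"])

# Malformed JSON raises json.JSONDecodeError, a subclass of ValueError.
try:
    json.loads("<html>not json</html>")
    failed = False
except ValueError:
    failed = True
print(failed)
```

response.json() likewise raises an error when the body is not valid JSON, for instance when a site returns an HTML error page, so it is worth guarding the call in a scraper.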

Adding headers: some sites (Zhihu, for example) reject requests that carry the default headers sent by requests.

In Chrome, open chrome://version to find your user agent string, then add it to the request headers.

import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}

response = requests.get("https://www.zhihu.com",headers=headers)

print(response.text)

POST requests

Add the data parameter:

import requests
data = {
    "name": "zhaofan",
    "age": 23
}

response = requests.post("http://httpbin.org/post",data=data)

print(response.text)
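To check what params, headers, and data actually do to an outgoing request, you can build the request without sending it. The sketch below uses requests.Request(...).prepare() for that; the User-Agent value is a hypothetical placeholder, and nothing is sent over the network.

```python
import requests

headers = {"User-Agent": "my-crawler/0.1"}  # hypothetical UA string
data = {"name": "zhaofan", "age": 23}

# Build (but do not send) a GET and a POST request.
get_req = requests.Request("GET", "http://httpbin.org/get",
                           params=data, headers=headers).prepare()
post_req = requests.Request("POST", "http://httpbin.org/post",
                            data=data, headers=headers).prepare()

print(get_req.url)                    # params end up in the query string
print(get_req.headers["User-Agent"])  # the custom header is attached
print(post_req.body)                  # data is form-encoded into the body
print(post_req.headers["Content-Type"])
```

The difference between the two is visible here: params changes the URL, while data leaves the URL alone and fills the request body.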

Responses

The response object exposes many attributes:

import requests

response = requests.get("http://www.baidu.com")

print(response.status_code)
print(response.headers)
print(response.cookies)
print(response.url)
print(response.history)

Checking the status code

202: accepted

404: not_found
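A status check usually boils down to comparing response.status_code against a known code. requests offers named lookups such as requests.codes.not_found for this; the sketch below uses the standard library's http.HTTPStatus instead, which carries the same numeric codes, so that it runs without a network call. describe_status is a hypothetical helper, not part of either library.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short label for an HTTP status code (hypothetical helper)."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if 200 <= code < 300:
        kind = "ok"
    elif code >= 400:
        kind = "error"
    else:
        kind = "other"
    return f"{code} {status.phrase} ({kind})"

print(describe_status(202))
print(describe_status(404))
```

With a live response you would pass response.status_code to such a helper, or simply compare it directly, for example response.status_code == requests.codes.not_found.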


Original article: https://www.cnblogs.com/cola-1998/p/12827430.html