一 requests模块
- 概念:
- python中原生的基于网络请求的模块,模拟浏览器进行请求发送,获取页面数据
- 安装: pip install requests
二 requests使用的步骤
- 1 指定url
- 2 基于requests模块请求发送
- 3 获取响应对象中的数据值(text)
- 4 持久化存储
三 反反爬
- 1 设置ip
- 2 设置UA
import requests

# Crawl the Sogou result page for a user-supplied keyword, sending a
# browser User-Agent and routing through a proxy IP (anti-anti-crawling).
word = input('请你输入你要查的词')
url = 'https://www.sogou.com/web?'
params = {
    'query': word
}
# Fix: the keyword argument must be `headers=` — the original passed
# `heards=heards`, which is not a valid requests parameter and raises
# TypeError: get() got an unexpected keyword argument 'heards'.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}
response = requests.get(url=url, params=params, headers=headers,
                        proxies={'https': '62.103.68.8:8080'})  # UA and proxy IP
page_text = response.text  # renamed from the original's `page_tail` typo
filename = word + '.html'
with open(filename, 'w', encoding='utf-8') as f:
    f.write(page_text)
四 示例
No.1基于requests模块的get请求
需求1:爬取搜狗首页的页面数据
import requests

# Step 1: the target URL.
url = 'https://www.sogou.com/'
# Step 2: fire the GET request through the requests module.
response = requests.get(url=url)
# Step 3: extract the text payload from the response object.
page_text = response.text
# Step 4: persist the page to disk.
with open('./sogou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
注意: 对于上面的代码
response.content 返回二进制的页面数据
response.headers 返回响应头信息
response.status_code 返回响应状态码(例如 200)
response.url 返回请求的地址
response.encoding 返回的是响应对象中存储数据的原始编码格式
需求2:爬取搜狗指定词搜索后的页面数据
import requests

# Ask the user which keyword to search for on Sogou.
word = input('请你输入你要查的词')
url = 'https://www.sogou.com/web'
# The query string is passed via params= so requests does the encoding.
query = {'query': word}
resp = requests.get(url=url, params=query)
html = resp.text
# Save the result as <keyword>.html
with open(word + '.html', 'w', encoding='utf-8') as fh:
    fh.write(html)
No.2基于requests模块的post请求
需求3:登录豆瓣电影,爬取登录成功后的页面数据
# 依照我们上面所说的步骤
import requests

# Log in to Douban via a POST and save the resulting page.
login_url = 'https://www.douban.com/accounts/login'
# Form fields discovered in the browser's network inspector.
form_data = {
    "source": "index_nav",
    "form_email": "xxxxxxxxx",
    "form_password": "xxxxxxxxx",
}
resp = requests.post(url=login_url, data=form_data)
# Persist the post-login page.
with open('douban.html', 'w', encoding='utf-8') as fp:
    fp.write(resp.text)
需求4:
基于requests模块ajax的get请求-------爬取豆瓣电影分类排行榜 https://movie.douban.com/中的电影详情数据
import requests

# Ajax GET endpoint behind the Douban movie ranking page.
url = 'https://movie.douban.com/j/chart/top_list?'
param = {  # query-string data captured from the browser
    'type': '13',
    'interval_id': '100:90',
    'action': '',
    'start': '20',
    'limit': '20',
}
# Fix: the original call was `requests.get(url=url, params=param})` —
# the stray '}' is a SyntaxError. Corrected call below.
response = requests.get(url=url, params=param)
print(response.text)
需求5:基于requests模块ajax的post请求-------------------------爬取肯德基餐厅查询http://www.kfc.com.cn/kfccda/index.aspx中指定地点的餐厅数据
import requests

# Ajax POST endpoint for the KFC restaurant locator.
# Fix: the original URL string had a leading space (' http://...'),
# which older requests versions reject with MissingSchema/InvalidURL.
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
city = input('请输入你要查的城市')
data = {  # form payload captured from the browser; only 'keyword' varies
    'cname': '',
    'pid': '',
    'keyword': city,
    'pageIndex': '1',
    'pageSize': '10',
}
response = requests.post(url=url, data=data)
print(response.text)
需求6:简单的爬取博客园前几页
import requests
import os

# Crawl a range of cnblogs listing pages and save each one under ./boke/.
base_url = 'https://www.cnblogs.com/#p'
if not os.path.exists('boke'):
    os.mkdir('boke')
start_page = int(input('enter a start page:'))
end_page = int(input('enter a end page:'))
for page in range(start_page, end_page + 1):
    # Fix: the original did `url = url + str(page)` on the SAME variable,
    # so the page numbers accumulated ('...#p1', '...#p12', '...#p123', ...).
    # Build the URL fresh each iteration instead.
    url = base_url + str(page)
    response = requests.get(url=url, proxies={'https': '62.103.68.8:8080'})
    page_text = response.text
    fileName = str(page) + '.html'
    filePath = './boke/' + fileName
    with open(filePath, 'w', encoding='utf-8') as f:
        f.write(page_text)
    print('第%s页打印' % page)
# 根据实际情况 本段代码所保存的html,是同一个(第一页的内容),
# 我们从页面抓包可以知道,它在第二页的时候发送了一个post请求
import requests
import os

# Ajax endpoint that actually serves the post list: the visible
# https://www.cnblogs.com/#p<N> URL only changes the fragment (never sent
# to the server), so paging must go through this POST endpoint.
url = "http://www.cnblogs.com/mvc/AggSite/PostList.aspx"  # url
if not os.path.exists('boke'):
    os.mkdir('boke')
start_page = int(input('enter a start page:'))
end_page = int(input('enter a end page:'))
for page in range(start_page, end_page + 1):
    # Payload captured from the browser's network panel; only PageIndex varies.
    data = {
        "CategoryType": "SiteHome",
        "ParentCategoryId": 0,
        "CategoryId": 808,
        "PageIndex": page,
        "TotalPostCount": 4000,
        "ItemListActionName": "PostList"
    }
    res = requests.post(url=url, data=data, verify=False)
    page_text = res.text
    fileName = str(page) + '.html'
    filePath = './boke/' + fileName
    # Fix: write as UTF-8 — the response text can contain characters not
    # representable in GBK, so the original `encoding='gbk'` could raise
    # UnicodeEncodeError mid-crawl.
    with open(filePath, 'w', encoding='utf-8') as f:
        f.write(page_text)
    print('第%s页打印' % page)