
Python Learning 03: Using a Dynamic UA

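When writing a crawler, you need to send a browser User-Agent (UA) with your requests.

Here is what I learned today about crawling with a dynamic UA.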
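
To see why a browser UA matters, it helps to look at what urllib sends when you set nothing. The sketch below builds a default opener and prints its default User-Agent header without making any network request. The names `opener`, `default_headers` and `default_ua` are only for illustration.

```python
from urllib.request import build_opener

# A default opener carries the headers urllib adds to every request
opener = build_opener()

# addheaders is a list of (name, value) pairs
default_headers = dict(opener.addheaders)
default_ua = default_headers["User-agent"]

# Prints something like "Python-urllib/3.x", which is easy for a site to spot
print(default_ua)
```

A site that filters on the UA can tell this apart from a real browser, which is the reason for replacing it in the sections below.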

1. Fetching a web page the simple way

from urllib.request import urlopen
# Target URL
url = "https://www.baidu.com"
# Send the request
response = urlopen(url)
# Read the response body (bytes)
info = response.read()
# Decode to str and print
print(info.decode())
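
One thing to watch: `decode()` with no argument assumes UTF-8, which fails on pages served in another encoding such as GBK. Below is a minimal sketch of decoding with the charset declared in the Content-Type header. The helper `decode_body` is a hypothetical name, not part of urllib, and falling back to UTF-8 is an assumption of this sketch.

```python
from email.message import Message

def decode_body(raw: bytes, content_type: str = "") -> str:
    """Decode raw bytes using the charset from a Content-Type value."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset is declared
    charset = msg.get_content_charset() or "utf-8"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to UTF-8 (assumption of this sketch)
        return raw.decode("utf-8", errors="replace")

# Offline demonstration with GBK-encoded bytes
sample = "中文".encode("gbk")
text = decode_body(sample, "text/html; charset=GBK")
print(text)
```

With a real response, the header value could be taken from `response.headers.get("Content-Type", "")` and passed in together with `response.read()`.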

2. Fetching the Baidu page with Request

from urllib.request import urlopen
from urllib.request import Request
from random import choice
# Target URL
url = "https://www.baidu.com"
# Pool of browser UAs; one is picked at random below
user_agents = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)The World 2.x",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
]
headers = {
    "User-Agent": choice(user_agents)
}

# Build the request with the custom headers
request = Request(url, headers=headers)
# Send the request
response = urlopen(request)
# Read the response body (bytes)
info = response.read()
# Decode to str and print
print(info.decode())
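
To check that the random UA really ends up on the request, you can inspect the Request object without sending anything. This is a small sketch; `build_request` and `seen` are hypothetical names used only here. Note that Request stores header names capitalized, so the lookup key is "User-agent".

```python
from random import choice
from urllib.request import Request

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
]

def build_request(url):
    """Build a Request carrying a randomly chosen UA (nothing is sent)."""
    return Request(url, headers={"User-Agent": choice(USER_AGENTS)})

# Build several requests and collect the UAs that were attached
seen = {
    build_request("https://www.baidu.com").get_header("User-agent")
    for _ in range(20)
}
print(seen)
```

Choosing the UA inside the function means every new request gets a fresh pick, whereas the script above picks once when `headers` is created.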

3. Using fake_useragent for a dynamic UA

from urllib.request import urlopen
from urllib.request import Request
from fake_useragent import UserAgent
# Target URL
url = "https://www.baidu.com"
# Get a random dynamic UA; ua.chrome and ua.firefox both work
ua = UserAgent()
headers = {
    "User-Agent": ua.chrome
}
# Build the request
request = Request(url, headers=headers)
# urlopen() fetches the page; the body is bytes and must be decoded with decode() to get a str
response = urlopen(request)
# Read the data
info = response.read()
# Decode to str and print
print(info.decode())
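
fake_useragent is a third-party package, so it may be missing or fail to load its data on some machines. The sketch below falls back to a static list in that case. `pick_user_agent` and `FALLBACK_USER_AGENTS` are hypothetical names, and catching every exception from the library is a simplification made for this example.

```python
from random import choice

try:
    # Third-party package: pip install fake-useragent
    from fake_useragent import UserAgent
except ImportError:
    UserAgent = None

# Static UAs used when fake_useragent is unavailable (assumption of this sketch)
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
]

def pick_user_agent():
    """Return a random UA from fake_useragent, or from the static list."""
    if UserAgent is not None:
        try:
            # ua.random returns a UA for a randomly chosen browser
            return UserAgent().random
        except Exception:
            pass
    return choice(FALLBACK_USER_AGENTS)

headers = {"User-Agent": pick_user_agent()}
print(headers["User-Agent"])
```

The resulting `headers` dict can be passed to `Request(url, headers=headers)` exactly as in the script above.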



Original article: https://www.cnblogs.com/ma1998/p/13323459.html