BeautifulSoup basic syntax, including the class_ trick for searching by class attribute


Example 1:

# coding=utf-8
import requests
from bs4 import BeautifulSoup

url = 'http://www.itest.info/courses'  # URL of the page to scrape
# Fetch the page's HTML source with the requests library and parse it with html.parser; this is the standard pattern
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
for course in soup.find_all('h4'):  # iterate over every h4 tag on the page
    print(course.text)  # prints the text of the h4 tag, e.g. 测试开发--试验班
    print(course)  # prints the text together with its tag, e.g. <h4>测试开发--试验班</h4>
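
The difference between a tag's text and the tag itself can be checked without any network access. The following is a minimal sketch that assumes a small hand-written HTML string (hypothetical, not the real page at itest.info) in place of the downloaded source:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page source.
html = """
<div>
  <h4>测试开发--试验班</h4>
  <h4>Selenium <b>basics</b></h4>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
headings = soup.find_all("h4")  # every h4 tag, in document order

texts = [h.get_text() for h in headings]  # text only, nested tags flattened
markup = [str(h) for h in headings]       # text plus the surrounding tags

print(texts)
print(markup)
```

Here `get_text()` returns the same string as the `.text` property used above; nested tags such as `<b>` are dropped from the text but kept in `str(tag)`.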

Example 2:


# Uses the requests and BeautifulSoup imports from the first example above.
url = 'https://www.v2ex.com/'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
# Match span tags whose CSS class is item_hot_topic_title. Note class_, not class: class is a Python keyword, so bs4 adds a trailing underscore to avoid the clash.
for span in soup.find_all('span', class_='item_hot_topic_title'):
    # Text of the first a tag inside the span. If a span holds many a tags, filter further, e.g. for i in span.find_all('a', href='/t/415664')
    print(span.find('a').text)
    # Read the href attribute; in bs4 an element's attributes are accessed as tag[attribute_name]
    print(span.find('a')['href'])
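
The class_ lookup and the attribute access can also be tried offline. This is a sketch under stated assumptions: the HTML string below is made up to resemble the v2ex markup, and the `attrs=` and `select()` variants are alternatives that bs4 provides rather than part of the original article:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the hot-topic list; not the real v2ex page.
html = """
<span class="item_hot_topic_title"><a href="/t/415664">First topic</a></span>
<span class="item_hot_topic_title"><a href="/t/415665">Second topic</a><a>no link</a></span>
<span class="other"><a href="/t/1">Ignored</a></span>
"""
soup = BeautifulSoup(html, "html.parser")

# Three ways to select the same spans by CSS class.
by_keyword = soup.find_all("span", class_="item_hot_topic_title")
by_attrs = soup.find_all("span", attrs={"class": "item_hot_topic_title"})
by_css = soup.select("span.item_hot_topic_title")

# tag["href"] raises KeyError when the attribute is absent;
# tag.get("href") returns None instead.
hrefs = [span.find("a")["href"] for span in by_keyword]
missing = by_keyword[1].find_all("a")[1].get("href")

# Filtering on an attribute value, as suggested in the comment above.
filtered = soup.find_all("a", href="/t/415664")

print(hrefs)
print(missing)
print([a.get_text() for a in filtered])
```

When an attribute may be missing, `tag.get(name)` is the safer choice, since the square-bracket form raises an exception on the first tag that lacks it.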


Original article: https://www.cnblogs.com/kaibindirver/p/9927297.html
