  • Web crawler basic exercises

    1. Use requests.get(url) to fetch the HTML of a web page

    import requests
    import bs4

    # Fetch the page, then parse it; the 'lxml' parser must be installed separately.
    url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
    response = requests.get(url)
    status_code = response.status_code
    content = bs4.BeautifulSoup(response.content.decode("utf-8"), "lxml")
    element = content.find_all(id='book')
    print(status_code)
    print(element)
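Since the site above may be unreachable, the same find_all(id=...) lookup can be tried offline on a literal HTML string. The markup and text below are made up for illustration:

```python
import bs4

# A made-up snippet standing in for the downloaded page; the id 'book'
# matches the find_all(id='book') lookup used above.
html = '<html><body><div id="book">Python Crawling 101</div></body></html>'
soup = bs4.BeautifulSoup(html, 'html.parser')  # html.parser ships with Python

# find_all(id=...) returns a list of every element whose id attribute matches.
elements = soup.find_all(id='book')
print(elements[0].text)
```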
    

      2. Use BeautifulSoup's HTML parser to build a parse tree

    import bs4

    exampleFile = open('example.html')
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html5lib')
    elems = exampleSoup.select('#author')
    print(type(elems))
    print(elems[0].getText())
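If example.html is not already on disk, a minimal stand-in can be written first. The file contents and author name here are invented for illustration, and the built-in 'html.parser' is used so no html5lib install is needed:

```python
import bs4

# Create a minimal example.html so the snippet above has a file to read.
with open('example.html', 'w', encoding='utf-8') as f:
    f.write('<html><body><span id="author">Jane Doe</span></body></html>')

with open('example.html', encoding='utf-8') as exampleFile:
    exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')

# select() takes a CSS selector; '#author' matches the element with id="author".
elems = exampleSoup.select('#author')
print(elems[0].getText())
```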
    

      3. Find the HTML elements for a given tag

    import requests
    from bs4 import BeautifulSoup

    newsurl = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
    res = requests.get(newsurl)
    res.encoding = 'utf-8'
    print(res.text)

    html_sample = ''
    soup = BeautifulSoup(html_sample, 'html.parser')
    print(soup.text)
    

      4. Get elements with a given CSS id or class

    soup = BeautifulSoup(html_sample, 'html.parser')
    alink = soup.select('#title')
    print(alink)  # [<h1 id="title">Hello World</h1>]
    for link in soup.select('.link'):
        print(link)
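The post never shows html_sample itself, so here is a runnable approximation reconstructed from the printed results in this post; the exact original markup may differ:

```python
from bs4 import BeautifulSoup

# Approximate html_sample, inferred from the expected output shown above.
html_sample = ('<html><body>'
               '<h1 id="title">Hello World</h1>'
               '<a href="#" class="link">This is link1</a>'
               '<a href="#link2" class="link">This is link2</a>'
               '</body></html>')
soup = BeautifulSoup(html_sample, 'html.parser')

print(soup.select('#title'))       # '#...' selects by id
for link in soup.select('.link'):  # '....' selects by class
    print(link)
```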
    

      

    5. Exercises:

    Extract the text of the h1 tag

    soup = BeautifulSoup(html_sample, 'html.parser')
    header = soup.select('h1')
    print(header)          # [<h1 id="title">Hello World</h1>]
    print(header[0])       # <h1 id="title">Hello World</h1>
    print(header[0].text)  # Hello World

    Extract the links of the a tags

    alink = soup.select('a')
    print(alink)
    # [<a class="link" href="#">This is link1</a>, <a class="link" href="#link2">This is link2</a>]
    for link in alink:
        print(link)

    Extract the full contents of every li tag

    print(soup.li)
    print(soup.li.string)
    print(type(soup.li.string))
    # <li><!--content--></li>
    # <class 'bs4.element.Comment'>
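The Comment result above can be reproduced offline: when a tag's only child is an HTML comment, .string returns a bs4.element.Comment rather than plain text. The li markup below is a stand-in:

```python
import bs4

# An <li> whose only child is an HTML comment.
soup = bs4.BeautifulSoup('<ul><li><!--hidden note--></li></ul>', 'html.parser')

li_string = soup.li.string
print(type(li_string))  # <class 'bs4.element.Comment'>

# A crawler can test for Comment to avoid mistaking comments for visible text.
print(isinstance(li_string, bs4.element.Comment))
```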

    Extract one news item's title, link, publish time, and source

    print(soup.select('div .news-list-title')[0].text)
    print(soup.select('div .news-list-thumb')[0].parent.attrs.get('href'))
    print(soup.select('div .news-list-info > span')[0].text)
    print(soup.select('div .news-list-info > span')[1].text)
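Those selectors assume the markup of the gzcc news-list page. A rough offline sketch follows; the class names come from the selectors above, but the structure, headline, URL, date, and source are all invented:

```python
from bs4 import BeautifulSoup

# Invented markup mimicking the news-list structure the selectors expect.
html = '''
<div class="news-list">
  <a href="http://news.gzcc.cn/html/2018/xiaoyuanxinwen/9001.html">
    <div class="news-list-thumb"><img src="cover.jpg"/></div>
    <div class="news-list-text">
      <div class="news-list-title">Sample headline</div>
      <div class="news-list-info"><span>2018-03-28</span><span>Campus News</span></div>
    </div>
  </a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

title = soup.select('div .news-list-title')[0].text
# The thumbnail's parent is the <a>, whose href is the article link.
link = soup.select('div .news-list-thumb')[0].parent.attrs.get('href')
date = soup.select('div .news-list-info > span')[0].text
source = soup.select('div .news-list-info > span')[1].text
print(title, link, date, source)
```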
    

      

  • Original post: https://www.cnblogs.com/tyx123/p/8668086.html