Scraping page content with Python

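import urllib.request
from bs4 import BeautifulSoup

url = "http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2018/12/1201.html"
# Send a browser-like User-Agent header with the request.
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
data = opener.open(url).read()
# Decode the raw bytes as GB2312 before parsing.
content = data.decode('GB2312')
soup = BeautifulSoup(content, 'html.parser')
# Print every <a> tag found on the page.
print(soup.find_all('a'))

for link in soup.find_all('a'):
    # .get() returns None instead of raising KeyError when an <a> has no href.
    print('url:', link.get('href'))
    # The first argument of get_text() is a separator, not an attribute name,
    # so it is called without one here.
    print('text:', link.get_text(strip=True))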
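
The href values printed by the loop above may be relative paths rather than full URLs. A minimal sketch of turning them into absolute URLs with the standard library's urljoin is shown here; the helper name resolve_links and the sample href values are assumptions made for illustration and are not taken from the actual page.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Resolve href values against base_url, skipping missing or empty ones."""
    return [urljoin(base_url, href) for href in hrefs if href]


base_url = "http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2018/12/1201.html"
# Hypothetical href values: a relative path, a missing href, a root-relative path.
sample_hrefs = ["01/120101.html", None, "/tjsj/"]

absolute = resolve_links(base_url, sample_hrefs)
for absolute_url in absolute:
    print(absolute_url)
```

In the scraping loop, the same helper could be fed with link.get('href') for each tag, which keeps the None values from tags without an href out of the result.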

Original article: https://www.cnblogs.com/isungge/p/11598112.html