Python Web Crawler

Now let's build a real crawler example.

It scrapes the titles and URLs of the recommended articles listed on the front page of my cnblogs (博客园) home page.

scrape_home_articles.py

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Download the blog's front page and parse it with the built-in HTML parser.
html = urlopen("http://www.cnblogs.com/davidgu")
bsObj = BeautifulSoup(html, "html.parser")

# Only look inside the main container, and only at links to individual posts.
for link in bsObj.find("div", {"id":"main_container"}).findAll("a", href=re.compile("^http://www.cnblogs.com/davidgu/p")):
    # Skip links that carry a class attribute; keep the plain title links.
    if 'href' in link.attrs and not('class' in link.attrs):
        print(link.string)
        print(link.attrs['href'])
        print("--------------------------------------------------------------")

Output:
[置顶]解决adb server端口被占用的问题
http://www.cnblogs.com/davidgu/p/4515236.html
--------------------------------------------------------------
[置顶]解决Eclipse下不自动拷贝apk到模拟器问题( The connection to adb is down, and a sever
http://www.cnblogs.com/davidgu/p/4390661.html
--------------------------------------------------------------
常用的正则表达式一览
http://www.cnblogs.com/davidgu/p/4831357.html
--------------------------------------------------------------
C++ 11 - STL - 函数对象(Function Object) (上)
http://www.cnblogs.com/davidgu/p/4829097.html
--------------------------------------------------------------

...
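
To see how the link filter in scrape_home_articles.py behaves without hitting the network, here is a minimal offline sketch of the same rule: keep an <a> tag only if its href starts with the post URL prefix and it has no class attribute. It uses only the standard library (html.parser instead of BeautifulSoup), so it is not the original script. The sample HTML and the names POST_URL, SAMPLE_HTML, PostLinkParser, and extract_post_links are made up for illustration. As a simplification, it also does not restrict the search to the main_container div.

```python
import re
from html.parser import HTMLParser

# Same prefix as in the script, with the dots escaped so they match literally.
POST_URL = re.compile(r"^http://www\.cnblogs\.com/davidgu/p")

# Hypothetical sample markup, invented for illustration; the real page differs.
SAMPLE_HTML = """
<div id="main_container">
  <a href="http://www.cnblogs.com/davidgu/p/1.html">First post</a>
  <a class="more" href="http://www.cnblogs.com/davidgu/p/2.html">Read more</a>
  <a href="http://www.cnblogs.com/davidgu/archive/2015.html">Archive</a>
  <a href="http://www.cnblogs.com/davidgu/p/3.html">Second post</a>
</div>
"""


class PostLinkParser(HTMLParser):
    """Collects (title, url) pairs for <a> tags that pass the filter."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the matching <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # Keep the link only if the href matches and there is no class attribute.
        if href and POST_URL.search(href) and "class" not in attrs:
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_post_links(html):
    """Return the (title, url) pairs that pass the filter, in document order."""
    parser = PostLinkParser()
    parser.feed(html)
    return parser.links


for title, url in extract_post_links(SAMPLE_HTML):
    print(title)
    print(url)
```

With this sample, the "Read more" link is dropped because it has a class attribute, and the archive link is dropped because its URL does not start with the post prefix.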

Original article: https://www.cnblogs.com/twodog/p/12135312.html
